LWC
Trooper
Joined: Feb 13, 2004
Posts: 27
Location: Israel
|
|
z12
Sergeant
Joined: Jul 17, 2002
Posts: 131
Location: USA
|
Posted: Tue Apr 13, 2004 8:48 am Post subject: |
|
|
Hi LWC,
Modify the matching expression like so:
Code: |
Match = "\0<i(mg|nput)(*alt=$AV(\2)|)*>\3"
"&*=\w://([^/]+{1,*}/&&(^\h)*)"
"&(^*(width=[#0:75]|height=[#0:20]))"
|
I don't understand the reasoning behind the height/width match, it seems a bit odd.
Also, matching for *src= and removing the anchor tag from the bounds check makes more sense to me. Depending on how your filters are arranged, Multi may not be needed either.
Of course, you can still get offsite images from js or background attributes.
HTH
Mike
|
|
LWC
Trooper
Joined: Feb 13, 2004
Posts: 27
Location: Israel
|
Posted: Tue Apr 13, 2004 1:29 pm Post subject: |
|
|
Hmm, the original filter has only two lines in "match".
I see you haven't touched the third one, changed
about half of the first one and added a new second line.
This filter has so much code in it that it looks like Chinese... can you
explain a little what you did? For example, why did you change
the "alt" part?
But most importantly, you've added a new parameter (number 3), but
haven't provided a new "replace" line so it's not even used, is it?
I think the only thing that needs to be changed is the first line:
Code: |
\1<i(mg|nput)(*alt="\0"|)*>\2&*http://(^\h)
|
and within it, only this part
The \h (host) part is clever ("if it's the same host, it's ok"), but I want to
convince it to support empty URLs too (just "http://").
In other words, what it should be is:
But, unfortunately it doesn't work...
|
|
z12
Sergeant
Joined: Jul 17, 2002
Posts: 131
Location: USA
|
Posted: Tue Apr 13, 2004 2:47 pm Post subject: |
|
|
Hi LWC,
Hmm, it seems that maybe we have different filters. I always back up the default config before I switch to mine, but it's possible that I've modified it.
Here is the filter I was referring to:
Code: |
Name = "Kill off-site Images"
Active = FALSE
Multi = TRUE
Bounds = "<(a\s[^>]++href=*</a>|i(mg|nput)\s*>)"
Limit = 800
Match = "\1<i(mg|nput)(*alt="\0"|)*>\2&*http://(^\h)"
"&(^*(width=[#0-75]|height=[#0-20]))"
Replace = " \1<font size=1>[\0]</font>\2"
|
I see I should have included the new Replacement expression, so here's the whole thing:
Code: |
Name = "Kill off-site Images2"
Active = TRUE
Multi = TRUE
Bounds = "<(a\s[^>]++href=*</a>|i(mg|nput)\s*>)"
Limit = 800
Match = "\0<i(mg|nput)(*alt=$AV(\2)|)*>\3"
"&*=\w://([^/]+{1,*}/&&(^\h)*)"
"&(^*(width=[#0:75]|height=[#0:20]))"
Replace = " \0<font size=1>[\2]</font>\3"
|
Note: I added an empty line in this post above & below the matching expression for clarity.
The first tweak was to replace the alt match with $AV() to make sure the alt tag value was captured no matter what quotes were used.
The second tweak was to remove
from the first line. Apparently I changed the variables in the first line for no good reason.
The next tweak was to insert the new matching code for \h
Code: |
"&*=\w://([^/]+{1,*}/&&(^\h)*)"
This part:
&*=\w://
replaces the old
&*http://
|
I used \w because it will match any quotes and protocol, not just http.
(I would use &*src= to limit matching to image attributes.)
This:
is used to capture the domain name up to and including the / character.
This won't match unless a domain name and a path delimiter follow the :// character sequence (since we don't want to match "http://" without a domain name).
Finally we use "&&" to match the domain name captured above to the host name as follows:
Code: |
([^/]+{1,*}/&&(^\h)*)
|
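As a rough analogy only (Python regular expressions, not Proxomitron syntax), the domain capture above behaves like requiring at least one non-slash character and then a slash after "://". The function name and the sample URLs below are made up for illustration.

```python
import re

# Loose Python analogue of "://" followed by [^/]+{1,*}/ :
# one or more non-slash characters, then a slash.
DOMAIN_THEN_SLASH = re.compile(r"://([^/]+)/")

def captured_domain(text):
    """Return the text between '://' and the next slash, or None if there is none."""
    m = DOMAIN_THEN_SLASH.search(text)
    return m.group(1) if m else None

print(captured_domain('src="http://offsite.example/pic.gif"'))
print(captured_domain('value="http://"'))
```

Under this reading, a bare "http://" captures nothing, and neither does a URL that has a domain but no slash after it.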
For matching the width & height values, I changed the format from [#0-75] to [#0:75] which is the newer way for doing numeric matches.
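To make the effect of the size clause concrete, here is a loose Python sketch (again, not Proxomitron syntax). It assumes [#0:75] accepts whole numbers from 0 to 75 inclusive and that attribute values may be optionally quoted; `is_small_image` and `_attr_int` are hypothetical helper names.

```python
import re

def _attr_int(tag, name):
    """Return the integer value of a numeric attribute, or None if it is absent."""
    pattern = r"\b" + name + r"""\s*=\s*["']?(\d+)"""
    m = re.search(pattern, tag, re.IGNORECASE)
    return int(m.group(1)) if m else None

def is_small_image(tag, max_width=75, max_height=20):
    """True if width is in 0..max_width or height is in 0..max_height."""
    width = _attr_int(tag, "width")
    height = _attr_int(tag, "height")
    return (width is not None and width <= max_width) or (
        height is not None and height <= max_height
    )

# The filter clause is negated, so only tags that are NOT small stay candidates.
print(not is_small_image('<img src="x.gif" width="468" height="60">'))
```

Because the clause is wrapped in (^...), a tag with a small width or height is left alone, which is why only fairly large images end up being replaced.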
In the replacement text, I changed the variables to match the new code.
As far as using the replacement text, I probably wouldn't, but that's just me. You'll have to try it and see how you like it.
I also don't like the matching expression for width & height; I would probably delete it, or at least replace it. It really limits blocking of off-site images to fairly large ones (probably necessary to fit in the replacement text).
HTH
Mike
|
|
LWC
Trooper
Joined: Feb 13, 2004
Posts: 27
Location: Israel
|
Posted: Tue Apr 13, 2004 6:18 pm Post subject: |
|
|
Well, I see now. Your version is sort of a compromise.
When it's not the same host, it still accepts it if there's no slash in it
(i.e. domain alone).
It's a lot better than before, but can't there be a solution that
doesn't even accept a domain?
Also, the Google page does work now, but Altavista's translation
page still doesn't work because the code there is:
Code: |
<input type="text" size="45" style="width:400" name="url" value="http://" />
|
See that useless slash at the end? They just had to go and put it...
it's not even correct HTML and because of that little slash, even
your version still rejects the entire code.
It would be great if you could tell the filter to expect a space or a
quote mark after the URL.
Thanks!
Last edited by LWC on Wed Apr 14, 2004 5:34 am, edited 1 time in total
|
|
Lepus
Trooper
Joined: Mar 02, 2004
Posts: 15
Location: USA
|
Posted: Tue Apr 13, 2004 7:13 pm Post subject: |
|
|
One quick'n'dirty fix might be to change...
to...
An inverted match using (^....) doesn't use up any of the text - it's just a true/false test at that spot. So adding [a-z0-9] after it will start at the first character of the hostname and make sure there's at least one valid letter or number after the "http://".
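The same idea can be sketched with a Python negative lookahead, which is also a test that consumes no text. This is only an analogy, not Proxomitron syntax; `is_offsite` and the host names are made up for illustration.

```python
import re

def is_offsite(tag, current_host):
    """True if the tag has an http:// URL with a real host that is not current_host."""
    pattern = re.compile(
        # (?!...) checks "not the current host" without consuming anything,
        # then [a-z0-9] insists that a hostname character actually follows.
        r"http://(?!" + re.escape(current_host) + r")[a-z0-9]",
        re.IGNORECASE,
    )
    return pattern.search(tag) is not None

print(is_offsite('<input name="url" value="http://" />', "example.com"))
print(is_offsite('<img src="http://example.com/logo.gif">', "example.com"))
print(is_offsite('<img src="http://offsite.test/ad.gif">', "example.com"))
```

A bare "http://" fails the [a-z0-9] step, the current host fails the lookahead, and only a URL on a different host passes both.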
Quote: |
See that useless slash at the end? They just had to go and put it...
it's not even correct HTML and because of that little slash, even
your version still rejects the entire code.
|
Actually it's valid (and even required) for XML/XHTML.
|
|
z12
Sergeant
Joined: Jul 17, 2002
Posts: 131
Location: USA
|
Posted: Tue Apr 13, 2004 9:52 pm Post subject: |
|
|
Lepus, thanks for jumpin in!
Yeah, I was going the quick & dirty route with [^/]+{1,*}/
In almost all of my replacement code, I use something like <proxo killed blah blah />, so I don't know where my head was at. I guess I assumed there would always be a path to an image.
I tried (^\h)[a-z0-9] on my filter but no joy. It sounds interesting; perhaps you can post something. The old \h never seems to work the way I think it would.
Anyway, here's a new filter to test drive:
Code: |
Name = "New offsite-image killer 1"
Active = TRUE
Bounds = "<i(mg|mage|nput)*>"
Limit = 1024
Match = "<(\w)\0*((*alt=$AV(\1))|)*"
"&*src=\w://(([^/"' ]+{1,*})\2&&(^\h)*)"
Replace = "<proxo killed="\0" with="\2" />"
|
I'm sure this isn't the final version!
For replacement text, there's plenty of options we can try:
1. Just Kill it.
2. Put the alt tag on the page.
3. Replace the image with a local ptron gif
4. Any or all of the above
5. Do something else instead.
Also, we can add image dimension checks.
Let me know how this works.
Mike
|
|
Lepus
Trooper
Joined: Mar 02, 2004
Posts: 15
Location: USA
|
Posted: Tue Apr 13, 2004 11:03 pm Post subject: |
|
|
z12 wrote: |
I tried (^\h)[a-z0-9] on my filter but no joy. It sounds interesting, perhaps you can post something.
|
How were you using it? I tested using a very simple match of...
Code: |
<tag * http://(^\h)[a-z0-9] * >
|
which seemed to work (at least in the tester window).
Code: |
<tag src="http://" > (no match)
<tag src="http://shonen.knife.com" > (no match)
<tag src="http://offsite.com" > (match)
|
The tester window's idea of the current URL seems to be "www.Shonen.Knife.com"
You might want to make sure some of your other logic isn't affecting this somehow.
|
|
LWC
Trooper
Joined: Feb 13, 2004
Posts: 27
Location: Israel
|
Posted: Wed Apr 14, 2004 5:42 am Post subject: |
|
|
Lepus, your reverse logic worked ("must meet a positive value" instead of
"must ignore a negative value")!
Your 8 letters fixed both Altavista's and Google's translation pages.
I wonder if it fixed any other "http://" only input tags pages, but I can't
think of any others right now.
|
|
z12
Sergeant
Joined: Jul 17, 2002
Posts: 131
Location: USA
|
Posted: Wed Apr 14, 2004 7:09 am Post subject: |
|
|
Hi,
Simplicity, gotta love it. That works great Lepus.
Code: |
Name = "offsite-image replacer"
Active = TRUE
Bounds = "<i(mg|mage|nput)*>"
Limit = 1024
Match = "(\#( src=\w://(^\h)[a-z0-9]\w))+{1}\#"
Replace = "\# src="http://Local.ptron/clear.gif" \@"
|
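For readers who don't use Proxomitron, the behaviour of a filter like this can be approximated in Python as follows. This is a sketch under assumptions, not a translation of the filter: `replace_offsite_src` is a hypothetical name, the tag and attribute patterns are simplified, and the host check is a plain prefix test.

```python
import re

PLACEHOLDER = "http://Local.ptron/clear.gif"  # URL taken from the Replace line above

# Same tags as the Bounds line: <img ...>, <image ...>, <input ...>
TAG = re.compile(r"<i(?:mg|mage|nput)\b[^>]*>", re.IGNORECASE)

def replace_offsite_src(html, current_host):
    """Point off-site src attributes at a local placeholder; leave other tags as-is."""
    src = re.compile(
        r"""(\ssrc=)(["']?)\w+://(?!""" + re.escape(current_host)
        + r""")[a-z0-9][^\s"'>]*\2""",
        re.IGNORECASE,
    )

    def fix_tag(match):
        # Rewrite at most one src attribute inside the matched tag.
        return src.sub(
            lambda s: s.group(1) + '"' + PLACEHOLDER + '"', match.group(0), count=1
        )

    return TAG.sub(fix_tag, html)

page = '<img src="http://ads.test/b.gif" alt="ad"><img src="http://example.com/logo.gif">'
print(replace_offsite_src(page, "example.com"))
```

Tags without a src attribute, such as an input whose value is just "http://", pass through untouched because the pattern only looks at src.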
Mike
|