Author |
Message |
datonn
Cadet
Joined: May 30, 2004
Posts: 7
Location: USA
|
Posted: Sat Jun 05, 2004 2:10 am Post subject: Filter Question: a RegEx search of blacklist URLs in body? |
|
|
Hello everyone.
I am a relative "newbie" to MWP 4.1, and I had an idea for a filter that I am hoping someone can help me with.
One of the things I have noticed in the spam I receive is the fact that, even though I might get 300-400 messages with different faked sender info, many of the messages are all trying to get me to visit the same URLs. I have religiously been harvesting URL info embedded in the body of spam messages that hit my inbox, and rather than simply having a "blacklist" that looks for that info in the "sender" field or message header, I am wondering if I can build a filter that would automatically delete any messages that have the domain name references in my blacklist appearing in the message's body.
MailWasher has cut my spam down by about 95 percent, thanks to a day or two spent playing around with RegEx and reading several people's filtering ideas in this forum! However, my remaining spam would be reduced to virtually ZERO if I could also filter out domain name matches from my blacklist appearing in my message body.
Any ideas?! Since spammers can fake Message IDs, Sender and Reply-to Info and originating IP addresses, my thought is to try and catch them by filtering out where they are trying to get me to "click"!
I would love to see if someone is already doing this, so I can add the filter to my anti-spam resources...... |
|
|
|
Eisenson
Corporal
Premium Member
Joined: May 22, 2004
Posts: 59
Location: USA
|
Posted: Sat Jun 05, 2004 9:03 am Post subject: |
|
|
I'm barely one click above newbie, my friend, but began doing exactly that last week. The spammers use an infinity of throwaway and spoofed email addresses, but it's not so easy to create new urls. Put that filter high on the priority list so you can watch it work - you'll be impressed. You can also harvest from emails that appear in the spam email...
It's work, but is satisfying and after a few days begins tapering off.
If others have the same good results, perhaps it makes sense to develop a community list of verboten websites - like a blacklist. Or is there already one out there that MWP can refer to, like the DNS blacklist? This is yet another case where shared information can multiply and accelerate effectiveness.
_________________
Perfection is sometimes sufficient... |
|
|
|
datonn
Cadet
Joined: May 30, 2004
Posts: 7
Location: USA
|
Posted: Sat Jun 05, 2004 9:57 am Post subject: |
|
|
Henry,
How did you do it? Could you share the filter criteria that you used? I'd like to get that filter in my MWP program right away if I can. Thanks! |
|
|
|
Eisenson
Corporal
Premium Member
Joined: May 22, 2004
Posts: 59
Location: USA
|
Posted: Sat Jun 05, 2004 10:44 am Post subject: |
|
|
Derek, I'm bored - down to 3-4 spams a day "leaking" past Mailwasher, and everything's automatically deleted. By next week I'll confidently stop looking at the filtered list. Most of my work for the past week has been fixing inadvertent deletions, and it's about perfect now. Here's how it works for me:
My installation includes Gary Partain's excellent filter list - adapted to suit my situation. I added Spamhaus as a DNS blacklist, and subscribed to FirstAlert. Very important: a comprehensive whitelist, including wildcards for corporate domains with which we deal a lot.
I used advice from others here to build a "not-from-me" filter -- if not henry([email protected]) then spam, in various forms.
My filter list grew like a cancer till I learned a bit about Regular Expressions, then I was able to get it under control. Ike and others helped with compressing punctuation tricks, such as:
[enabled],"Punctuation",filtered,16711680,AND,Blacklist,Delete,Automatic,Subject,
containsRE,(v.?i.?a.?g.?r.?a|c.?i.?a.?l.?i.?s
etc.
Then I began filtering on spammer domains. If the spam suggests a click on www.getrichquick.com, I just pick some identifying character sequence from the domain: ichqui from getrichquick, for example. By not getting too specific, this gets the one that triggered me and might get hits on terms like it whether a url or not - i.e., "webcam". The spammer can insert hidden html, etc., everywhere else, but not in their own url!
[enabled],URL,filtered,16711680,OR,Blacklist,Delete,Automatic,Body,
containsRE,ichqui|uts4U|webcam|onlinemeds
etc. etc.
You don't need to go to Notepad/Wordpad to enter new filter terms - just insert them in MWP's filter editor, separating them with the | character. |
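For anyone who wants to try out fragments before putting them in a live filter, here is a minimal Python sketch of what a Body containsRE rule with a | separated list does. It assumes MWP's matching behaves like an ordinary unanchored regex search, which is not confirmed here; the function name and the sample messages are made up for illustration.

```python
import re

# Fragments taken from the URL filter above.
FRAGMENTS = ["ichqui", "uts4U", "webcam", "onlinemeds"]

# Join the fragments with | exactly as they appear in the containsRE field.
BODY_PATTERN = re.compile("|".join(FRAGMENTS))


def body_matches(body: str) -> bool:
    """Return True when any fragment occurs anywhere in the message body."""
    return BODY_PATTERN.search(body) is not None


# Hypothetical sample messages.
spam = "Click http://www.getrichquick.com/offer now!"
ham = "Lunch on Friday? I'll bring the slides."
note = "I bought a new webcam for video calls."

print(body_matches(spam))  # True: 'ichqui' sits inside 'getrichquick'
print(body_matches(ham))   # False
print(body_matches(note))  # True: an innocent mention is caught as well
```

The last case is the price of not getting too specific: a loose fragment catches variants of the spammer's domain, but also any ordinary text that happens to contain it.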
|
|
|
datonn
Cadet
Joined: May 30, 2004
Posts: 7
Location: USA
|
Posted: Sat Jun 05, 2004 11:28 am Post subject: |
|
|
Henry,
Thanks. What I was really thinking of though was more of an "automated" process, where I could write a filter rule that would automatically search for the domains specified in my blacklist, rather than me having to enter them manually.
I have about 2,300+ domains that I have harvested over the past 6-7 months, and it will be a pain in the behind to have|to|enter|them|all|in|by|hand (not to mention the extra time it would take to have to add new ones twice...in my blacklist and my RegEx filter). It seems like there has got to be a way to make it a one step process, rather than adding the offending domains twice......something like:
[enabled],"Message Body Spam Links",filtered,<something>,AND,Delete,Automatic,Body,
containsRE,<any domains I have added to my Blacklist>
Obviously, that's not meant to be a working RegEx filter in its current state by any stretch of the imagination! However, if someone out there could help me/us fill in the <something> and <any domains that I have added to my blacklist>, I'm not sure if I'd see more than 2-3 spam messages per week leaking through MWP going forward!
Anyway, that's what I'm after. I know I can manually enter domains in RegEx expressions, but I am hoping to automate the process...so that MWP kills two birds with one stone (one addition of a domain into my blacklist).
Thanks! |
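Until MWP can do this natively, one workaround is to generate the filter line from the harvested list with a small script. The Python sketch below is only an illustration: it assumes the harvested domains are available as plain strings, and it copies the field order from the URL filter posted earlier in this thread, which may not suit every MWP version. The function names are hypothetical.

```python
def domains_to_pattern(domains):
    """Turn plain domain names into one alternation for a containsRE field."""
    cleaned = [d.strip() for d in domains if d.strip()]
    unique = list(dict.fromkeys(cleaned))  # drop duplicates, keep order
    # Escape the dots so "." matches a literal dot, as in hand-written filters.
    return "|".join(d.replace(".", r"\.") for d in unique)


def build_filter_line(domains, name="Message Body Spam Links"):
    """Format one filter line; the field order is an assumption (see above)."""
    pattern = domains_to_pattern(domains)
    return (
        f'[enabled],"{name}",filtered,16711680,OR,Blacklist,'
        f'Delete,Automatic,Body,containsRE,"{pattern}"'
    )


# Hypothetical harvested list with a blank entry and a duplicate.
harvested = ["greatnewmeds.com", " bbs-int.com ", "", "greatnewmeds.com"]
print(build_filter_line(harvested))
```

With this approach the harvested list stays the single place where domains are added, and the filter line is regenerated from it whenever the list changes.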
|
|
|
Eisenson
Corporal
Premium Member
Joined: May 22, 2004
Posts: 59
Location: USA
|
Posted: Sat Jun 05, 2004 11:51 am Post subject: |
|
|
But the 'blacklist' is just email addresses - and they're largely fake, including the domains. I don't think that putting them into the filter does any good unless the email address domain of the spammer happens to be "honest". Did I miss something?
The original thought was to capture domains from clickable urls embedded in the spam, because they're "honest", not munged, contain no buried html, appear in many different spams, etc.. *That* works for me, but it's work. Code that automatically finds such urls in spams and inserts them as filter criteria will earn a medal of honor. And I suggest that the gurus consider making it a community thing...
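As a rough idea of what such code could look like, here is a Python sketch that pulls the host names out of clickable links in a batch of spam bodies and counts how many messages each one appears in. It is not tied to MWP in any way: the spam bodies are assumed to be available as plain strings, the URL pattern is deliberately simple, and the function names are made up.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# Deliberately simple: a URL runs until whitespace, a quote or an angle bracket.
URL_RE = re.compile(r"""https?://[^\s"'<>]+""", re.IGNORECASE)


def hosts_in_body(body):
    """Return the host names of all http(s) links found in one message body."""
    hosts = []
    for url in URL_RE.findall(body):
        try:
            host = urlsplit(url).hostname
        except ValueError:  # malformed URL, skip it
            continue
        if host:
            if host.startswith("www."):
                host = host[4:]
            hosts.append(host)
    return hosts


def count_hosts(bodies):
    """Count in how many messages each host appears (once per message)."""
    counts = Counter()
    for body in bodies:
        counts.update(set(hosts_in_body(body)))
    return counts


# Hypothetical spam bodies.
bodies = [
    'Cheap meds: <a href="http://www.medsplanet.info/buy?id=1">click</a>',
    "Visit http://medsplanet.info/ or http://askcare.com/x today",
    "No links here.",
]
for host, seen in count_hosts(bodies).most_common():
    print(host, seen)
```

Hosts that show up across many different spams are the natural candidates for the filter; anything harvested this way still needs a human look before it is set to autodelete.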
_________________
Perfection is sometimes sufficient... |
|
|
|
TonyKlein
Site Moderator
Joined: Oct 15, 2002
Posts: 5815
Location: Netherlands
|
Posted: Sat Jun 05, 2004 5:27 pm Post subject: |
|
|
I use this Spam Links filter, which is obviously a work in progress...
It's already catching a large amount of spam mails not detected any other way.
I have it set to Autodelete, but of course that's a personal thing.
I do monitor it for False Positives, but so far they're extremely rare.
Code: |
[enabled],"[1+] Spam links","Spam links",342206,OR,Hidden,Delete,Automatic,Body,containsRE,"free(hosting|member)|name.{1,10}johnson\.(net|com)|tom.com|paperparcel|rima-tde|(tabs|love|drug|med[zs]|medica|shape|click|suppl(y|ie)|pill[sz]|dating|product|health|busines|saving|save|profit|amor|kink|gun|herb|trade|adult|tits|vitali|shop|gadg[ie]t|lender|consume|purch|discount| remote|pharm|informat|porn|viagr|advert|miracl|toy[sz]|realty|adviz|anabol|comic|beaut|coupon|optin|reduce|bucks|viagr|cash|stock|loan|please|ultimat|xxx|teen|moms|milk|nurse|promo|topsite|bargain|commerc|coin|domain|market|stock|techie|rate|sale|huge|mortga|/bcute|publici|outstand|debt|value|heart|offer|cheap|ournames|style|incred|enhanc|enlarg|idea|price|rx|deal[zs]|doctor| youth|diet).{0,12}\.(biz|\bnet\b|info|org|\bcom\b)",Body,containsRE,"(soft|software)(inc|now|\.biz|4less|robot|4all|4U|foryou|forfree)|(top|best|cd|acc-).{0,7}software\.(\bnet\b|org|biz|info|\bcom\b)|(co|ne)\.jp|com\.cn/b|cn\.com",EntireHeader,containsRE,"free(hosting|member)|name.{1,10}johnson\.(net|com)|tom.com|paperparcel|rima-tde|(tabs|love|drug|med[zs]|medica|shape| click|suppl(y|ie)|pill[sz]|dating|product|health|busines|saving|save|profit|amor|kink|gun|herb|trade|adult|tits|vitali|shop|gadg[ie]t|lender|consume|purch|discount|remote|pharm|informat|porn|viagr|advert|miracl|toy[sz]|realty|adviz|anabol|comic|beaut|coupon|optin|reduce|bucks|viagr|cash|stock|loan|please|ultimat|xxx|teen|moms|milk|nurse|promo|topsite|bargain|commerc|coin|domain|market|stock| techie|rate|sale|huge|mortga|/bcute|publici|outstand|debt|value|heart|offer|cheap|ournames|style|incred|enhanc|enlarg|idea|price|rx|deal[zs]|doctor|youth|diet).{0,12}\.(biz|\bnet\b|info|org|\bcom\b)",EntireHeader,containsRE,"(soft|software)(inc|now|\.biz|4less|robot|4all|4U|foryou| 
forfree)|(top|best|cd|acc-).{0,7}software\.(\bnet\b|org|biz|info|\bcom\b)|(co|ne)\.jp|com\.cn/b|cn\.com",Body,containsRE,"(gopick|aclens|outblaze|pegpeg|prosize|pandawa|popstar|bboy|moscow|hongkong|brasil|tendencies|imporeaa|saintly|kwok|infinitum|china|africa|india|hongkong|thai|saigon|aqmp|co.kr|imoi|126|163|263|333|999|13542).{0,8}\.(\bnet\b|org|biz|info|\bcom\b)|(med|date|ads|adv|buy|cds|sex).{0,7}\.(\bnet\b|org|biz|info|\bcom\b)|(getit|perform|christia|financ|esoteric|fascina| gigant|magic|sensat|amaz|remarka|vacation|calcium|satisf[ai]|seduct|jesus|cathol|escort|tremend|sleaz|applicat|excellent|terrif|exciting|playboy|greatsoft).{1,20}\.(\bcom\b|info|\bnet\b|biz|org)",EntireHeader,containsRE,"(gopick|aclens|outblaze|pegpeg|prosize|pandawa|popstar| bboy|moscow|hongkong|brasil|tendencies|imporeaa|saintly|kwok|infinitum|china|africa|india|hongkong|thai|saigon|aqmp|co.kr|imoi|126|163|263|333|999|13542).{0,8}\.(\bnet\b|org|biz|info|\bcom\b)|(med|date|ads|adv|buy|cds|sex).{0,7}\.(\bnet\b|org|biz|info|\bcom\b)|(getit|perform|christia|financ|esoteric| fascina|gigant|magic|sensat|amaz|remarka|vacation|calcium|satisf[ai]|seduct|jesus|cathol|escort|tremend|sleaz|applicat|excellent|terrif|exciting|playboy|greatsoft).{1,20}\.(\bcom\b|info|\bnet\b|biz|org)"
|
_________________
Tony
Last edited by TonyKlein on Tue Jun 08, 2004 3:00 pm, edited 1 time in total
|
|
|
|
Eisenson
Corporal
Premium Member
Joined: May 22, 2004
Posts: 59
Location: USA
|
Posted: Sat Jun 05, 2004 8:09 pm Post subject: |
|
|
Tony, that's the Black Belt of filter strings.
I renamed my filters file and started a new one - then installed just yours and watched it go to work. WOW!
_________________
Perfection is sometimes sufficient... |
|
|
|
TonyKlein
Site Moderator
Joined: Oct 15, 2002
Posts: 5815
Location: Netherlands
|
Posted: Sun Jun 06, 2004 3:56 am Post subject: |
|
|
You're welcome; this is only one of my filters though, and it's somewhere near the bottom of my List so that an email will first pass through the more specific filters.
And don't forget that the bodies/headers of many spam emails will often contain totally random email addies, and in such a case this filter will obviously not be very effective.
Also, remember to regularly check the Mail log in Statistics for False Positives; with this kind of filter you can never completely rule them out, and it will occasionally need to be tweaked in order not to flag valid emails.
_________________
Tony |
|
|
|
Ikeb
General
Premium Member
Joined: Apr 20, 2003
Posts: 3555
Location: Canada
|
Posted: Sun Jun 06, 2004 3:51 pm Post subject: |
|
|
Hi Tony.
Looks good but one quibble. Paul hasn't yet found a way to break up long strings without white spaces, which ends up causing horizontal scrolling for the whole topic page. The only current way to fix this is to insert an extra white space (or CR) character every 80 characters or so. To fix this page, the author who posted such a string has to edit the post and add the white space.
BTW, I have my own "SPAMversized sites" filter that looks for specific domain names:
Code: |
[enabled],"SPAMversized link [B]","SPAMversized link",16711680,OR,Delete,TakesPrecedence,Body,
containsRE,"http://[^ ""<>]*?[.]?(greatnewmeds\.com|bbs-int\.com|downinme\.info|sd4d31\.com|
hsaae\.com|webtobox\.com|biz\.yahoo\.com/prnews/040412|supergregah\.biz|aw3ede\.com|ummrx\.com|
icore708tabs\.biz|medicalfhtjk\.com|thesedealzwontlast\.com|cabledeals\.biz|nmnxnie\.info|autaut\.biz|
8005hosting\.com|clicksandquotes\.com|activesaving\.com|bestemailfilter\.biz|name9865meds\.biz|
intenseschool\.com|brujing\.com|prabhums\.org|FaxMagk\.com|exmail\.info|lowestpricing\.biz|
mega-health\.net|svniejf\.info|iclick6203meds\.biz|bestpillsever\.com|3e44e\.com|stock989rx\.biz|
24hoursroadsidehelp\.com|z123eet\.info|ivoiremarketing\.com|goodsoftwarenow\.biz|selcydc\.com|
cared45\.com|popimpin\.com|solent1\.com|newmedformula\.com|seerc4mreds\.com|choice-is-yours\.com|
medz4cheap\.com|net-click\.net\.ph|slayinghungerbittennathan\.com|ss01\.net|tahrea\.com|
getcheapdrugs\.biz|eqmeds\.com|medsplanet\.info|askcare\.com|wimpygirls\.com|gsaq\.com|
cleetusolocastel\.com|perfectgreetings?\.com|host2\.biz|localatina\.com|ghkp\.us|uwkdbxd\.com|
wwwbargins\.biz|nowbetterthis\.biz|dealsforu\.biz|firstquote\.biz|greatweight\.biz|medsordernow\.biz|
ezprosoft\.biz|worldclassrx\.com)" |
This filter is intended to catch any SPAM which my other regex filters don't catch. Any SPAM which passes all other regex filter tests, I examine, find the SPAMversized site domain name and add it to the filter.
_________________
I like SPAM ... on my sandwich!
|
|
|
|
Eisenson
Corporal
Premium Member
Joined: May 22, 2004
Posts: 59
Location: USA
|
Posted: Sun Jun 06, 2004 5:24 pm Post subject: |
|
|
Does the complexity add any advantage? You can just do a simple filter that asks for the core of the various urls without the com, biz, org, etc.
medsplanet|askcare|wimpygirls|gsaq|getrich|
solocastel|perfectgreetings|
I wonder if it's safe to say that everything.biz is spam?
_________________
Perfection is sometimes sufficient... |
|
|
|
Ikeb
General
Premium Member
Joined: Apr 20, 2003
Posts: 3555
Location: Canada
|
Posted: Sun Jun 06, 2004 6:28 pm Post subject: |
|
|
I prefer to err on the false negative side. Dunno how likely a false positive might be if removing specific link tests.
WRT marking all .biz domains as SPAM: keep in mind there are legit enterprises operating with such a domain .... CCSP for example.
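To make the trade-off concrete, here is a small Python sketch comparing the two styles on three made-up messages. It assumes MWP's regex engine treats these patterns the same way Python's does; the link pattern is a shortened version of the "SPAMversized link" filter (with the doubled quote written as a single one), and the bare pattern uses three of the cores suggested above.

```python
import re

# Link-anchored style: the domain must follow "http://" inside a URL.
LINKED = re.compile(
    r'http://[^ "<>]*?[.]?(medsplanet\.info|askcare\.com|gsaq\.com)'
)
# Bare style: only the core of each domain, matched anywhere in the body.
BARE = re.compile(r"medsplanet|askcare|gsaq")


def classify(text):
    """Return (linked_hit, bare_hit) for one message body."""
    return bool(LINKED.search(text)), bool(BARE.search(text))


# Hypothetical messages.
samples = {
    "spam link": "Order now at http://www.askcare.com/rx today",
    "plain mention": "FYI, I added askcare to my filter list yesterday.",
    "clean": "See you at the meeting on Monday.",
}
for label, text in samples.items():
    print(label, classify(text))
```

Both styles catch the spam link, but only the bare one also fires on a message that merely mentions the name. That is the false positive the longer pattern is meant to avoid, at the cost of missing a spammer who moves to a new top-level domain.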
_________________
I like SPAM ... on my sandwich! |
|
|
|
AlphaCentauri
Captain
Joined: Nov 20, 2003
Posts: 302
Location: USA
|
Posted: Tue Jun 08, 2004 12:21 pm Post subject: |
|
|
I have a mega list. I don't want to post it here, because it would take too long to insert the line breaks, but if anyone emails me privately, I'll send it.
I will add some shorter filters for patterns that occur even when the spammer changes URL (a URL filter lasts about a week). Remove the "=" line-break markers (each break is an "=" at the end of one line and another "=" at the start of the next) and paste the result in your filters folder after closing MailWasher. I agree that filtering for strings like "click" and "medica" is too broad, because legitimate emails often contain them. And if a string is too short, there is a risk it will show up at random in the base64 encoding of a photograph.
[enabled],patterns,zpatterns,255,OR,Delete,Body,containsRE,\d\dpill|=
=pill\d\d|\d\dmed|med\d\d|meds\d\d|medz\d\d|pills\d\d|pillz\d\d|=
=\D\DTABS\.US|\d\dbiz\.us,Body,containsRE,\d\dpharm|pharm\d\d|=
=\d\drx|rx\d\d|\d\dherb|herb\d\d|herbs\d\d|herbz\d\d,Body,=
=containsRE,\d\dhosting|medz\.biz|rx\.biz|meds\.biz|drugs\.biz,Body,=
=containsRE,\d\d\d.info|\d\d\.biz|\d\d\d.com|\d\dhotsing|\D\Drx\.us|=
=\ddrug\.us
[enabled],zViagra,zViagra,255,OR,Delete,Subject,containsRE,"[email protected]|=
=V.c0d1n|C1a\l1s|Val.ium|Xa.nax|Vi.agra|Am.bien|Phent.ermine|=
=V~i_c_o`din|Prescripti0n|Víagra|s\.e\.x|Ciali.s|Pe'nis|Vicodine|V.iagra|=
=Vic0din|Hydrocod0ne|V1agra|lev1tra|o'nline medica1|en1arge|G.ROWTH.H.ORMONE|Va1ium|V.icodin|preskription|=
=V1codin|Viagra|Xa.nax|Phen.ter.mine|Vi.agra|Val.ium|F.re.e shipping|Vicodin|Hydrocodone|Norco|Viag^^ra|vi@g\*\*r@|gen.eric|=
=medicin.es|viagr.a|Víagra|pum\.ps|XEN!C@L|C!@L!S|ULTR@M|FIOR!CET=
=|C!@L!S|V1AGR0|TR@M@D0L|pi\.l\.ls|Ci a lis|Generi\|c|Hydrocod0ne|Xa.na.x|Ph.en.ter.mine|Vi.ag.ra|Val.i.um|=
=V\+iagr\+a|C\+iali\+s|L\+evitr\+a|c=EDalis|V=EDagra|V.iagra|C.ialis|=
=L.evitra|Pr.escription|v1cdin|xnax|valiume|cial1s|C1-ALIS|LE-V1TRA|=
=D.o.c.t.o.r|V&i&a&g",Body,containsRE,"[email protected]|V.c0d1n|=
=C1a\l1s|Val.ium|Xa.nax|Vi.agra|Am.bien|Phent.ermine|V~i_c_o`din|=
=Prescripti0n|Víagra|s\.e\.x|Ciali.s|Pe'nis|Vicodine|V.iagra|Vic0din|=
=Hydrocod0ne|V1agra|lev1tra|o'nline medica1|en1arge|G.ROWTH.H.ORMONE|Va1ium|V.icodin|preskription|=
=V1codin|Viagra|Xa.nax|Phen.ter.mine|Vi.agra|Val.ium|F.re.e shipping|Vicodin|Hydrocodone|Norco|Viag^^ra|vi@g\*\*r@|gen.eric|=
=medicin.es|viagr.a|Víagra|pum\.ps|XEN!C@L|C!@L!S|ULTR@M|FIOR!CET|C!@L!S|V1AGR0|TR@M@D0L|pi\.l\.ls|Ci a lis|Generi\|c|Hydrocod0ne|Xa.na.x|Ph.en.ter.mine|Vi.ag.ra|Val.i.um|=
=V\+iagr\+a|C\+iali\+s|L\+evitr\+a|c=EDalis|V=EDagra|V.iagra|C.ialis|=
=L.evitra|Pr.escription|v1cdin|xnax|valiume|cial1s|C1-ALIS|LE-V1TRA|=
=D.o.c.t.o.r|V&i&a&g",Subject,containsRE," =?ISO",Subject,containsRE,"XEN!C@L|C!@L!S|ULTR@M|V1AGR0|=
=TR@M@D0L|p\.e\.ni\.s|Suck1ng|V\|agra|P=EBn=EFs|p=E9n=ECs|PH @ RM @|Via.gra|CIA^L1S LEV^ITRA|V1`AGRA|erect\I1e|erect\I\le| Vlagra|>ctors |O\.r\.d\.e\.r H\.e\.r\.e|\=EBnlarg\=EB|enl\=E2rgment|>nlarge<|>largem<|>gra, | Enlargeme<| Enlarg<|v\|@gra|V\|alium|V1codi\+n|Pntermi:n|>aceuticals|C\=CCalis|=
=V\=CCagra|C\.I\.A\.L\.I\.S|Gen.ric Vi.gra|Gfneric",Body,containsRE,"XEN!C@L|C!@L!S|ULTR@M|V1AGR0|=
=TR@M@D0L|p\.e\.ni\.s|Suck1ng|V\|agra|P=EBn=EFs|p=E9n=ECs|PH @ RM @|Via.gra|CIA^L1S LEV^ITRA|V1`AGRA|erect\I1e|erect\I\le| Vlagra|>ctors |O\.r\.d\.e\.r H\.e\.r\.e|\=EBnlarg\=EB|enl\=E2rgment|>nlarge<|>largem<|>gra, | Enlargeme<| Enlarg<|v\|@gra|V\|alium|V1codi\+n|Pntermi:n|>aceuticals=
=|C\=CCalis|=CCagra|C\.I\.A\.L\.I\.S|Gen.ric Vi.gra|Gfneric",Body,containsRE,"v1aGr@|A.t\|v@n|U\|`tr@m|L3v\|t'ra|=
=Pr0p3c'ia|Acyc\|0v`ir|Pr0z`@c|[email protected]|Bu:sp@r|Adip.ex|I0na:m\|n|=
=M3r.idia|X3n`ica\||Am:bi3n|S0.naTa|Fl3'xeril|Ce'\|3b\= rex|Fi0ric3`t|Tr:am@do\|"
Of course, if your email address is , better make sure everyone has you on their friends lists, because you'll trigger a lot of people's spam filters |
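Stripping the break markers by hand is tedious for filters this long, so here is a Python sketch that does it. It assumes every break in the posted text is an "=" at the end of one line followed by an "=" at the start of the next, as in the filters above; the function name is made up. Sequences such as =ED inside a line are left alone because no line break sits between the two characters.

```python
import re


def rejoin_filter(posted: str) -> str:
    """Delete each '=' + line break + '=' marker, leaving one long line."""
    return re.sub(r"=[ \t]*\r?\n[ \t]*=", "", posted)


# The first two lines of the "patterns" filter, shortened for the example.
posted = (
    r"[enabled],patterns,zpatterns,255,OR,Delete,Body,containsRE,\d\dpill|=" "\n"
    r"=pill\d\d|\d\dmed|med\d\d"
)
print(rejoin_filter(posted))
```

The output is a single line that can be pasted into the filters file; it is worth checking it against the posted text before relying on it.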
|
|
|
Ikeb
General
Premium Member
Joined: Apr 20, 2003
Posts: 3555
Location: Canada
|
Posted: Tue Jun 08, 2004 1:20 pm Post subject: |
|
|
AlphaCentauri wrote: |
I have a mega list. I don't want to post it here, because it would take too long to insert the line breaks, but if anyone emails me privately, I'll send them. |
At least add the breaks for the filters you do post, if you would be so kind. If Tony ever gets back here to fix his post, then your post will keep this topic in horizontal scroll mode.
_________________
I like SPAM ... on my sandwich!
|
|
|
|
AlphaCentauri
Captain
Joined: Nov 20, 2003
Posts: 302
Location: USA
|
Posted: Tue Jun 08, 2004 2:16 pm Post subject: |
|
|
I did add breaks. But once the thread is on a long screen, blanks won't cause a line break anymore. |
|
|
|
|
|