Need help with regex filter.
Computer Cops Forum Index -> Mailwasher - Troubleshooting / General
Cowboy
Guest

Posted: Sat Nov 22, 2003 6:49 am

I need a filter that will weed out comments placed in the middle of a word.

It would delete:
Buy my delicious sp<!kfgkh8899>am.

But it would not delete:
Buy my delicious <!kfgkh8899>spam.

I can not write the filter for myself, so if someone could help me with this it would do a lot for my spam filtering.

Thanks! :D

stan_qaz
General, Premium Member
Joined: Mar 31, 2003
Posts: 4099
Location: USA

Posted: Sat Nov 22, 2003 1:09 pm

Go to the search function, select the Firetrust category, and search on html. You will find plenty of discussion of the topic and several suggestions for filters.

Cowboy

Posted: Sat Nov 22, 2003 2:44 pm

There is no filter like I need. At least not that I can find.
The closest is the filter that counts the comments but I think that is not what I need.

stan_qaz

Posted: Sat Nov 22, 2003 5:26 pm

That is as good as it is going to get, the problem isn't easy to solve as you saw from the posts you looked at.

Are you willing to pay to have a filter written? Make an offer and see if someone is willing to tackle the problem for some cash.

If not, chip in on the threads asking for a processed message option for the filters, which is the fix I like best.

Cowboy

Posted: Sat Nov 22, 2003 7:59 pm

Nonsense. It can not be as good as it gets until someone tries to write the filter. Nobody has tried yet!

denn988
Guest

Posted: Sat Nov 22, 2003 9:31 pm

Cowboy wrote:
Nonsense. It can not be as good as it gets until someone tries to write the filter. Nobody has tried yet!


So....Why don't you try???
(?# finds words broken by html comments )[a-z](<[!/].*?>)[a-z]
You might find it to be easier than you thought possible...

Ikeb
General, Premium Member
Joined: Apr 20, 2003
Posts: 3483
Location: Canada

Posted: Sun Nov 23, 2003 1:12 am

Cowboy wrote:
Nonsense. It can not be as good as it gets until someone tries to write the filter. Nobody has tried yet!

Ride 'em cowboy! :P

Shoot first, ask questions later! :roll:

_________________
I like SPAM ... on my sandwich!

Cowboy

Posted: Sun Nov 23, 2003 7:47 am

I have tried. I've read the help files. I've tried to put together the parameters to make such a filter. I've sat for hours trying everything I can think of. Never once did I get it to work. So I decided I needed help. All I got was a bunch of comments about as useful as my filter attempts.

In the best of worlds I would have gotten a "That's a good idea to filter out just html comments that are used to disguise words, instead of trying to count the comments. Here's your filter.", or I would get a "That's a bad idea because it's impossible to make such a filter." Or at least not a thread bogged down by the assume patrol.

denn988

Posted: Sun Nov 23, 2003 8:29 am

As long as you have already tried....

See if this will help:

The body....
contains Regular Expr...
Quote:

(?# words broken by html comments )[a-z]<[!/][^<]*?>[a-z]


Anyone who wishes can make any improvements to the filter as they see fit. This is just the simplest version that would seem to work.

If you decide you have to AUTO-DELETE based on this, don't blame me if you lose a few legitimate mails.

denn988

Posted: Sun Nov 23, 2003 8:11 pm

Cowboy,

I have had a day to see how the above filter works and it looks pretty good so far.

There are a couple of mods that I have made to it that have improved its trap rate.

Change the above RegExp to:

Quote:

(?# words broken by html comments )[a-z]<[^<]*?>[a-z]


I removed the '!/' from the filter, so it will trap any word that has the html brackets in between the letters.

Examples:

s<!tytyt>pam
sp<wretser>am
sp</font>am

are trapped

this </font>is a test

is NOT trapped.

This filter can still result in false positives, so don't auto-delete.
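
For anyone who wants to check the pattern outside Mailwasher, here is a minimal sketch in Python. It assumes Python's `re` module treats these constructs the same way Mailwasher's regex engine does, which has not been verified here. The names `BROKEN_WORD` and `is_trapped` are made up for the illustration, and the samples are the ones posted in this thread.

```python
import re

# The filter from the post above. Python's re module accepts the (?# ... ) comment syntax.
# Assumption: matching is case-sensitive, exactly as the pattern is written.
BROKEN_WORD = re.compile(r"(?# words broken by html comments )[a-z]<[^<]*?>[a-z]")


def is_trapped(text: str) -> bool:
    """Return True if a <...> span sits directly between two lowercase letters."""
    return BROKEN_WORD.search(text) is not None


samples = [
    "s<!tytyt>pam",
    "sp<wretser>am",
    "sp</font>am",
    "this </font>is a test",
    "Buy my delicious sp<!kfgkh8899>am.",
    "Buy my delicious <!kfgkh8899>spam.",
]

for sample in samples:
    # Print the verdict next to each sample line.
    print(f"{is_trapped(sample)!s:5}  {sample}")
```

The two samples where the bracket follows a space are the ones that should come back `False`, since the pattern needs a letter on both sides of the brackets.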

Ikeb

Posted: Mon Nov 24, 2003 1:54 am

Denn988, thanks for another good one.

Do you think it's OK to have the filter fire on a single hit?

Also, instead of the [^<] negation, why not use [^>] since it's the closing ">" bracket that will follow this part of the match?

denn988

Posted: Mon Nov 24, 2003 8:05 am

Ikeb wrote:
Denn988, thanks for another good one.

Do you think it's OK to have the filter fire on a single hit?

Also, instead of the [^<] negation, why not use [^>] since it's the closing ">" bracket that will follow this part of the match?


First...

I don't think it would be a good idea to write this type of filter to look for multiple hits. The reason is that it starts with a wildcard ([a-z]). If you were to write the filter to continue looking for more than one instance, it would require a lot of CPU time to do each iteration, and with the 'a-z' at the beginning it would do it for each character in the message.

That would probably cause the filter to be more time intensive than you would consider acceptable.
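
For comparison, counting hits can be sketched in Python as below. This only shows what a multi-hit rule would count; it says nothing about how expensive the same thing would be inside Mailwasher, whose regex engine may behave differently. `count_hits`, `fires`, and the `threshold` parameter are hypothetical names, and the sample text is invented.

```python
import re

# Same pattern as the filter in this thread, without the (?# ... ) comment.
BROKEN_WORD = re.compile(r"[a-z]<[^<]*?>[a-z]")


def count_hits(body: str) -> int:
    """Count non-overlapping matches of the broken-word pattern."""
    return sum(1 for _ in BROKEN_WORD.finditer(body))


def fires(body: str, threshold: int = 1) -> bool:
    """Fire when the number of hits reaches the (hypothetical) threshold."""
    return count_hits(body) >= threshold


spam = "Buy sp<!x>am and ch<!y>eap pi<z>lls"
print(count_hits(spam))
print(fires(spam, threshold=3))
```

With `threshold=1` this behaves like the single-hit filter discussed above.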


Second...

As to the '[^<]' in the Regex...

It is there to prevent the filter from trapping a situation where there are two opening brackets prior to a closing bracket.

Example:

10<20<30
30>20>10

The above is NOT HTML, but represents two valid mathematical expressions.

You don't want the filter to trap on something like that.
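
The difference between the two negations can be tried out in Python with the sketch below, again assuming Python's `re` is a fair stand-in for Mailwasher's engine. One caveat: the numeric lines above contain no letters next to the brackets, so in Python neither variant matches them at all. To show where `[^<]` and `[^>]` actually part ways, the sketch uses a made-up string with letters around the brackets. `WITH_LT` and `WITH_GT` are illustrative names.

```python
import re

# Variant used in the filter: the gap between the brackets may not contain '<'.
WITH_LT = re.compile(r"[a-z]<[^<]*?>[a-z]")
# Variant suggested in the question: the gap may not contain '>'.
WITH_GT = re.compile(r"[a-z]<[^>]*?>[a-z]")

# Hypothetical plain-text line with two '<' before the first '>'.
math_text = "a<5<b and 9>c"
numeric_text = "10<20<30\n30>20>10"
spam_text = "sp<!x>am"

for name, pattern in (("[^<]", WITH_LT), ("[^>]", WITH_GT)):
    print(
        name,
        "math:", pattern.search(math_text) is not None,
        "numeric:", pattern.search(numeric_text) is not None,
        "spam:", pattern.search(spam_text) is not None,
    )
```

In this sketch only the `[^>]` variant reaches across the second `<` in the made-up line; both variants still catch the broken word.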


Before you ask....

You could have another rule in the filter that looks for a "Content-Type: text/html"....but it would be something of a useless rule. There would be no easy way to write the filter so that it would only look at the message part that was HTML, in those cases that were multipart messages.

Anything that you would try to do with regex to try to do that would be even more CPU intensive than the 'multi-hit' filter would be.

Ikeb

Posted: Mon Nov 24, 2003 9:45 am

denn988 wrote:
I don't think it would be a good idea to write this type of filter to look for multiple hits. The reason is that it starts with a wildcard ([a-z]). If you were to write the filter to continue looking for more than one instance, it would require a lot of CPU time to do each iteration, and with the 'a-z' at the beginning it would do it for each character in the message.

That would probably cause the filter to be more time intensive than you would consider acceptable.


Second...

As to the '[^<]' in the Regex...

It is there to prevent the filter from trapping a situation where there are two opening brackets prior to a closing bracket.

Example:

10<20<30
30>20>10

The above is NOT HTML, but represents two valid mathematical expressions.

You don't want the filter to trap on something like that.

OK thanks for the clarification.

denn988 wrote:
Before you ask....

You could have another rule in the filter that looks for a "Content-Type: text/html"....but it would be something of a useless rule. There would be no easy way to write the filter so that it would only look at the message part that was HTML, in those cases that were multipart messages.

Anything that you would try to do with regex to try to do that would be even more CPU intensive than the 'multi-hit' filter would be.

You give me too much credit! I hadn't thought of attempting to check the html parts only. Besides I think the math expressions you gave as examples could also occur with HTML messages.

denn988

Posted: Mon Nov 24, 2003 10:13 am

Quote:
You give me too much credit! I hadn't thought of attempting to check the html parts only. Besides I think the math expressions you gave as examples could also occur with HTML messages.


Those examples would look totally different if they appeared in an HTML part than they would in a Plain Text part.

Those examples, if sent as HTML, would appear in the raw text as:

10<20<30

and

30>20>10

The brackets must be substituted when converting them to the HTML raw text in order to keep the translator from being confused.

Guest

Posted: Mon Nov 24, 2003 10:22 am

Sorry,

I forgot to turn the HTML off when I posted.

Those examples, if sent as HTML, would appear in the raw text as:

1 0 & l t ; 2 0 & l t ; 3 0

and

3 0 & g t ; 2 0 & g t ; 1 0

I had to place spaces between each character above to get them to post.

The brackets must be substituted when converting them to the HTML raw text in order to keep the translator from being confused.
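
The substitution described here can be reproduced with Python's standard `html.escape`, which is used below only as a stand-in for whatever the sending mail client does. The sketch also checks that the filter pattern from this thread no longer sees a bracket once a line has been escaped. The letter-based sample string is made up.

```python
import html
import re

# Filter pattern from this thread, without the (?# ... ) comment.
BROKEN_WORD = re.compile(r"[a-z]<[^<]*?>[a-z]")

# The two expressions from the post, as they would appear in raw HTML.
for expr in ("10<20<30", "30>20>10"):
    print(html.escape(expr))

# Hypothetical line with letters next to the brackets.
escaped = html.escape("a<5<b and 9>c")
print(escaped)

# After escaping there is no literal '<' left for the pattern to match.
print(BROKEN_WORD.search(escaped) is None)
```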

All times are GMT - 5 Hours
Page 1 of 6