gary
Lieutenant
Premium Member
Joined: Dec 22, 2002
Posts: 258
Location: Dallas/Ft. Worth, USA
Posted: Sat Jun 14, 2003 11:09 am Post subject: HowTo: MailWasher with POPFile (Bayesian Filter)
[Updated for version 0.20]
NOTE: THIS IS NOT FOR BEGINNERS! If you are squeamish about tinkering with your computer, making file modifications, or trying out new programs, this is not for you.
I've seen some interest expressed in these forums about getting a Bayesian filter running along with MailWasher, and I've also received a few e-mail inquiries. There are a number of good Bayesian filters out there, but you need one that acts as a POP3 proxy, which narrows the field somewhat. The one I happen to use is POPFile, because it is quite good, and because it is written in Perl, so it can easily be modified. The problem with most of these filters is that they ignore messages fetched with the POP3 "TOP" command, which MailWasher uses to retrieve the headers plus only a limited number of body lines. Bayesian filters, like MailWasher CFS, need more than 20 lines to be effective.
With POPFile you can work around the TOP problem with a command-line switch:
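To make the TOP/RETR difference concrete, here is a small sketch in Python (my choice for illustration only; POPFile itself is Perl) of what a POP3 server sends back for each command, following RFC 1939: "TOP msg n" returns the header block, the blank separator line, and at most n body lines, while "RETR msg" returns the whole message. The function names are made up for this example and are not part of POPFile or MailWasher.

```python
def pop3_top(raw_message: str, n: int) -> str:
    """Model of 'TOP msg n': headers, the blank separator line, then n body lines."""
    lines = raw_message.split("\r\n")
    if "" not in lines:
        return raw_message  # no blank line: the message is all header
    blank = lines.index("")  # the first empty line ends the header block
    return "\r\n".join(lines[: blank + 1] + lines[blank + 1 : blank + 1 + n])


def pop3_retr(raw_message: str) -> str:
    """Model of 'RETR msg': the entire message."""
    return raw_message


if __name__ == "__main__":
    msg = "\r\n".join(
        ["From: someone@example.com", "Subject: test", ""]
        + [f"body line {i}" for i in range(1, 51)]
    )
    print(len(pop3_top(msg, 20).split("\r\n")))  # 23: 2 headers + blank + 20 body lines
    print(len(pop3_retr(msg).split("\r\n")))     # 53: the whole message
```

A filter that only ever sees the 20-line TOP answer is working from a fraction of the text it would get from RETR.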
"C:\Program Files\POPFile\perl.exe" popfile.pl -pop3_toptoo 1
(You may need to alter the path to the Perl executable.)
Alternatively, in version 0.20 you may go to the "Advanced" tab in the user interface, change the value of "pop3_toptoo" to "1", and save it.
This switch causes POPFile to issue both a "TOP" and a "RETR" command when it encounters a "TOP" request from MailWasher. The "TOP" results are fed to MailWasher, and the "RETR" results are used by the Bayesian filter. Because each message is downloaded twice, this slows down mail processing somewhat, so if you get a lot of e-mail you might want to use it only if you have a high-speed internet connection.
There is one other hack that will help speed things up and will make the messages show up in your POPFile history. POPFile itself contains a hack for a utility called Fetchmail, which uses the command "TOP x 99999999" instead of "RETR" to retrieve mail. It turns out that MailWasher often uses "TOP x 9999" for some reason, so we just add another hack! (Hey, this is Perl, and we all know that TMTOWTDI, "There's More Than One Way To Do It", right?)
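Here is a simplified model of the proxy behavior described above. It is a hypothetical Python sketch, not POPFile's actual code (the real logic lives in proxy/POP3.pm): given the command MailWasher sends, it returns the commands forwarded to the real server and whether the message gets classified. The name proxy_plan is invented for this example.

```python
import re
from typing import List, Tuple

FETCHMAIL_MAGIC = "99999999"  # "TOP x 99999999" is already treated like RETR


def proxy_plan(client_cmd: str, toptoo: bool) -> Tuple[List[str], bool]:
    """Return (commands sent to the real server, whether POPFile classifies)."""
    top = re.match(r"TOP (\d+) (\d+)$", client_cmd, re.IGNORECASE)
    if top and top.group(2) != FETCHMAIL_MAGIC:
        if toptoo:
            # The TOP answer goes back to MailWasher; the RETR copy feeds the filter.
            return [client_cmd, f"RETR {top.group(1)}"], True
        # Without -pop3_toptoo the TOP is relayed but never classified.
        return [client_cmd], False
    is_retr = re.match(r"RETR \d+$", client_cmd, re.IGNORECASE) is not None
    # RETR and the Fetchmail-style TOP are classified from a single download.
    return [client_cmd], is_retr or top is not None


if __name__ == "__main__":
    print(proxy_plan("TOP 3 20", toptoo=False))  # (['TOP 3 20'], False)
    print(proxy_plan("TOP 3 20", toptoo=True))   # (['TOP 3 20', 'RETR 3'], True)
```

The second call shows where the extra traffic comes from: every TOP turns into two downloads of the same message.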
The file POP3.pm in the "proxy" subdirectory contains the key. Modify line 330 from:
Code:
if ( $2 ne '99999999' ) {
to:
Code:
if ( ($2 ne '99999999') && ($2 < 100) ) {
Then modify line 421 from:
Code:
if ( ( $command =~ /RETR (.*)/i ) || ( $command =~ /TOP (.*) 99999999/i ) ) {
to:
Code:
if ( ( $command =~ /RETR (.*)/i ) || ( $command =~ /TOP (.*) (.*)/i ) ) {
...and you're good to go! Be careful to keep the upper case characters in the filename (POP3.pm).
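The two edits can be modeled like this. It is a hypothetical Python translation of the logic, assuming the quick version of the line 421 change above (any TOP that gets past line 330 is classified) and assuming "-pop3_toptoo 1" is on; the function names are invented, and this is not a drop-in replacement for POP3.pm.

```python
import re

FETCHMAIL_MAGIC = "99999999"
SMALL_TOP_LIMIT = 100  # the "$2 < 100" added to line 330


def needs_extra_retr(count: str) -> bool:
    """Edited line 330: only TOP requests under 100 lines trigger the extra RETR."""
    # Before the edit this was just: count != FETCHMAIL_MAGIC
    return count != FETCHMAIL_MAGIC and int(count) < SMALL_TOP_LIMIT


def path_taken(command: str) -> str:
    """Which path a command takes once both edits are in place (with toptoo on)."""
    top = re.match(r"TOP (\d+) (\d+)$", command, re.IGNORECASE)
    if top and needs_extra_retr(top.group(2)):
        return "TOP + extra RETR"  # small preview: classified from a second, full download
    if top or re.match(r"RETR \d+$", command, re.IGNORECASE):
        return "single download"   # edited line 421: RETR or any remaining TOP
    return "passed through"


if __name__ == "__main__":
    for cmd in ("TOP 1 20", "TOP 1 9999", "TOP 1 99999999", "RETR 1", "STAT"):
        print(cmd, "->", path_taken(cmd))
```

The 100-line cutoff is also why the Spam Throttle setting matters: a MailWasher TOP of 100 lines or more is classified from that single download instead of triggering a second, full RETR.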
Here are some more explicit, step-by-step instructions, if you prefer:
1) Download and install POPFile (popfile.sourceforge.net)
2) Go into each of your accounts in MailWasher with which you want to use POPFile and modify these settings on the "Incoming Mail" tab:
POP3 Server Address: localhost
User Name: <POP3 Server Address>:<e-mail login>
Okay, for those of you who are already confused, it's not that bad. Just replace whatever your POP3 server was with the word "localhost", which should map to your local machine (or use "127.0.0.1"). Then, for the user name, use what was in your POP3 Server Address, followed by a colon, then your user name. For example, if your server was pop.example.com and your login was jsmith, the user name becomes pop.example.com:jsmith. Easy, right?
3) Modify the startup links to POPFile in Start->Programs->POPFile to include the string " -pop3_toptoo 1" at the end of the "Target" path. (Alternatively, in version 0.20 you may start POPFile, go to the "Advanced" tab in the user interface, change the value of "pop3_toptoo" to "1", and save it.)
The following steps are optional, but will improve performance.
4) Go to your POPFile run directory (Usually C:\Program Files\POPFile) and find the directory "Proxy", which contains the file POP3.pm. Open POP3.pm for editing.
5) Change the following lines:
Modify line 330 from:
Code:
if ( $2 ne '99999999' ) {
to:
Code:
if ( ($2 ne '99999999') && ($2 < 100) ) {
Modify line 421 from:
Code:
if ( ( $command =~ /RETR (.*)/i ) || ( $command =~ /TOP (.*) 99999999/i ) ) {
to:
Code:
if ( ( $command =~ /RETR (.*)/i ) || ( $command =~ /TOP (.*) 99999999/i ) || ( $command =~ /TOP (.*) 9999/i ) ) {
6) Save the file (be sure to preserve the case in the file name "POP3.pm"), and start POPFile.
7) Make sure your "Spam Throttle" in MailWasher is set to 100 lines or greater. You need not worry about this if you are using CFS, since it will be set to 200 lines by default.
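For step 2, the mapping between your old account settings and the new MailWasher settings can be written out as a short sketch. The helper names and the example host are hypothetical, and splitting at the first colon is only an illustration of the "server:login" convention, not POPFile's own parser.

```python
from typing import Dict, Tuple


def mailwasher_settings(pop3_server: str, login: str) -> Dict[str, str]:
    """Step 2: what to type into MailWasher's Incoming Mail tab."""
    return {
        "POP3 Server Address": "localhost",  # 127.0.0.1 also works
        "User Name": f"{pop3_server}:{login}",
    }


def split_user_name(user_name: str) -> Tuple[str, str]:
    """Illustration of the proxy side: recover (real server, real login)."""
    server, sep, login = user_name.partition(":")  # split at the first colon
    if not (sep and server and login):
        raise ValueError("expected '<POP3 server>:<login>'")
    return server, login


if __name__ == "__main__":
    s = mailwasher_settings("pop.example.com", "jsmith")
    print(s["User Name"])                   # pop.example.com:jsmith
    print(split_user_name(s["User Name"]))  # ('pop.example.com', 'jsmith')
```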
That's it! Remember that you will have to start the Web interface into POPFile and train it before it becomes effective. You will also want to adjust the setup to suit your fancy. MailWasher can act on the X-Text-Classification header that POPFile adds, using a simple filter:
"The entire header" "Contains" "X-Text-Classification: spam"
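The MailWasher rule above amounts to a substring test on the header block. This Python sketch assumes "Contains" is a plain, case-sensitive substring match over everything before the first blank line; MailWasher's exact matching rules may differ, and the function names are hypothetical.

```python
def header_block(raw_message: str) -> str:
    """Everything before the first blank line (CRLF or LF line endings)."""
    return raw_message.replace("\r\n", "\n").split("\n\n", 1)[0]


def matches_popfile_spam_rule(raw_message: str) -> bool:
    """'The entire header' 'Contains' 'X-Text-Classification: spam' (assumed literal match)."""
    return "X-Text-Classification: spam" in header_block(raw_message)


if __name__ == "__main__":
    tagged = "From: a@example.com\r\nX-Text-Classification: spam\r\n\r\nBuy now!"
    print(matches_popfile_spam_rule(tagged))  # True
```

The header only exists on messages fetched through POPFile, so the rule can only fire if the MailWasher account points at localhost as in step 2.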
Caveats and hints:
1) If you download a message more than once from MailWasher, it will show up twice in your history. Likewise, if you have your e-mail client configured to go through POPFile, and you download the message from MailWasher, and then your e-mail client, it will show up twice. This is not harmful, but please turn off the "Send statistics daily" option in the security tab, as it will skew the POPFile folks' statistics to look better than they actually are.
2) If a message is in the history more than once, and it is not correctly classified, you need only reclassify one copy of the message.
Let me know if you have trouble understanding these directions, and I'll try to clarify them.
_________________
Gary
Last edited by gary on Sun Oct 19, 2003 11:00 am, edited 8 times in total
Ikeb
General
Premium Member
Joined: Apr 20, 2003
Posts: 3488
Location: Canada
Posted: Sat Jun 14, 2003 12:06 pm
Thank you Gary!! I'll be trying this out ASAP when I have some time and my high speed link is back.
Would you mind publishing these directions at your web site? I tend to lose track of which thread has what info.
_________________
I like SPAM ... on my sandwich!
gary
Lieutenant
Premium Member
Joined: Dec 22, 2002
Posts: 258
Location: Dallas/Ft. Worth, USA
Posted: Sat Jun 14, 2003 1:03 pm
Sure, I'll publish it on my site. I'll drop a note in here when it's done.
Is there any interest in using POPFile to get to Yahoo or Hotmail? It requires using something like MrPostman, Izymail, or Web2POP, and daisy-chaining the proxies. It's a little more involved, but not terribly complicated.
_________________
Gary
TalonTSi
Corporal
Joined: Mar 16, 2003
Posts: 55
Location: Canada
Posted: Sat Jun 14, 2003 3:33 pm
Works like a charm Gary, thanks!
A couple of comments/clarifications for your documentation that I discovered while following your directions:
1. the POP3.pm filename in the proxy folder is case-sensitive (POPfile won't start if you accidentally rename it to pop3.pm)
2. there should be an extra space before the last "{" in your line modifications. Doesn't matter, except a copy/paste into a Find dialog won't find it.
3. your reference to line 338 was actually line 337 in my POP3.pm file (POPfile v0.19.0)
Thanks again for the great documentation! I've wanted to pair a Bayesian filter with Mailwasher for a long time. Tried K9, but it didn't work. This is exactly what I was looking for!
_________________
--Darren.
gary
Lieutenant
Premium Member
Joined: Dec 22, 2002
Posts: 258
Location: Dallas/Ft. Worth, USA
Posted: Sat Jun 14, 2003 4:01 pm
Quote:
1. the POP3.pm filename in the proxy folder is case-sensitive (POPfile won't start if you accidentally rename it to pop3.pm)
Noted! Thanks!
Quote:
2. there should be an extra space before the last "{" in your line modifications. Doesn't matter, except a copy/paste into a Find dialog won't find it.
I could not get the extra spaces to appear in the posting. I guess the system does some sort of "cleanup" on the message - I even tried turning off the HTML, Smilies and BBCode, but it didn't have any effect. Sorry!
Quote:
3. your reference to line 338 was actually line 337 in my POP3.pm file (POPfile v0.19.0)
Oops! Erm, I guess I didn't have enough caffeine this morning. Fixed!
Let me know what you think of the filter! I've been having good luck with it. You can also pair POPFile up with your e-mail client, and it will classify your messages for you in different categories - not just "spam" and "accept". If you use Outlook, there is a plugin called "Outclass" that uses POPFile to filter the messages after Outlook downloads them (http://www.vargonsoft.com/Outclass/download.aspx). And it's all free!
A future revision will have expiring keywords, for those guys that fill their spam with nonsense text to get around Bayesian filters.
Glad I could help, and thanks for your corrections!
_________________
Gary
TalonTSi
Corporal
Joined: Mar 16, 2003
Posts: 55
Location: Canada
Posted: Sat Jun 14, 2003 4:10 pm
gary wrote:
I could not get the extra spaces to appear in the posting. I guess the system does some sort of "cleanup" on the message - I even tried turning off the HTML, Smilies and BBCode, but it didn't have any effect.
I see what you mean. The only way I can find to get spaces to show up is to use the Code HTML option.
_________________
--Darren.
Guest
Guest
Posted: Sat Jun 14, 2003 10:38 pm
When I change my inbound settings in the new version of MWP as specified above, I get the same error I was having with the previous version ("server intentionally closed"). This happens on every mail check, just like it did before. I really want to use MWP with POPFile, but it looks like with ATTBI it is a no-go. Maybe it will work when I am officially switched to Comcast at the end of the month.
gary
Lieutenant
Premium Member
Joined: Dec 22, 2002
Posts: 258
Location: Dallas/Ft. Worth, USA
Posted: Sun Jun 15, 2003 2:32 am
Are you internal to ATTBI's network, or external? I access it both ways, but I can't use POPFile externally, since a connection external to their network requires SSL.
_________________
Gary
Guest
Guest
Posted: Sun Jun 15, 2003 11:58 am
Well, I must be internal, since I do not require SSL.
Guest
Posted: Sun Jun 15, 2003 12:55 pm
Gary,
Excellent how-to, and it made me head out, get POPFile and install it.
Just one thing that is not yet clear to me: the POP3.pm hacking in the proxy directory, does that replace the
"C:\Program Files\POPFile\perl.exe" popfile.pl -pop3_toptoo 1
change, or should/can both be done?
Guest
Guest
Posted: Sun Jun 15, 2003 1:00 pm
Well, it seems to be working today - I am not getting the error message. I would guess that it was user error - perhaps I did not have POPFile started when I was checking mail.
However, I am not able to get MWP and POPFile working together. My email program does not support the X-Text-Classification header, so I am just using Subject modification into the buckets I want. Without MWP, it works fine, so I must be doing something wrong. I created an MWP filter as follows: "Subject" "contains" "[spam]" and put that as my first filter. But it is ignored. Any ideas?
gary
Lieutenant
Premium Member
Joined: Dec 22, 2002
Posts: 258
Location: Dallas/Ft. Worth, USA
Posted: Sun Jun 15, 2003 3:18 pm
Quote:
Just one thing that is not yet clear to me: the POP3.pm hacking in the proxy directory, does that replace the
"C:\Program Files\POPFile\perl.exe" popfile.pl -pop3_toptoo 1
change, or should/can both be done?
The "-pop3_toptoo 1" allows MW's TOP entries to be processed, but then POPFile makes a subsequent RETR call, and downloads the entire e-mail for its own processing. 9999 lines seemed like enough lines for proper classification to me, so the hack basically stops the subsequent RETR call from being done, and tells POPFile to use the 9999 line TOP request. So, you must use the "-pop3_toptoo 1" parameter, but if you do the hack in addition to the parameter, you will have faster e-mail processing.
I don't know why MW often makes a "TOP x 9999" call, but we might as well use it to our advantage!
Quote:
However, I am not able to get MWP and POPFile working together. My email program does not support the X-Text-Classification header, so I am just using Subject modification into the buckets I want. Without MWP, it works fine, so I must be doing something wrong. I created an MWP filter as follows: "Subject" "contains" "[spam]" and put that as my first filter. But it is ignored. Any ideas?
The filter that I use looks like this:
"The entire header" "contains" "X-Text-Classification: spam"
Give that a try. Here is the line from filters.txt, if you prefer to paste it:
[enabled],"POPFile Spam","POPFile Spam",255,AND,Blacklist,Delete,EntireHeader,contains,"X-Text-Classification: spam"
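If you paste that line into filters.txt by hand, a quick sanity check is to confirm it still splits into the expected comma-separated, quote-aware fields. This sketch uses Python's csv module; it only checks the structure of the line and makes no assumptions about what each MailWasher field means. The function name is hypothetical.

```python
import csv
from typing import List


def parse_filter_line(line: str) -> List[str]:
    """Split one filters.txt line into quote-aware, comma-separated fields."""
    return next(csv.reader([line]))


if __name__ == "__main__":
    line = ('[enabled],"POPFile Spam","POPFile Spam",255,AND,Blacklist,Delete,'
            'EntireHeader,contains,"X-Text-Classification: spam"')
    fields = parse_filter_line(line)
    print(len(fields))   # 10
    print(fields[-3:])   # ['EntireHeader', 'contains', 'X-Text-Classification: spam']
```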
I can put this in the next release of the MW Sample Filters, if there is enough interest.
_________________
Gary
Guest
Guest
Posted: Sun Jun 15, 2003 4:01 pm
My email program does not support using the X-Text-Classification header. If I use the Subject header in POPFile, POPFile marks it correctly, passes it on to my email program and puts it in the correct folder. But I do not see MWP involved in this process - it marks the email according to the filters already there, not according to the newly created filters for POPFile. BTW, I did try the X-Text-Classification filter as you suggested, even though it won't work with my email program, just to see - and got the same result: it is ignored by MWP.
Guest
Posted: Mon Jun 16, 2003 5:50 am
I had the same problem - with the filtering, that is - and solved it. You added a new filter for POPFile, but it's probably at the bottom of your filter list. Move it to the top and see if MWP uses it then.
Guest
Guest
Posted: Mon Jun 16, 2003 9:24 pm
Actually, I did move the POPFile filter to the top of my list, so that isn't the problem. I think I must be doing something else wrong but just can't figure out what it is. And as I think about it, my setup as it is now should not work. I have POPFile running in the background, but it apparently does not attach tags until my email program is called. Thus MWP is left completely out of the loop: it is filtering email before POPFile attaches the tags, so the filters I have for POPFile would be ignored. What am I missing here?