Computer Cops - Content Filtering And Machine Learning

Employing Multiple Anti-Spam Strategies
Content Filtering And Machine Learning
By InformationWeek

E-mail administrators can easily add blacklist query support to popular message transfer agents such as Sendmail, Qmail, and NTmail. Products such as GFI Software Ltd.'s MailEssentials for Exchange add blacklist support to Exchange Server, and there are also numerous add-ons available for client e-mail programs such as Microsoft Outlook that allow blacklists to be used by end users. Microsoft and Lotus have blacklist support in the forthcoming Exchange Server 2003 and Lotus Domino 6.

Blacklists alone, however, aren't an effective means of blocking spam. Originating IP addresses can be spoofed, rendering a blacklist lookup ineffectual. IP addresses can also be hijacked or shared by more than one person at a time, so blocking an address may hurt legitimate users, which can have serious implications for due process, says Silliman. There are also more than 150 blacklists available, varying in quality, and no single list is guaranteed to be definitive and up to date. Most businesses subscribe to multiple blacklists to bolster their effectiveness.
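As a rough illustration of querying several blacklists at once, the sketch below assumes the lists are published over DNS, a common convention in which the octets of the IPv4 address are reversed and prepended to the list's zone name. The article does not describe the query mechanism, so treat that as an assumption. The zone names and the function names (dnsbl_query_name, dns_is_listed, listing_zones) are placeholders, not part of any product mentioned here.

```python
import socket
from typing import Callable, Iterable, List


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name for an IPv4 lookup: reversed octets plus the zone."""
    octets = ip.split(".")
    if len(octets) != 4 or not all(o.isdigit() and 0 <= int(o) <= 255 for o in octets):
        raise ValueError(f"not an IPv4 address: {ip!r}")
    return ".".join(reversed(octets)) + "." + zone


def dns_is_listed(name: str) -> bool:
    """Default resolver: a name that resolves is treated as listed.

    Any lookup failure (including a timeout) counts as "not listed" here,
    which is a simplification.
    """
    try:
        socket.gethostbyname(name)
        return True
    except OSError:
        return False


def listing_zones(ip: str, zones: Iterable[str],
                  is_listed: Callable[[str], bool] = dns_is_listed) -> List[str]:
    """Return the blacklist zones that list the given IP address."""
    return [zone for zone in zones if is_listed(dnsbl_query_name(ip, zone))]
```

Passing the resolver as a parameter keeps the lookup logic testable without network access. A site subscribing to several lists could block only when more than one zone reports the address, which softens the effect of a single low-quality list.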

"We use 11 different blacklists," says Tom O'Neal, CEO of Texas American Communications Network Inc., an ISP that uses Qmail as its message transfer agent. The company uses the popular open-source SpamAssassin package as its primary anti-spam defense, along with whitelisting based on customer feedback.

Whitelists are lists of IP addresses and domain names that a business or user knows to be valid sources. Typically, whitelists are maintained by end users and can be pushed to the anti-spam software running on the e-mail server. The whitelist system used by Hotmail e-mail account users (initiated by clicking the "This is not junk mail" button in the user's junk-mail folder) is a typical example.
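A whitelist check can be as simple as a set lookup performed before any other filtering. The sketch below is a minimal illustration, not how Hotmail or any other product implements it; the function name is_whitelisted and its parameters are hypothetical.

```python
from typing import Set


def is_whitelisted(sender_ip: str, sender_address: str,
                   allowed_ips: Set[str], allowed_domains: Set[str]) -> bool:
    """Return True if the sender's IP or e-mail domain is a known valid source."""
    # Text after the last "@"; if there is no "@", the whole string is used.
    domain = sender_address.rpartition("@")[2].lower()
    return sender_ip in allowed_ips or domain in allowed_domains
```

Because sender addresses are easy to forge, a real deployment would probably weigh a domain match less heavily than an IP match.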

Content Filtering And Machine Learning
Content filtering, the filtering of e-mail based on the contents of the header and body of the e-mail message, has been around almost as long as there has been e-mail, but newer techniques involving rule-based reasoning have greatly improved the use of filtering as a defense against spam.

The widely imitated open-source SpamAssassin anti-spam tool is quite good at recognizing spam by applying a variety of rules, each with a positive or negative score attached. A rule that generates a positive score might look inside the e-mail message for variations of the word "Viagra" along with the word "sale," which might indicate a spam message selling Viagra. A rule generating a negative score might look for "informationweek.com" in the e-mail message header, indicating a desirable message that is unlikely to be spam. After all the rules are applied, the message is rated by totaling the scores and checking whether the total exceeds a threshold defined by the recipient or the e-mail administrator.
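The scoring scheme described above can be sketched in a few lines. This is an illustration of the general idea only, not SpamAssassin's actual rule format or engine; the rule names, patterns, score values, and threshold are all invented for the example.

```python
import re
from dataclasses import dataclass
from typing import List, Pattern


@dataclass
class Rule:
    name: str
    patterns: List[Pattern[str]]  # every pattern must match for the rule to fire
    score: float                  # positive = more spam-like, negative = less
    target: str                   # "header" or "body"


# Hypothetical rules mirroring the two examples in the text.
RULES = [
    Rule("VIAGRA_SALE",
         [re.compile(r"v[i1!]agra", re.I), re.compile(r"\bsale\b", re.I)],
         4.0, "body"),
    Rule("TRUSTED_SENDER",
         [re.compile(r"informationweek\.com", re.I)],
         -3.0, "header"),
]


def score_message(headers: str, body: str, rules: List[Rule] = RULES) -> float:
    """Total the scores of every rule whose patterns all match its target text."""
    total = 0.0
    for rule in rules:
        text = headers if rule.target == "header" else body
        if all(p.search(text) for p in rule.patterns):
            total += rule.score
    return total


def is_spam(total: float, threshold: float = 3.0) -> bool:
    """Compare the total against a threshold set by the recipient or administrator."""
    return total > threshold
```

With these made-up values, a message whose body contains "V1agra for sale" scores 4.0 and is flagged, while the same body arriving with an informationweek.com header scores 1.0 and is not.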

The total score, sometimes called a confidence level, has become an important number. Most anti-spam tools assign a similar numeric score to an e-mail message after analyzing it, and place that number in the e-mail message by adding an extra header to the message. E-mail clients, or other types of anti-spam software, can then access that number and decide what to do with the e-mail message. Microsoft's forthcoming Exchange Server 2003 doesn't actually contain any new filtering capabilities but instead has an improved API that lets third-party anti-spam software analyze incoming e-mail messages and insert a confidence level into the e-mail message. Outlook 2003 can then read the confidence level in the received message and decide whether to flag the message as spam. Similarly, Eudora Pro 6 can read the score inserted by several anti-spam software packages.
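The hand-off between a server-side filter and an e-mail client can be sketched with Python's standard email package. The header name X-Spam-Score is a hypothetical stand-in: each product uses its own header and format, and this is not the Exchange, Outlook, or Eudora mechanism. The function names and the threshold are likewise assumptions.

```python
from email import message_from_string
from typing import Optional

SCORE_HEADER = "X-Spam-Score"  # hypothetical header name, for illustration only


def tag_message(raw: str, score: float) -> str:
    """Server side: record the filter's score in an extra message header."""
    msg = message_from_string(raw)
    del msg[SCORE_HEADER]  # discard any score supplied by the sender
    msg[SCORE_HEADER] = f"{score:.1f}"
    return msg.as_string()


def read_score(raw: str) -> Optional[float]:
    """Client side: return the recorded score, or None if absent or malformed."""
    value = message_from_string(raw).get(SCORE_HEADER)
    if value is None:
        return None
    try:
        return float(value)
    except ValueError:
        return None


def should_flag(raw: str, threshold: float = 5.0) -> bool:
    """Flag the message as spam when a score is present and meets the threshold."""
    score = read_score(raw)
    return score is not None and score >= threshold
```

Removing any existing score header before tagging matters because a spammer could otherwise pre-insert a low score of their own.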

SecurityPipeLine

