Friday, July 25, 2003

Spam filter report: Following the examples of Hanah Metchis and Diana, I've been trying out the free, open-source Bayesian spam filter POPFile. Because it's a Bayesian filter, it needs to be trained, so for the past month I've had POPFile sort all of my incoming mail into one of three groups, or "buckets" -- "personal" (for genuine personal e-mail), "lists" (for bulk e-mail that I have voluntarily signed up for, such as mailing lists or announcements from certain vendors), and "spam" (all the unsolicited junk). After the training period, I reset the counter statistics and let POPFile sort the next 1000 consecutive e-mails. To my dismay, my incoming mail turned out to be 69% spam, 23% list mail, and only 8% "real" e-mail. Overall, POPFile did pretty well -- it classified over 96% of my mail correctly. Now, I don't mind an occasional spam being misclassified as "personal", but it's more of a problem when the filter misclassifies a real e-mail as "spam", which still happens about once every 2-3 days. Since I don't want to take the chance of missing something important, I still take a look at my "spam" bucket once a day and quickly skim the subject lines before I delete the contents. But on the whole, I'm pretty pleased -- it's easy to set up, it does a surprisingly good job, and it has saved me a fair amount of daily aggravation. Plus you can't beat the price!
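For anyone curious what "training" a Bayesian filter means, here is a rough sketch of the general idea in Python. This is not POPFile's actual code: the class name `BucketClassifier`, the crude tokenizer, the add-one smoothing, and the three sample messages are all my own assumptions for illustration. The filter counts how often each word shows up in each bucket, then assigns a new message to whichever bucket makes its words most probable.

```python
import math
import re
from collections import Counter, defaultdict


class BucketClassifier:
    """Toy naive Bayes classifier with add-one smoothing (illustrative only)."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # bucket -> word -> count
        self.message_counts = Counter()          # bucket -> messages trained
        self.vocabulary = set()                  # every word seen in training

    @staticmethod
    def tokenize(text):
        # Crude tokenizer (an assumption): lowercase runs of letters/digits.
        return re.findall(r"[a-z0-9']+", text.lower())

    def train(self, bucket, text):
        words = self.tokenize(text)
        self.word_counts[bucket].update(words)
        self.message_counts[bucket] += 1
        self.vocabulary.update(words)

    def classify(self, text):
        total_messages = sum(self.message_counts.values())
        vocab_size = len(self.vocabulary)
        words = self.tokenize(text)
        best_bucket, best_score = None, -math.inf
        for bucket, seen in self.message_counts.items():
            # Start from the log prior: how common this bucket was in training.
            score = math.log(seen / total_messages)
            bucket_total = sum(self.word_counts[bucket].values())
            for word in words:
                count = self.word_counts[bucket][word]
                # Add-one smoothing so an unseen word never zeroes the score.
                score += math.log((count + 1) / (bucket_total + vocab_size))
            if score > best_score:
                best_bucket, best_score = bucket, score
        return best_bucket


# Made-up training messages, one per bucket.
clf = BucketClassifier()
clf.train("personal", "Are we still on for lunch tomorrow? Call me tonight")
clf.train("lists", "Weekly digest: new posts on the mailing list, unsubscribe here")
clf.train("spam", "Free money! Click now to claim your free prize")

print(clf.classify("Claim your free prize now"))
print(clf.classify("lunch tomorrow"))
```

With only three training messages this is a toy, which is why a real filter needs weeks of mail before its guesses become trustworthy.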