Brian's Waste of Time

Sun, 25 Apr 2004

Better Spam Filtering

The cost of spam is not the server resources used to process it. The cost of spam is the mistakes made because users have to interact with it. The cost of server resources is easier to calculate, so you hear about it more often. Every user I know would rather live with false negatives than risk a false positive, and wants every message flagged as spam to still be delivered, which supports this somewhat. Spam volume is increasing to the point where this is getting difficult to sustain, however.

Server resources are nothing compared to user resources. An array of a couple dozen spam filter servers built from the P2-450s of a couple of upgrade cycles ago is enough computing power to carefully analyze every email coming into a typical system. (If you run Gmail, Hotmail, or Yahoo! Mail, you are welcome to ignore this argument.) If you are willing to accept this, then we can change the problem constraints to require perfect reliability (no message gets lost) and optimal classification, while allowing heavy use of computing resources (CPU and memory).

What I want is to take the expert system concepts that have been kicking around for a long time now and apply them to spam filtering. I suspect this has been ignored thus far because expert systems tend to be resource intensive. Applying one that can request and evaluate tests, then recur, to every single SMTP message entering the system makes most admins start laughing. Stop laughing. The cost of this spam to your users is far higher than the cost of the processing cycles, unless you are a spam magnet (the aforementioned spam^H^H^H^H email providers).
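
To make the "request and evaluate tests, then recur" idea a bit more concrete, here is a minimal sketch in Python (not Jess or Drools). Everything in it is made up for illustration: the test names (`mentions_unsubscribe`, `link_count`), the two rules, and the threshold of three links are assumptions, not anything a real filter is known to use. The point is only the shape of the loop: run the tests asked for so far, fire whichever rules now match, let those rules ask for more tests, and repeat until nothing new is requested.

```python
from typing import Any, Callable, Dict, List, NamedTuple

Facts = Dict[str, Any]

# Hypothetical tests: each inspects the raw message and yields one fact.
TESTS: Dict[str, Callable[[str], Any]] = {
    "mentions_unsubscribe": lambda msg: "unsubscribe" in msg.lower(),
    "link_count": lambda msg: msg.lower().count("http://"),
}


class Rule(NamedTuple):
    name: str
    when: Callable[[Facts], bool]  # condition over the facts known so far
    conclude: Facts                # facts asserted when the rule fires
    request: List[str]             # further tests to run once it has fired


# Hypothetical rules; the second only becomes decidable after the first
# has fired and requested the link_count test.
RULES = [
    Rule("looks-bulk",
         lambda f: f.get("mentions_unsubscribe") is True,
         {"bulk": True}, ["link_count"]),
    Rule("bulk-and-link-heavy",
         lambda f: f.get("bulk") is True and f.get("link_count", 0) >= 3,
         {"spam": True}, []),
]


def classify(msg: str, initial_tests=("mentions_unsubscribe",)) -> Facts:
    facts: Facts = {}
    pending = list(initial_tests)
    fired = set()
    while pending:
        # Evaluate every test that has been requested but not yet run.
        for name in pending:
            if name not in facts:
                facts[name] = TESTS[name](msg)
        pending = []
        # Fire each rule at most once, repeating until no rule matches.
        changed = True
        while changed:
            changed = False
            for rule in RULES:
                if rule.name not in fired and rule.when(facts):
                    fired.add(rule.name)
                    facts.update(rule.conclude)
                    pending.extend(t for t in rule.request if t not in facts)
                    changed = True
    return facts
```

With these toy rules, a message that never mentions "unsubscribe" never has its links counted, so the later tests only run when a rule actually asks for them. A real engine would match rules far more cleverly than this linear scan, which is where the CPU and memory go.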

I think this is my next big itch =) It is a shame that Jess is not free, and that its prices aren't even published. Drools is free, though =)
