How to Test Enterprise Spam Defenses?

 
 
By Larry Seltzer  |  Posted 2003-09-04

How many licks does it take to get to the Tootsie Roll center of a Tootsie Pop? As with the number of false-positive results generated by enterprise anti-spam products, writes Security Center Editor Larry Seltzer, the world may never know.

There are many measures of a spam-blocking product, but probably the most important one is the number of false positives it generates. False positives are non-spam e-mails that the product mistakenly classifies as spam. They represent the most important failure of a product because the more of them there are, the less you can trust the spam blocking. But testing products for their false positives is difficult. As I said in a recent column, you don't really know if you have false positives (or how many there are) unless you go through the blocked mail and count the ones that shouldn't have been blocked. Some of you disagree with me on this, but I'm unmoved on the subject.

It's even harder when you're trying to do an independent lab test, which happens to be my line of work. I have been designing and implementing software tests for about 16 years now, and testing enterprise spam blockers is clearly the most difficult problem I have ever encountered. I've developed many tests that simply took a lot of work, but this one's over the top.

It's not as bad testing consumer anti-spam products. My usual test plan, which I've used in the past for PC Magazine, is to turn one of my legit accounts into a forwarding account. It forwards to a series of test accounts. I'll usually collect about 200 messages for training purposes and then about 1,000 for the actual test.

After I train and then run the 1,000 messages through the filter, I examine the blocked mail for false positives and the non-blocked mail for spam. Because this is my mail and the numbers aren't too huge, I can handle the process manually. Assuming I'm consistent from product to product, this method is accurate.

But how should one test an enterprise product? Is 1,000 messages, all to the same e-mail address, enough? As a test designer, what I would like to do is run half a million messages through each product. I want the message base to include large numbers of threads involving group addresses and people inside and outside the organization. A number of problems make this impractical for a test situation.

First, that's a lot of messages for a lab analyst to examine, and someone's going to have to examine them manually. It's just too much work. Manual analysis doesn't scale.

The second big problem is where to get a large enough collection of real mail to mix with the spam. This is yet another example of a problem you might think is easy to fix but which is actually quite problematic. We need thousands of messages typical of a corporate mail database. Perhaps you'd like to volunteer your own corporation's e-mail for our testers to examine as part of our benchmarks? I didn't think so.

As it turns out, even people at PC Magazine aren't comfortable with using their own company e-mail for this purpose, and I don't blame them; it should be confidential. And remember that the interesting messages in this test are the ones coming from outside the test organization, so if you use real mail, that means you're using messages from third parties, probably without their consent. How fair is that?

My best theoretical solution to this problem is to take messages from moderated newsgroups (such as biz.com.accounting) and change the users into fake users from a fictional corporate directory and a fictional directory of outsiders. I would tag these good messages with a custom header so that they can be recognized in the post-filtering message database.
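As a rough illustration of the tagging idea, here is a minimal Python sketch using only the standard library's email package. Everything named in it is an assumption made for the example, not part of any actual test harness: the header name X-Test-Known-Good, the fictional directory entries, and the fallback outsider identity. It rewrites only the address headers; names in signatures or quoted text inside the body would need separate handling.

```python
from email import message_from_string
from email.message import Message
from email.utils import formataddr, getaddresses

# Hypothetical header used to mark known-good (non-spam) test messages.
GOOD_HEADER = "X-Test-Known-Good"

# Hypothetical fictional directory: real poster address -> (fake name, fake address).
DIRECTORY = {
    "real.poster@example.org": ("Alice Smith", "alice.smith@fictional-corp.example"),
}

# Hypothetical fallback identity for any address missing from the directory.
DEFAULT_OUTSIDER = ("Outside User", "outsider@fictional-outside.example")


def rewrite_addresses(value: str, directory: dict) -> str:
    """Replace every address in one header value with its fictional stand-in."""
    rewritten = []
    for _name, addr in getaddresses([value]):
        fake = directory.get(addr.lower(), DEFAULT_OUTSIDER)
        rewritten.append(formataddr(fake))
    return ", ".join(rewritten)


def anonymize_and_tag(raw: str, directory: dict, tag: str) -> Message:
    """Parse a raw message, swap real users for fake ones, and mark it as good."""
    msg = message_from_string(raw)
    for field in ("From", "To", "Cc", "Reply-To"):
        value = msg[field]
        if value is not None:
            # replace_header keeps the header in its original position.
            msg.replace_header(field, rewrite_addresses(value, directory))
    msg[GOOD_HEADER] = tag
    return msg
```

Calling anonymize_and_tag(raw_text, DIRECTORY, "good-0001") returns a message whose string form, str(msg), can be injected into the test mail stream. The tag value doubles as an identifier for tracing a message after filtering.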

I have some other theories for how to construct a semi-synthetic benchmark that would mix this database of known good mail with another database of current spam, but they involve some moderately complicated programming. If it all works, at the end I should have spam-filtered mail on which I can programmatically count false positives and negatives. This should scale.
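The counting step could look something like the following Python sketch. It assumes the known good messages carry a custom marker header (the name X-Test-Known-Good is hypothetical), that everything without the marker came from the spam database, that the product under test leaves headers intact, and that the blocked and delivered mail can each be read back as raw message text. None of this reflects a finished benchmark.

```python
from dataclasses import dataclass
from email import message_from_string
from typing import Iterable

# Hypothetical header that marks a message as known-good test mail.
GOOD_HEADER = "X-Test-Known-Good"


@dataclass
class FilterScore:
    false_positives: int  # good mail that the filter blocked
    false_negatives: int  # spam that the filter delivered
    good_total: int       # all known-good messages seen
    spam_total: int       # all spam messages seen

    @property
    def false_positive_rate(self) -> float:
        return self.false_positives / self.good_total if self.good_total else 0.0

    @property
    def false_negative_rate(self) -> float:
        return self.false_negatives / self.spam_total if self.spam_total else 0.0


def is_known_good(raw: str) -> bool:
    """A message counts as good only if it still carries the marker header."""
    return message_from_string(raw)[GOOD_HEADER] is not None


def score_filter(blocked: Iterable[str], delivered: Iterable[str]) -> FilterScore:
    """Count filter mistakes from the two post-filtering piles of raw messages."""
    blocked_good = [is_known_good(raw) for raw in blocked]
    delivered_good = [is_known_good(raw) for raw in delivered]
    false_positives = sum(blocked_good)
    false_negatives = len(delivered_good) - sum(delivered_good)
    return FilterScore(
        false_positives=false_positives,
        false_negatives=false_negatives,
        good_total=false_positives + sum(delivered_good),
        spam_total=false_negatives + (len(blocked_good) - false_positives),
    )
```

Because the classification depends only on the marker header, the same function can score half a million messages as easily as a thousand, which is the point of making the count programmatic. A product that strips or rewrites unknown headers would break this scheme, so that assumption would need checking per vendor.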

I've been talking to other people with a lot of experience in test development, in the anti-spam business and elsewhere. I haven't yet found a good implementation of a test that would provide reliable and repeatable results across multiple vendors. I think we'll get there because it would be an invaluable tool for corporate IT buyers. In the meantime, however, you have to have the right perspective on any benchmarks you see; we don't know enough about how well these products work.

Security Center Editor Larry Seltzer has worked in and written about the computer industry since 1983.

Larry Seltzer has been writing software for and English about computers ever since, much to his own amazement, he graduated from the University of Pennsylvania in 1983.

He was one of the authors of NPL and NPL-R, fourth-generation languages for microcomputers by the now-defunct DeskTop Software Corporation. (Larry is sad to find absolutely no hits on any of these products on Google.) His work at Desktop Software included programming the UCSD p-System, a virtual machine-based operating system with portable binaries that pre-dated Java by more than 10 years.

For several years, he wrote corporate software for Mathematica Policy Research (they're still in business!) and Chase Econometrics (not so lucky) before being forcibly thrown into the consulting market. He bummed around the Philadelphia consulting and contract-programming scenes for a year or two before taking a job at NSTL (National Software Testing Labs) developing product tests and managing contract testing for the computer industry, governments and publications.

In 1991 Larry moved to Massachusetts to become Technical Director of PC Week Labs (now eWeek Labs). He moved within Ziff Davis to New York in 1994 to run testing at Windows Sources. In 1995, he became Technical Director for Internet product testing at PC Magazine and stayed there till 1998.

Since then, he has been writing for numerous other publications, including Fortune Small Business, Windows 2000 Magazine (now Windows and .NET Magazine), ZDNet and Sam Whitmore's Media Survey.
 
 
 
 
 
 
 
