Quantity vs. Quality in Security Software Testing

We know how to make better tests of security utilities, but there's probably not enough demand for such tests in the market.

The testing of anti-virus products has always been a tricky business. The better actors in that business are raising their standards, and those higher standards make the testing itself harder and more expensive to do.

As I described recently, the most famous testing standards these days are bankrupt in terms of their value to users. The WildList, and the VB100 tests that use it, exist simply out of ancient tradition and a perceived marketing need not to rock the boat. But these traditions are beginning to break down, and innovative new testing will improve not only what malware the products are tested against, but how they are tested. Some companies, like Trend Micro, have the courage to dump the VB100, while others, like Symantec, are "'absolutely' committed to the VB100."

Most anti-malware testing innovation these days happens, for whatever reason, in Germany. Andreas Marx of AV-Test not only develops and performs cutting-edge testing, but has also written extensively on the subject. Look for the "AVAR 2007 Conference - Seoul, South Korea" section of AV-Test's Papers page. His paper "Testing of 'Dynamic Detection'" discusses many of the challenges anti-malware testers face in meeting all that is expected of them, and not just in terms of efficacy.

The old way of looking at things focused on quantity. A big part of this emphasis comes from magazines, which fund a lot of the testing. AV-Test specialized in this approach and could test hundreds of thousands of samples against many products. There is a newer imperative for quantity, too: the number of distinct malware samples in the wild has skyrocketed in recent years, and the number of updates from anti-malware vendors has skyrocketed along with it.

But the highly automated nature of this testing can compromise some important objectives. For instance, using virtual machines can greatly speed up mass testing, but some malware actually checks to see if it's running in a virtualized environment and changes its behavior, perhaps presuming that in such a case it is being run by a researcher. The ultimate, most accurate environment in which to test an anti-malware product is, unsurprisingly, the sort of normal, Internet-connected PC on which the product and malware are designed to run.
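These anti-VM checks are usually simple environment probes. As a rough illustration only (the artifact lists and function below are hypothetical examples, not taken from any particular malware family, and real samples use many other tricks), a sample might look for hypervisor-specific MAC address prefixes or well-known guest-tools processes before deciding whether to misbehave:

```python
# Hypothetical sketch of the sort of environment check some malware performs
# to guess whether it is running inside a virtual machine. The artifact
# lists are illustrative, not exhaustive or authoritative.

# MAC address prefixes (OUIs) commonly assigned to virtual adapters.
VM_MAC_PREFIXES = {
    "00:05:69", "00:0c:29", "00:50:56",  # VMware
    "08:00:27",                          # VirtualBox
}

# Guest-tools helper processes that only run inside a VM.
VM_PROCESS_NAMES = {"vmtoolsd.exe", "vboxservice.exe", "vboxtray.exe"}

def looks_virtualized(mac_address: str, running_processes: list) -> bool:
    """Return True if common hypervisor artifacts are present."""
    if mac_address.lower()[:8] in VM_MAC_PREFIXES:
        return True
    procs = {p.lower() for p in running_processes}
    return bool(procs & VM_PROCESS_NAMES)
```

A sample running such a check on an isolated test VM could simply sleep or exit, so the tester would record a false negative that would never occur on the real, physical PCs the malware targets.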

Anti-malware vendors have told me that the future of this testing has to be on an online PC as opposed to an isolated network (or a non-networked PC as is done with WildList testing). Many of these products check back with their own sites in the normal course of operation, not just for signature updates, but for Web site blacklists and whitelists.

Another big problem, and a major focus of the "Dynamic Detection" paper, is heuristic detection. I've been involved in attempts to benchmark heuristic detection, and let me assure you it's a very difficult problem, one that requires sophisticated lab procedures and lots of preparation time. How do you know, for example, that a blocked sample was blocked for behavioral reasons and not because it matched a signature? AV-Test handles this by freezing product installations for a time, blocking them from updating themselves. The downsides are that you may be testing an old version, and that heuristic detection of the samples you collect at that point in time may not match up well against next week's samples. What can you do? (That was a rhetorical question.)
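The distinction the tester is trying to draw can be sketched conceptually. In this hypothetical Python sketch (the signature database, behavior names, and function are all invented for illustration and don't reflect any vendor's actual engine), a detection counts as "signature" when the sample's hash is already known, and as "heuristic" only when an unknown sample is caught by its observed behavior:

```python
import hashlib

# Hypothetical signature database: SHA-256 hashes of already-known samples.
KNOWN_SIGNATURES = {
    hashlib.sha256(b"known-malware-sample").hexdigest(),
}

# Hypothetical behaviors a runtime monitor might flag as suspicious.
SUSPICIOUS_BEHAVIORS = {"writes_to_autorun", "injects_into_process", "disables_av"}

def classify_detection(sample_bytes: bytes, observed_behaviors: set) -> str:
    """Label how a sample would have been caught: signature, heuristic, or missed."""
    if hashlib.sha256(sample_bytes).hexdigest() in KNOWN_SIGNATURES:
        return "signature"
    if observed_behaviors & SUSPICIOUS_BEHAVIORS:
        return "heuristic"
    return "missed"
```

Freezing a product's updates is what forces unknown samples past the first branch: with no fresh signatures arriving, anything caught afterward must have been caught heuristically.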

But the real problem is automation. When you test on a real PC in a real environment and check the things you want to check, such as whether a sample was blocked by a simple signature, by a behavior check, by a URL filter, or by some other mechanism, you can't automate everything the way you can with a simple file-scanning test. Marx tells me that "we cannot test more than about 50 samples in two days, with two persons in place," even though AV-Test has written numerous tools to assist the process. It's not worth running tests at this speed unless someone is paying for them, and, for now, those who pay would rather see massive numbers of samples superficially scanned than a much smaller number of real threats tested in a realistic environment.

These are classic testing tradeoffs I recognize from testing many different kinds of products. Security software is about a whole lot more than scanning files now, and the old style of testing doesn't help users to determine which products are best. We know what to do, but I think inertia is stronger than logic in this market.

Security Center Editor Larry Seltzer has worked in and written about the computer industry since 1983.

For insights on security coverage around the Web, take a look at eWEEK.com Security Center Editor Larry Seltzer's blog Cheap Hack.