We know how to make better tests of security utilities, but there's probably not enough demand for such tests in the market.
The testing of anti-virus products has always been a tricky business. The
better actors in that business are improving their standards, and this presents
a challenge in testing.
As I described recently, the most famous testing standards these days are
bankrupt in terms of their value to users. The WildList and the VB100 tests
which use it exist simply out of ancient tradition and a perceived marketing
need not to rock the boat. But these traditions are beginning to break down,
and innovative new testing will improve not only what malware is tested against
the product, but how it is tested. Some companies, like Trend Micro, have the
courage to dump VB100, while others, like Symantec, are "'absolutely' committed to the VB100."
Most anti-malware testing innovation these days, I don't know why, happens
in Germany. Andreas Marx of AV-Test not only develops and performs
cutting-edge testing, but he has written on the subject extensively. Look for
the "AVAR 2007 Conference - Seoul, South Korea" section of AV-Test's Papers
page. His paper "Testing of 'Dynamic Detection'" discusses many of the
challenges facing testers of anti-malware products in order to meet all that
is expected of them, and not just in terms of efficacy.
The old way to look at things focused on quantity. A big part of this emphasis
comes from magazines, which fund a lot of the testing. AV-Test specialized in
this approach, and could test hundreds of thousands of samples against many
products. There is a newer imperative for quantity, too: The number of different
malware samples in the wild has skyrocketed in recent years. The corresponding
number of updates from anti-malware vendors has also skyrocketed.
But the highly automated nature of this testing can compromise some
important objectives. For instance, using virtual machines can greatly speed up
mass testing, but some malware actually checks to see if it's running in a
virtualized environment and changes its behavior, perhaps presuming that in
such a case it is being run by a researcher. The ultimate, most accurate
environment in which to test an anti-malware product is, unsurprisingly, the
sort of normal, Internet-connected PC on which the product and malware are
designed to run.
Anti-malware vendors have told me that the future of this testing has to be
on an online PC as opposed to an isolated network (or a non-networked PC as is
done with WildList testing). Many of these products check back with their own
sites in the normal course of operation, not just for signature updates, but
for Web site blacklists and whitelists.
Another big problem, and a major focus of the "Dynamic Detection"
paper, is heuristic detection. I've been involved in attempts to benchmark
heuristic detection, and let me assure you it's a very difficult problem. It
requires sophisticated lab procedures and lots of preparation time. How do you
know, for example, if a sample is blocked, that it was blocked for behavioral
reasons and not for matching a signature? AV-Test does this by freezing product
installations for a time, blocking them from updating themselves. The downsides
to this are that you are testing what could be an old version and that
heuristic detection of the samples you get at that point in time may not match
up well against next week's samples. What can you do? (That was a rhetorical
question.)
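
To make the freeze idea concrete, here is a minimal sketch in Python. It is not AV-Test's tooling. The record layout, the function name, the dates and the sample results are all made up for illustration. The assumption is that a tester records when each sample was first seen and whether the frozen product blocked it. Only samples first seen after the freeze are counted, because the product cannot have received a signature written in response to them. A generic signature that already existed could still match, so the result is an upper bound on purely behavioral detection.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class SampleResult:
    name: str          # hypothetical sample label
    first_seen: date   # when the sample was first observed
    blocked: bool      # whether the frozen product stopped it


def proactive_detection_rate(results, freeze_date):
    """Share of post-freeze samples that the frozen product blocked."""
    # Keep only samples that appeared after updates were cut off.
    unseen = [r for r in results if r.first_seen > freeze_date]
    if not unseen:
        return None  # nothing to measure
    return sum(r.blocked for r in unseen) / len(unseen)


# Illustrative data only; none of this comes from a real test.
freeze = date(2008, 1, 1)
results = [
    SampleResult("sample-a", date(2007, 12, 20), True),  # predates freeze, ignored
    SampleResult("sample-b", date(2008, 1, 5), True),
    SampleResult("sample-c", date(2008, 1, 9), False),
    SampleResult("sample-d", date(2008, 1, 12), True),
]
rate = proactive_detection_rate(results, freeze)
print(f"{rate:.2f}")
```

The longer the freeze, the more post-freeze samples there are to measure, but the older the product version under test becomes, which is the downside described above.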
But the real problem is that when you test on a real PC in a real
environment and you check the things you want to check, such as whether a
sample was blocked by a simple signature or by a behavior check or by a URL
filter or some other mechanism, you can't automate everything the way you can
with a simple file scanning test. Marx tells me that "we cannot test more
than about 50 samples in two days, with two persons in place" even though
they have written numerous tools to assist the processes. It's not worth
running tests at this speed unless someone is paying for them, and, for now,
those who pay would rather see massive numbers of samples superficially scanned
than a much smaller number of real threats tested in a realistic
environment.
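
A quick back-of-the-envelope calculation shows the scale of the gap. The sketch below takes Marx's figure of about 50 samples in two days with two people and scales it linearly. The linear scaling, the function name and the 100,000-sample figure are assumptions for illustration, not numbers from AV-Test.

```python
SAMPLES_PER_RUN = 50   # from Marx's quote
DAYS_PER_RUN = 2       # from Marx's quote
PEOPLE_PER_RUN = 2     # from Marx's quote


def person_days_needed(sample_count):
    """Person-days to test sample_count samples at the quoted pace."""
    # Assumes effort scales linearly with the number of samples.
    return sample_count * DAYS_PER_RUN * PEOPLE_PER_RUN / SAMPLES_PER_RUN


print(person_days_needed(50))       # 4.0
print(person_days_needed(100_000))  # 8000.0
```

At that pace, a sample set of the size used in mass file-scanning tests would take thousands of person-days for a single product.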
These are classic testing tradeoffs I recognize from testing many different
kinds of products. Security software is about a whole lot more than scanning
files now, and the old style of testing doesn't help users to determine which
products are best. We know what to do, but I think inertia is stronger than
logic in this market.
Security Center Editor Larry Seltzer has worked in and written about the computer industry since 1983.
For insights on security coverage around the Web, take
a look at eWEEK.com Security Center Editor Larry Seltzer's blog Cheap Hack.