Many years ago in the course of testing antivirus software for PC Magazine, one of the vendors I spoke to said that their long-term radar indicated that conventional antivirus pattern scanning techniques were headed for a technological wall. The number of viruses that the product searched for was projected to grow by a third in the next year. Within a few years, scanning would simply take too long.
Other experienced antivirus pros tell me they have heard this sort of thing before, and quite a long time ago. Back in the early days, 500 viruses was supposed to be the practical limit, then 1000, and so on. Do these projections belong with others predicting IP address shortages and nuclear meltdowns on 1/1/2000? The answer is a definite “probably.”
The argument against pattern-based scanning in the long term is an argument for heuristic scanning. Almost all antivirus scanning checks files and other content against a list of patterns, or definitions, supplied and kept up to date by the vendor. The technique involves simply comparing the contents against those patterns, which can be done in any number of ways. Without getting into a dissertation comparing pattern-matching algorithms, suffice it to say that we know how to do this with absolute precision, and the only question is how to do it in the fastest, least resource-intensive way.
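To make the idea concrete, here is a minimal sketch of pattern-based scanning in Python. The signature names and byte patterns are invented for illustration and do not come from any real product's definitions; the point is only that the check is an exact comparison with no judgment involved.

```python
# Hypothetical definitions: the names and byte patterns are invented for illustration.
SIGNATURES = {
    "Example.A": bytes.fromhex("deadbeef"),
    "Example.B": b"FAKE-VIRUS-MARKER",
}

def scan(data: bytes, signatures=SIGNATURES):
    """Return the sorted names of every signature whose bytes occur in data."""
    return sorted(name for name, pattern in signatures.items() if pattern in data)

if __name__ == "__main__":
    sample = b"header" + bytes.fromhex("deadbeef") + b"trailer"
    print(scan(sample))
```

This naive loop checks one pattern at a time, so its cost grows with the number of definitions. A production scanner would presumably use a multi-pattern matching algorithm instead, which is exactly the "fastest way" question mentioned above.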
Heuristics, on the other hand, attempt to do things that we don't all necessarily agree how to do. The idea of heuristic scanning is to look at a section of code and determine what it is doing, then to decide whether the behavior exhibited by the code is viral or otherwise malicious. This is not an easy decision to make. It involves modeling the behavior of code and comparing that abstract model to a rule set. This has to take more time and be more resource-intensive than pattern matching. Of course, the advantage of heuristics, at least of a theoretical efficient and accurate heuristic scanner, is that it can detect viruses that haven't been written yet, and the problem of distribution of definitions goes away.
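As a rough sketch of the rule-set comparison only, assume some earlier analysis step has already reduced a program to a set of behavior labels. That step is the genuinely hard part and is not shown. The labels, weights, and threshold below are all hypothetical, not taken from any real scanner.

```python
# Hypothetical rule set: behavior labels, weights, and threshold are all invented.
RULES = {
    "writes_to_executables": 3,
    "modifies_boot_sector": 4,
    "sends_mass_email": 3,
    "hooks_file_open": 2,
}
THRESHOLD = 5  # assumed cutoff; choosing it trades false alarms against misses

def score(behaviors) -> int:
    """Sum the weights of every rule whose behavior label was observed."""
    observed = set(behaviors)
    return sum(weight for label, weight in RULES.items() if label in observed)

def is_suspicious(behaviors) -> bool:
    """Flag the program when its total score reaches the threshold."""
    return score(behaviors) >= THRESHOLD

if __name__ == "__main__":
    print(is_suspicious({"writes_to_executables", "hooks_file_open"}))
```

Unlike the exact comparison in pattern matching, every number here is a judgment call, which is one way to see why reasonable people disagree about how heuristics should work.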
If you're a vendor selling that theoretical efficient and accurate heuristic scanner, please send it to me for a review. I haven't seen one in action yet. In fact, I'm skeptical of heuristic scanning partly because it's next to impossible to test heuristic scanners in commercial antivirus products. Currently, you can't tell an antivirus product to scan only with heuristics, so you can only test them effectively if you're the vendor with access to the source code. Even then, you only have access to one product. I suspect nobody has ever done an effective comparison of heuristic scanning engines.
I was involved in the closest thing, to my knowledge, to such a comparison. We settled on this as a technique: We installed all the products on identical systems on the same day, updated their definitions, and then saved the system images with Symantec Ghost for later. A month or two later we restored the images to test systems which were disconnected from the Internet, and used them to scan new viruses that had come out since the initial product installation. If the scanners found any of them they would have to do so through heuristics. The results were awful, and several of the products raised no red flags at all for the 8 viruses we tested. The best performer was McAfee, which raised suspicions on two of the files.
Of course, heuristics are one of those things that people say will get better in the future because of “research.” I would argue that the future bodes better for brute-force pattern matching than it does for “intelligent” heuristic scanning because of hardware trends. In the last few years, hardware has gotten drastically faster, yet desktop software has not grown in complexity sufficiently to consume the extra CPU power. And we're only on the doorstep of the 64-bit, super-parallelism era of CPUs, one which should improve pattern matching even more, far more than it will benefit heuristics.
My copy of Norton Antivirus says it protects against 64,201 viruses (6/25/03 definitions). Is there really a problem with protecting against many times that? Last week I argued that antivirus products should also be scanning for the stuff we call spyware and fight, for some reason, with other programs. My copy of Spybot Search and Destroy says it protects against 7454 “problems” although many of them are cookies. I have a spam filtering program that separately scans the same e-mail my antivirus scans. It would actually be more efficient for one program to do all this work.
As if I needed more convincing to have no confidence in heuristic scanning, today I got a fishy email with an attachment. It looked like a virus, except that the attachment made it through my antivirus scanner. Scratch that – it made it through 3 different antivirus scanners: the Norton on my desktop, the Panda at my gateway, and a third one at my ISP (I don't know who the vendor is for that one). Turns out it's the new Sobig.E (see iDEFENSE's description of Sobig.E for more details), which is in a new pattern file that came out from Symantec hours later, but I got 5 of these emails before the cavalry came over the hill.
This episode shows the real problem with the system of virus definitions: The Internet has allowed attacks to spread so fast that they can outpace the ability of the definition development and distribution system to keep conscientious users up to date. This is a real concern. Of course, this episode also had my heuristic scanners failing me one more time.
Security Supersite Editor Larry Seltzer has worked in and written about the computer industry since 1983.