One basic mode of virus detection today is still signature scanning. It is similar to finding the offending bytes in the older .COM files, but far more sophisticated. A signature file, or DAT file as some vendors call it, is a database of uniquely identifiable "fingerprints" of known viruses. The fingerprint for an executable virus is typically a series of machine code bytes, also known as "strings," that the virus contains. Such strings are the fruit of the researchers' labors.
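A minimal sketch of the idea in Python may help. The signature names and byte strings here are invented for illustration and do not correspond to any real virus or vendor database, and a plain substring search stands in for whatever matching engine a real product uses.

```python
# Hypothetical signature database: the names and byte strings are made up
# for illustration and are not taken from any real virus or vendor DAT file.
SIGNATURES = {
    "Example.A": bytes.fromhex("deadbeef0102"),
    "Example.B": b"\xeb\xfe\x90\x90",
}


def scan_bytes(data: bytes, signatures=SIGNATURES):
    """Return the sorted names of every signature whose byte string occurs in data."""
    return sorted(name for name, pattern in signatures.items() if pattern in data)


if __name__ == "__main__":
    clean = b"\x00" * 32
    infected = clean + bytes.fromhex("deadbeef0102") + clean
    print(scan_bytes(clean))
    print(scan_bytes(infected))
```

Checking every signature against every byte of every file is the simplest possible design, and its cost grows with both the size of the file and the size of the database.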
Scanning is done either on demand or on the fly (as a file or email is accessed), and both modes use essentially the same techniques. On-demand scanning is what most people envision as antivirus: you click an icon or launch a program that scans a target file, folder, or whole drive. On-the-fly scanning happens when you execute a program, receive an email, or copy a file.
In the early days of antivirus software, fingerprinted viruses numbered in the hundreds, and scanning a file for all known viruses was fairly quick. Now there are over 60,000 known viruses, Trojans, worms, and variations. Antivirus vendors not only struggle to identify and detect malicious code but also have to keep scanning performance within acceptable limits.
Several techniques are used to keep a handle on performance. First, signatures are classified by the type of infection they represent: boot sector, .COM file, .EXE file, script, or macro. Through a process of elimination, when a particular file is scanned, only the signatures that pertain to that file type are used, which keeps scan times down. For example, a boot sector signature would not be used to scan a macro file.
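The classification step can be sketched as follows. The grouping, the signature names, and the byte strings are all hypothetical. Picking the group from the file extension is a simplifying assumption for this example, not a description of how any vendor identifies file types.

```python
from pathlib import Path

# Hypothetical signatures grouped by infection type (all values are invented).
SIGNATURES_BY_TYPE = {
    "boot": {"Example.Boot": b"\xfa\x33\xc0\x8e\xd0"},
    "com": {"Example.Com": b"\xe9\x00\x01\xb4\x4e"},
    "exe": {"Example.Exe": b"\x60\xe8\x00\x00\x00\x00\x5d"},
    "macro": {"Example.Macro": b"AutoOpen_Payload"},
}

# Assumption: the extension decides which group applies.
EXTENSION_TO_TYPE = {".com": "com", ".exe": "exe", ".doc": "macro"}


def signatures_for(filename: str) -> dict:
    """Return only the signature group that pertains to this file's type."""
    kind = EXTENSION_TO_TYPE.get(Path(filename).suffix.lower())
    return SIGNATURES_BY_TYPE.get(kind, {})


def scan_file_bytes(filename: str, data: bytes):
    """Scan data using only the signatures relevant to the file type."""
    relevant = signatures_for(filename)
    return sorted(name for name, pattern in relevant.items() if pattern in data)
```

With this layout, a call such as `scan_file_bytes("report.doc", data)` never consults the boot sector or .COM groups, so the work per file depends on the size of one group rather than on the whole database.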
Next, certain rules are applied to keep the scanner from having to trudge through a complete file looking for infection. This qualifies as secret sauce: details are sketchy, and each company has its own approach. Patrick Hinosia, CTO of Panda Software, mentioned that the company has developed an antivirus language that its products use to define how files are scanned. Peter Szor, Chief Researcher of Symantec Security Response, told us they use a Java-like P-code system to drive their scanners. Depending on the type of file (.com, .exe, or .doc), the scanner knows to go to the areas of the file that are more likely to contain a virus. For example, in a simple .com file, the scanner will look at the end of the file, as it is the most commonly infected area. Alternatively, a Word 97 DOC file has a specific area where macros are stored that the scanner can evaluate directly.
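Since the vendors' actual rules are not public, the following is only a guess at the general shape of such a rule, using the .com case described above. The window size, the function names, and the signature are assumptions made for this sketch.

```python
# Assumed window size; real products do not publish theirs.
TAIL_BYTES = 512


def region_to_scan(file_type: str, data: bytes) -> bytes:
    """Pick the part of the file worth scanning for the given file type."""
    if file_type == "com":
        # Simple .com infections are usually appended, so check only the tail.
        return data[-TAIL_BYTES:]
    # No rule for this type in this sketch: fall back to the whole file.
    return data


def scan_targeted(file_type: str, data: bytes, signatures: dict):
    """Search only the selected region for each signature's byte string."""
    region = region_to_scan(file_type, data)
    return sorted(name for name, pattern in signatures.items() if pattern in region)


if __name__ == "__main__":
    sigs = {"Example.Appender": b"\xde\xad\xbe\xef"}  # invented signature
    host = b"\x90" * 2000
    print(scan_targeted("com", host + b"\xde\xad\xbe\xef", sigs))
```

The trade-off is visible in the sketch: a rule that checks only the tail is fast, but it would miss the same bytes placed at the start of the file, which is presumably one reason the real rule sets are more elaborate than this.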