Computer product testing, sadly, has been as much art as science over the years. It’s not just that the products are so complicated as to defy simple, straightforward analysis; it’s also that there is no general agreement on how products should be tested. Now that may be changing with respect to the testing of anti-malware products.
New guidelines issued by AMTSO (Anti-Malware Testing Standards Organization) set an excellent standard for high-quality testing that you can believe in. I was in the professional testing business for many years, at least 13 or 14, and was technical director at four different labs. I don’t do much actual testing of products anymore, but I still follow testing issues carefully. I’m really impressed with what I’m reading in these standards.
Two “Principles” documents were released by AMTSO. The first, “AMTSO Fundamental Principles of Testing,” is a set of rules and advice, mostly for testers. The nine principles:
- Testing must not endanger the public.
- Testing must be unbiased.
- Testing should be reasonably open and transparent.
- The effectiveness and performance of anti-malware products must be measured in a balanced way.
- Testers must take reasonable care to validate whether test samples or test cases have been accurately classified as malicious, innocent or invalid.
- Testing methodology must be consistent with the testing purpose.
- The conclusions of a test must be based on the test results.
- Test results should be statistically valid.
- Vendors, testers and publishers must have an active contact point for testing-related correspondence.
Some of these are more obvious than others, but the elaboration of the principles that follows makes clear they aren’t just lip service. With respect to No. 1, I’ve been involved with malware tests, especially of the ability to detect unknown malware, where we discussed creating new malware purely for the test. The guidelines specifically forbid this, although they do allow the modification of existing malware characteristics. This principle also speaks to taking precautions to prevent malware from escaping the lab.
Examining AMTSO’s Principles
Principle No. 2, about bias, is in many ways the most impressive. It recognizes that tests may often be performed on contract for a vendor and that such testing can still be unbiased. It sets rules for disclosure of such relationships and potential conflicts:
“[T]his disclosure should include any relationship that could potentially influence the tester, including: (i) whether the publication or tester has received revenue from a vendor or affiliate with regard to any particular test, and (ii) whether the publication or tester receives a significant portion of its overall revenue from a particular vendor.”
I’ve always believed that this is the right way to do things and that people are able to consider such conflicts in judging the data.
No. 3 recognizes that labs may not want to release all details of methodology, but it requires certain important points to be published with results, such as configuration settings for products tested, full configurations of test systems and how the samples were obtained. But it doesn’t require, for example, source code for test harness systems.
No. 4 is about looking at a variety of factors in drawing conclusions on a product. So a review that talks about detection percentage without discussing false positives, for example, might run afoul of this principle. I have to say this is one of the more subjective of the principles.
No. 5 says that you shouldn’t just trust the judgments of the products being tested. You should confirm that samples detected as malware really are malware and not false positives, and that samples passed as clean really are clean. This can be difficult.
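As a purely illustrative sketch of my own (not something the AMTSO paper prescribes), a test harness might keep the tester’s independently verified classification for each sample, keyed by file hash, and flag any case where a product’s verdict disagrees with it for manual review rather than trusting either side:

```python
import hashlib
from pathlib import Path

# Hypothetical reference list built from the tester's own analysis,
# keyed by SHA-256 digest: True = confirmed malicious, False = confirmed clean.
REFERENCE = {
    "ab3f...": True,    # placeholder hashes, for illustration only
    "9c27...": False,
}

def sha256(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()

def check_verdict(sample: Path, product_says_malicious: bool) -> str:
    """Compare a product's verdict against the tester's own classification."""
    digest = sha256(sample)
    if digest not in REFERENCE:
        return "unclassified: needs manual analysis before it can count in results"
    if REFERENCE[digest] == product_says_malicious:
        return "agrees with reference classification"
    return "disagreement: possible false positive/negative, send to manual review"
```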
No. 6, the consistency principle, refers to using the right products for the audience, so you shouldn’t, for example, test corporate gateway products for a consumer audience, or consumer solutions for an enterprise audience.
No. 7 should be obvious: Don’t assert conclusions that disagree with your data. When this happens, it’s usually a sign that the author is disappointed that the data didn’t show what he wanted it to show. It helps to have a rule.
No. 8, about statistical validity, is one that could do with more specifics, but perhaps that wasn’t possible at this point. If we are going to demand statistical significance in the results, we should provide a numerical standard for it. Still, there are plenty of tests (I’ll admit it, I’ve been involved in some) where we didn’t have enough samples and went ahead anyway. Once again, it helps to have a rule.
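To make the sample-size point concrete, here is a rough sketch of my own (nothing from the AMTSO documents, and the numbers are invented) showing how a tester might put a simple confidence interval around a detection rate. With only 50 samples, the margin of error easily swallows a few-point gap between products.

```python
import math

def detection_ci(detected: int, total: int, z: float = 1.96) -> tuple[float, float, float]:
    """Return (rate, low, high): detection rate with a ~95% normal-approximation
    confidence interval. Illustrative only; a real test might use Wilson or exact intervals."""
    rate = detected / total
    margin = z * math.sqrt(rate * (1 - rate) / total)
    return rate, max(0.0, rate - margin), min(1.0, rate + margin)

# Hypothetical results: Product A detects 46 of 50 samples, Product B detects 48 of 50.
for name, hits, n in [("Product A", 46, 50), ("Product B", 48, 50)]:
    rate, low, high = detection_ci(hits, n)
    print(f"{name}: {rate:.1%} (95% CI roughly {low:.1%} to {high:.1%})")

# With only 50 samples the two intervals overlap heavily, so the 4-point gap
# between A and B is not statistically meaningful on its own.
```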
No. 9, the active contact point rule, is a really good one for all involved. Perhaps it could have gone on to give a minimum response window.
Dynamic Testing Issues
There is a second AMTSO document: “Best Practices for Dynamic Testing.” Most high-volume malware testing is run through automated systems in which files are copied from network shares to the test system. That’s not the way users run their own computers.
“Dynamic Testing” aims to reproduce, in every meaningful way, the actual user environment for which the product was designed. This has become more necessary over time as anti-malware products increasingly include features, such as very frequent updates, which do not function properly in a classic lab environment.
The paper recognizes that testing like this is extremely difficult. Often, even when done fairly, it’s impossible to reproduce results consistently. But it encourages testers to do what they can to make circumstances consistent and fair.
Here’s a good example of a problem such testing encounters: PC users’ machines are connected to the Internet; should the test systems be? What if malware escapes from the test system, violating Principle No. 1 above? The document recognizes several approaches that can be valid, including building a fake Internet, known amusingly as a “Truman box.” Whatever method you use, the important thing is to disclose what you did and what effects it had.
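As one rough illustration of the fake-Internet idea (my own sketch, not an AMTSO recommendation), a lab might point all DNS and outbound web traffic at a sinkhole server that answers every request locally, so samples see plausible connectivity while nothing reaches the real Internet:

```python
# Minimal HTTP sinkhole for an isolated test network (illustrative sketch only).
# Assumes the lab's DNS and routing already redirect outbound web traffic here.
from http.server import BaseHTTPRequestHandler, HTTPServer

class SinkholeHandler(BaseHTTPRequestHandler):
    def _respond(self):
        body = b"<html><body>ok</body></html>"   # generic, plausible-looking reply
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    # Answer GET and POST the same way so samples believe they are online.
    do_GET = do_POST = _respond

    def log_message(self, fmt, *args):
        # Record what each sample tried to reach; useful evidence for the write-up.
        print(f"sinkhole: {self.client_address[0]} {self.command} {self.path}")

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8080), SinkholeHandler).serve_forever()
```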
Use of virtual machines is a big issue in dynamic testing. Spawning off a new VM for testing such products makes the testing far easier, but the environment is not the same as the typical PC user’s. More and more malware is becoming aware of VM environments and using that information to change behavior, probably under the assumption that VMs indicate a tester. Because of this, as tempting as VMs are, AMTSO recommends real machines for dynamic testing, and that members share tools to facilitate such testing.
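For the same reason, a tester who believes the dynamic test image looks like a real PC might sanity-check it for the obvious virtualization giveaways that VM-aware samples are known to look for. A rough, Linux-oriented sketch of my own:

```python
# Quick sanity check (Linux-oriented, illustrative only) for obvious VM artifacts
# that VM-aware malware is known to inspect in a test image.
from pathlib import Path

def vm_indicators() -> list[str]:
    hits = []
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists() and "hypervisor" in cpuinfo.read_text():
        hits.append("CPU flags advertise a hypervisor")
    vendor = Path("/sys/class/dmi/id/sys_vendor")
    if vendor.exists():
        v = vendor.read_text().strip()
        if any(s in v for s in ("VMware", "innotek", "QEMU", "Xen")):
            hits.append(f"DMI system vendor looks virtual: {v}")
    return hits

if __name__ == "__main__":
    found = vm_indicators()
    print("\n".join(found) if found else "no obvious VM artifacts found")
```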
Talk about standards groups usually evokes an academic image, but some of the best standards have come out of industry consortiums. AMTSO membership is largely composed of vendors, and they recognize that they have an interest in good testing.
Don’t expect to see a lot of results compliant with these guidelines right away. Testing like this is difficult and expensive, and few labs are set up to do it. If all goes well, more will be in the future.
Security Center Editor Larry Seltzer has worked in and written about the computer industry since 1983.
For insights on security coverage around the Web, take a look at eWEEK.com Security Center Editor Larry Seltzer’s blog Cheap Hack.