Handicapping the Hardware

It sounds like a plot synopsis for an unconvincing TV movie, rather than a real news story, but Hewlett-Packard claims that a former employee tampered with hard disks and wiring in a Superdome server prior to benchmark tests, lowering scores and therefore possibly harming sales of HP's high-end Unix hardware.

For all the talk of computers being binary beasts, either on or off—working, or not working—they can be almost like racehorses in their variable performance under only slightly different conditions.

My favorite book on benchmark design is Richard Gabriel's "Performance and Evaluation of LISP Systems," which has helped me bring a wide range of hardware to its knees. Even if you never use LISP for any other purpose, it's a dandy benchmarking tool, because it makes it easy to write programs that place tremendous stress on processor and memory resources.

The most important thing about Gabriel's book is not its extensive source code, though, but rather its discussion of which aspects of machine design affect test results. These discussions force the question of why we're testing, not just what and how.

For example: Does a cache hide a slow subsystem interface, and should benchmarks deliberately frustrate cache algorithms? Or does cache design reflect realistic workload and provide a cost-effective balance of price and performance? Either point of view has its merits, and performance tests are not neutral on this subject.
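To make the cache question concrete, here is a minimal sketch in Python (my choice for illustration, not Gabriel's LISP) that does the same work in two access orders: one friendly to caches, one meant to frustrate them. The names make_orders and timed_sum are hypothetical, and interpreter overhead may mask much of the difference on a given machine, so treat the timings as a prompt for questions rather than as a result.

```python
import random
import time
from array import array


def make_orders(n, seed=0):
    """Build a sequential index order and a shuffled copy of it."""
    sequential = list(range(n))
    shuffled = sequential[:]
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps runs repeatable
    return sequential, shuffled


def timed_sum(data, order):
    """Sum data[i] for each i in order; return (total, elapsed seconds)."""
    start = time.perf_counter()
    total = 0
    for i in order:
        total += data[i]
    return total, time.perf_counter() - start


if __name__ == "__main__":
    n = 2_000_000  # assumed size; pick one larger than your cache
    data = array("q", range(n))
    sequential, shuffled = make_orders(n)
    _, t_seq = timed_sum(data, sequential)
    _, t_rand = timed_sum(data, shuffled)
    print(f"sequential: {t_seq:.3f}s  shuffled: {t_rand:.3f}s")
```

Both loops do identical arithmetic; only the order of memory access changes. Which of the two numbers is the "real" one depends on whether your workload looks more like the first loop or the second.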

More generally, where does IT product design or tuning cross the line from legitimate optimization to benchmark cheating? Do standardized benchmarks perversely encourage optimizations that actually harm everyday performance, away from the specific task parameters that those benchmarks employ?

Benchmarking needs a statistical approach, not a reliance on single numbers; it should locate thresholds of task difficulty that may mark abrupt performance changes; it should begin from the questions "What do we need to do?" and "How much is it worth?"—rather than the tempting but often irrelevant question "How fast will it go?"
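One way to put the statistical approach into practice is sketched below in Python, under stated assumptions: the workload (sample_task), the repeat count, and the factor of 2.0 that counts as an "abrupt" change are all placeholders of mine, not recommendations. The idea is to time the same task several times at each size, report a median and a spread instead of a single figure, and then flag sizes where a cost measure jumps sharply.

```python
import statistics
import time


def time_once(task, size):
    """Return wall-clock seconds for one call of task(size)."""
    start = time.perf_counter()
    task(size)
    return time.perf_counter() - start


def summarize(samples):
    """Reduce raw timings to a median, an interquartile range, and extremes."""
    q1, _, q3 = statistics.quantiles(samples, n=4)
    return {
        "median": statistics.median(samples),
        "iqr": q3 - q1,
        "min": min(samples),
        "max": max(samples),
    }


def sweep(task, sizes, repeats=7):
    """Time task at each size `repeats` times; return {size: summary}."""
    return {n: summarize([time_once(task, n) for _ in range(repeats)])
            for n in sizes}


def find_thresholds(cost_by_size, jump=2.0):
    """Return sizes whose cost exceeds `jump` times the cost at the previous size."""
    sizes = sorted(cost_by_size)
    return [b for a, b in zip(sizes, sizes[1:])
            if cost_by_size[b] > jump * cost_by_size[a]]


def sample_task(n):
    """Hypothetical workload: sort a reversed range of n integers."""
    sorted(range(n, 0, -1))


if __name__ == "__main__":
    results = sweep(sample_task, [10_000, 20_000, 40_000, 80_000])
    # Per-item cost makes a jump stand out from ordinary growth with size.
    per_item = {n: r["median"] / n for n, r in results.items()}
    for n, r in results.items():
        print(n, f"median={r['median']:.5f}s", f"iqr={r['iqr']:.5f}s")
    print("abrupt changes at sizes:", find_thresholds(per_item))
```

The median and interquartile range are used here because they shrug off the occasional outlier run; a mean and standard deviation would be an equally defensible choice. None of this answers "What do we need to do?" or "How much is it worth?", which still have to come first.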

Tell me what you'd like to measure at peter_coffee@ziffdavis.com.