Two alternative methods of benchmarking supercomputers are nearing completion, even as the most popular performance metric is being applied Monday to generate a new list of the world’s most powerful systems.
The latest revision of the Top500 list of supercomputers is expected to be released late Monday before the SC2004 supercomputing show kicks off in Pittsburgh. NASA’s SGI-based “Columbia” supercomputer and the University of California/Lawrence Livermore National Laboratory’s IBM BlueGene/L supercomputer are likely candidates to sit atop the rankings, although the 65-teraflop performance of NEC’s recently announced SX-6 could top both of them when NEC administers the Top500 benchmark, known as Linpack.
The trouble with the list, critics say, is that it tracks results from only a single test, which by itself is widely considered an imprecise assessment of a system’s performance. To address Linpack’s limitations, two other initiatives have surfaced: the HPC Challenge benchmark, which has begun generating a significant sample of results, and APEX-MAP, whose code is now available for public download. The HPC Challenge data is currently in a “0.7 beta” stage; the final version should be released early in 2005, according to Jack Dongarra, the author of Linpack and one of the coordinators of the HPCC effort.
Other, lesser-known metrics are also available. The Top Application Performers list at Purdue University uses the Standard Performance Evaluation Corporation’s HPC2002 realistic application benchmark, but the list appears not to have been updated in some time. The Department of Defense’s Advanced Research Projects Agency (DARPA) Information Processing Technology Office is also involved in the creation of performance metrics, although the agency is working toward developing measurements of supercomputer deployment and execution, according to Bob Graybill, a program manager there.
Above all, the Top500 list’s most polarizing feature is its reliance on a single benchmark. Such a limited measurement of performance has drawn criticism. On the other hand, the list has been circulating since 1993, providing a historical timeline for tracking performance. Moreover, the code is known and understood.
“It’s invaluable,” said David Bailey, chief technologist of the Computational Research Division at Lawrence Berkeley National Laboratory in Berkeley, Calif. “You have so much data with it. You can actually see trends, technology changes in vector computers.”
The Linpack benchmark also serves as a stress test for a supercomputer, as it “exercises everything in the system,” said David Barkai, who works for Intel Corp., of Santa Clara, Calif., as a high-performance computing (HPC) architect.
“The revelation for me was that running Linpack for us was a great diagnostic tool,” said Walt Brooks, chief of the NASA Advanced Supercomputing (NAS) Division in Mountain View, Calif., who oversaw the “Columbia” cluster’s development. Brooks said he had used the Linpack code to test the InfiniBand connections between the nodes, which they had pushed “a little beyond” the limits of the specification.
Brooks said the agency plans to announce soon that it has tied together four of Columbia’s SGI servers, each containing 512 Intel Itanium 2 “Madison 9M” processors, into a 2,048-processor cluster that is already running fluid-dynamics code used to simulate the world’s oceans.
A Serious Test
But Barkai, whose duties include interacting with the HPC community, said researchers don’t consider the Linpack benchmark a serious test, because it stresses the floating-point unit with a well-organized series of equations. Real-world workloads involve integer operations and often work on seemingly random data. “Linpack is clearly—and I’m being cynical here—a tool for marketing people who like simple messages: 1, 2, 3,” Barkai said.
Rather, Barkai added, the benchmark is a simpler, more practical tool for funding agencies. “It boils down to a very concise message that can be passed on, that can make a case for funding,” he said.
Even the maintainer of the Top500 list says the benchmark’s utility is limited. “Linpack only measures a single spot on the high end of the performance scale—that’s why companies like it,” said Erich Strohmaier, the maintainer of the list and a computer scientist in the Future Technologies Group of the Computational Research Division at Lawrence Berkeley National Laboratory in Berkeley, Calif. “Customers have to be very careful when they use it, so that they don’t expect the same measure of performance.”
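To make Barkai’s distinction concrete, here is a minimal sketch in C (two illustrative toy kernels, not code taken from Linpack or any of the newer suites) contrasting the regular, cache-friendly floating-point arithmetic that dominates a Linpack run with integer updates at seemingly random addresses, the kind of pattern many real applications generate:

    #include <stdint.h>

    #define N 1024          /* matrix dimension for the dense kernel */
    #define TABLE (1 << 20) /* entries in the random-access table    */

    /* Linpack-style work: a regular triple loop of floating-point
     * multiply-adds over dense matrices. Caches and prefetchers handle
     * this pattern well, so it mostly stresses the floating-point unit. */
    void dense_fp_kernel(double *a, const double *b, const double *c) {
        for (int i = 0; i < N; i++)
            for (int k = 0; k < N; k++)
                for (int j = 0; j < N; j++)
                    a[i * N + j] += b[i * N + k] * c[k * N + j];
    }

    /* Application-style work: integer updates at pseudo-random locations.
     * Most accesses miss the cache, so memory latency, not floating-point
     * speed, determines how fast this loop runs. */
    void random_int_kernel(uint64_t *table) {
        uint64_t x = 1;
        for (long i = 0; i < 4L * TABLE; i++) {
            x = x * 6364136223846793005ULL + 1442695040888963407ULL; /* simple LCG */
            table[x & (TABLE - 1)] ^= x;
        }
    }

A machine can post an impressive number on the first loop and a far less flattering one on the second; that gap is exactly what a single Linpack figure hides and what the newer suites try to expose.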
The HPC Challenge benchmarks were created by Jack Dongarra, who also wrote the Linpack benchmark. Dongarra, a distinguished professor at the University of Tennessee and the director of the Center for Information Technology Research and the Innovative Computing Laboratory there, created the HPC Challenge suite to add several new data points to the mix. The effort is funded by DARPA and the Department of Energy, among other agencies.
Eight tests have been added to the benchmark already, which is scheduled for completion early next year, Dongarra said. “We need to move beyond one measurement,” Dongarra said. “One measurement is one stake in the ground. We need to put in a round collection of things, to test more features in computers, to allow people to select the features that best match their application span.”
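Sustained memory bandwidth is one example of such an added measurement; the HPCC suite includes a STREAM-style bandwidth test alongside the Linpack run. The fragment below is a simplified sketch of the idea in C, not the official HPCC or STREAM source:

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    #define LEN (1L << 24) /* roughly 16 million doubles per array */

    /* A STREAM-style "triad" loop: a[i] = b[i] + s * c[i].
     * The arithmetic is trivial; the score is the sustained rate at which
     * the three arrays can be moved between main memory and the processor. */
    int main(void) {
        double *a = malloc(LEN * sizeof *a);
        double *b = malloc(LEN * sizeof *b);
        double *c = malloc(LEN * sizeof *c);
        if (!a || !b || !c) return 1;

        for (long i = 0; i < LEN; i++) { b[i] = 1.0; c[i] = 2.0; }

        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (long i = 0; i < LEN; i++)
            a[i] = b[i] + 3.0 * c[i];
        clock_gettime(CLOCK_MONOTONIC, &t1);

        double sec = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
        /* Two arrays are read and one is written on every iteration. */
        printf("triad: %.2f GB/s (check value %.1f)\n",
               3.0 * LEN * sizeof(double) / sec / 1e9, a[LEN / 2]);
        return 0;
    }

A loop like this is limited by memory bandwidth rather than arithmetic, so it rewards a very different aspect of a machine than Linpack does.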
Each test is self-administered, according to very specific rules, Dongarra said. Each participant may submit scores from both an unoptimized “base” run and an optimized run, in which some limited substitutions of code are allowed. The HPCC team then checks the submitted code and results to make sure the tests were run according to the rules.
However, Dongarra said, the accumulated tests will not be compiled into a single number that could rate machines à la the Top500 list. The number of benchmarks may remain fixed, although the HPCC organizers say it’s going to be tough to anticipate all of the new features that will be added to upcoming systems.
As with all benchmarks, the most useful element is the data itself. Intel Itanium- and Xeon-based machines dominate the Top500 list, which Intel officials have said is evidence that the architecture scales well and is cheaper than so-called “proprietary” systems using processors designed by Sun Microsystems Inc. and Hewlett-Packard Co. Under the HPCC benchmark, however, machines from Cray Inc. and NEC Corp. play a much larger role. The submitted systems include the Army High Performance Computing Research Center’s cluster of 1,024 0.6GHz Alpha 21164 processors, although most submissions cover 256 to 512 processors. Still, most of the result sets include some gaps, evidence that the benchmark has evolved over time.
NASA’s Brooks said he plans to run the HPCC benchmarks and submit the results, although he will have to squeeze the runs in around his users.
“We anticipate HPCC having a long lifetime,” Dongarra said. “Linpack has had a long lifespan, almost 30 years. … We want to make sure that what we’re doing now will exist for a long period of time, which is why we’re going slowly.”
APEX-MAP emerges
While the HPCC benchmark is further along, the Top500’s Strohmaier said that the Application Performance Characterization-Memory Access Probe (APEX-MAP) test is set to play a larger role. The single test measures memory access performance—both random and regular access—most often the limiting factor in supercomputer clusters as each node tries to reach memory across the shared interface. The test was developed by Strohmaier and Hongzhang Shan, also a researcher at LBNL.
“We’re trying to emulate the performance of different applications,” Strohmaier said. “It’s been a research project, but it’s pretty much at the stage that we want to put it out there.” The two researchers have begun to collect results and post them on the project’s Web page.
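The sketch below, written in C, conveys the general flavor of such a probe; it assumes a single switch between regular strides and random jumps, whereas the real APEX-MAP code and its locality parameters differ:

    #include <stdint.h>

    #define MEM   (1L << 24) /* words in the probed memory region   */
    #define BLOCK 64         /* contiguous words touched per access */

    /* A simplified memory-access probe in the spirit of APEX-MAP (not the
     * actual code): walk a large array in blocks, picking each block's start
     * either sequentially (regular access) or from a pseudo-random stream
     * (random access). Timing the same loop under both settings, and with
     * different BLOCK sizes, maps out how a memory system behaves as
     * locality disappears. */
    double probe(const double *mem, int random_access) {
        double sum = 0.0;
        uint64_t rng = 88172645463325252ULL;
        long start = 0;
        for (long n = 0; n < MEM / BLOCK; n++) {
            if (random_access) {
                rng = rng * 6364136223846793005ULL + 1442695040888963407ULL;
                start = (long)(rng % (MEM - BLOCK));
            } else {
                start = (start + BLOCK) % (MEM - BLOCK);
            }
            for (long i = 0; i < BLOCK; i++) /* touch one block of data */
                sum += mem[start + i];
        }
        return sum; /* returned so the compiler cannot discard the loads */
    }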
But scientists said that while industry-standard benchmarks are useful tools, each agency maintains its own collection of diagnostics to evaluate vendors. If the HPCC and other benchmarks become more widely accepted, perhaps the agencies will reduce, but not eliminate, their own tests, Bailey said. In addition to other tests, NASA’s Brooks said his agency uses a parallelization routine based on actual computational fluid dynamics research, or “ocean code.”
The real test, of course, is whether the industry will adopt the new benchmarks.
“The big question is that we’ve called the party—now can we get anyone to come?” LBNL’s Bailey said. “If the computer vendors see flaws in the benchmark, or if it’s too much work, the whole HPC benchmarking activity may just die right now.”