There Ain't No Business Like Benchmarking

Good, bad or ugly, benchmarks are worth it

An old Net adage is that the best way to get a quick answer to your question is to post a wrong answer to it. Nothing draws the lurkers out like the chance to correct someone else's mistake.

Posting benchmark results online is a prime example: at first glance, most people who feel that their platform investments or judgment are being threatened can find something obvious that was missed.

That was never more evident than with a J2EE vs. Microsoft .Net Framework benchmark organized by The Middleware Company, a J2EE training and consulting company. The company also runs a popular J2EE news portal and discussion site.

The benchmark, whose documentation, scripts and code are available for download, was based on the now-infamous Petstore application that Sun originally developed as a teaching tool for J2EE technologies.

Microsoft has used Petstore in the past to do similar comparisons, and has been legitimately criticized for using the Petstore code because the application was not designed for performance (although the .Net version certainly was).

In this test, two J2EE experts from The Middleware Company did the J2EE tuning, using two different (unnamed) J2EE application servers on both Windows and Linux to get Petstore up to snuff. They made huge improvements in the code: according to their paper, their version was roughly 17 times faster than the original Sun version by the time they finished tuning it. Even so, Windows 2000 and the .Net Framework came out significantly faster, as well as more stable, when running these particular versions of the application.

Posters certainly poured out a measure of vitriol over this benchmark, with a sense that one of our own had betrayed us. However, in more than 130,000 words of postings, they also made a number of good suggestions about how the J2EE application could be made faster.

Besides the carping (Java only hits its stride on Solaris, or on machines with more than eight CPUs, and so on), there were good criticisms: the .Net and J2EE applications were run against different databases; the J2EE application used bean-managed persistence for its entity beans instead of container-managed persistence (many posters thought container-managed persistence would be faster); and entity beans should have been avoided entirely in favor of a straight JDBC-based approach.

Rickard Öberg, whose Java credentials look pretty strong to me, posted a detailed list of performance improvements that could be made to The Middleware Company's version of Petstore.

These kinds of things are to be expected in any complex benchmark and don't invalidate this effort. I know what it's like to spend 100-hour weeks in a benchmark lab tweaking headache-inducing combinations of code and settings to get the best possible performance out of every tested product.

In my book, any benchmark whose configuration settings and code are published so that others can build on the work is a good benchmark.

The Middleware Company is considering a follow-on test incorporating these optimizations. Good for it. I hope it does so, and I don't think there's any problem with credibility here. As long as everything is made public, people can read the specs and learn their own lessons from the data.

The best things about benchmarks aren't even the numbers: full-disclosure reports are a fascinating education in how to tune, in what little things can cause systems to fail, and in how to write performance-sensitive code. They are also a great source of packaged scripts and of a methodology people can use in their own testing. I get a lot of mail from readers thanking us for posting our own benchmark scripts and configurations online.

A larger lesson here is that J2EE vendors are doing their part to bring Petstore pain upon themselves. The main Java application server benchmark, SPECjAppServer2002, formerly known as ECPerf, cannot be run by non-J2EE vendors because the benchmark is defined using J2EE-specific code. What choice does Microsoft have but to use another application?

If SPECjAppServer2002 were made more generic, or if application server vendors started posting TPC-W numbers, benchmark battles would be fought in a more neutral forum. Instead of saber rattling and angry denunciations, let's see some code and let's see some numbers.

Our next eLABorations column will appear on Friday, Dec. 27. West Coast Technical Director Timothy Dyck can be reached at