Tux: Built for Speed
In most cases, a public benchmark is really nothing more than a transaction race where bragging rights and platform pride are the prizes. But every so often, a revolutionary performance breakthrough comes to the forefront during a test.
In the case of eWeek Labs' Web server benchmark, Red Hat Inc.'s Tux 2.0 Web server running on a Linux 2.4 kernel has taken performance far beyond what was previously possible and blazes the way for future Web servers built on the same architecture.
In a test performed at eWeek Labs, we worked closely with Dell Computer Corp.'s Performance Engineering group (the group that first published Tux's amazing performance results on the SPECWeb 99 benchmark) and found that Tux was nearly three times as fast as current Web server mainstay Apache (12,792 transactions per second vs. 4,602 tps) when running a mix of dynamic and static Web content.
The 60.7MB of static Web content was small enough to easily fit into RAM, so this benchmark primarily tested networking and thread management code, not disk-handling routines (we did have Web server logging enabled, however).
Revising server speeds
Tux's amazing speeds, even on low-end hardware, strongly validate its unusual design: First, Tux puts Web server code into the kernel and reads Web pages directly from Linux's kernel-mode file system cache for speed; second, Tux handles high numbers of connections very efficiently by using a small pool of worker threads instead of using one worker process per connection (as Apache does); third, Tux uses its own high-performance thread-scheduling algorithm to minimize the impact of disk activity.
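The second design point, a small pool of worker threads serving many connections, can be illustrated in user space. The sketch below is not Tux's in-kernel code; it uses Python's standard thread pool, and the names `handle_request` and `serve`, along with the pool size of four, are assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor
import threading


def handle_request(request_id):
    """Pretend to serve one request; report which worker thread handled it."""
    return request_id, threading.current_thread().name


def serve(num_requests, pool_size):
    """Serve num_requests using a fixed pool of pool_size worker threads.

    Returns (requests handled, number of distinct threads that did the work).
    """
    with ThreadPoolExecutor(max_workers=pool_size,
                            thread_name_prefix="worker") as pool:
        results = list(pool.map(handle_request, range(num_requests)))
    threads_used = {name for _, name in results}
    return len(results), len(threads_used)


if __name__ == "__main__":
    handled, threads_used = serve(1000, 4)
    print(handled, "requests handled by", threads_used, "thread(s)")
```

Here 1,000 simulated requests are handled by at most four threads, whereas a process-per-connection design needs one process for every concurrent connection.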
Tux is also very easy to deploy incrementally across an enterprise because it can transparently forward Web requests it cannot handle to another Web server, such as Apache. Tux's main weakness is that it doesn't support Secure Sockets Layer traffic, a feature planned for a future version.
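The forwarding arrangement can be pictured as a front end that answers what it can and passes everything else along. The sketch below is purely illustrative: how Tux actually decides which requests to forward is not covered here, so the lookup table `fast_handlers` and the `fallback` callable are hypothetical stand-ins.

```python
def make_front_end(fast_handlers, fallback):
    """Build a request handler that prefers fast handlers and otherwise forwards.

    fast_handlers: dict mapping a path to a callable (hypothetical fast path).
    fallback: callable standing in for the second Web server.
    """
    def handle(path):
        handler = fast_handlers.get(path)
        if handler is None:
            # The front end cannot serve this request, so pass it on unchanged.
            return fallback(path)
        return handler(path)
    return handle


if __name__ == "__main__":
    front_end = make_front_end(
        {"/index.html": lambda path: "fast:" + path},
        lambda path: "fallback:" + path,
    )
    print(front_end("/index.html"))
    print(front_end("/cgi-bin/report"))
```

Because the client sees a single front end either way, content can be moved to the fast path a piece at a time.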
The fact that Tux 2.0 was also significantly faster than Windows 2000's Internet Information Server 5.0 Web server (5,137 requests per second) clearly shows the advantages of Tux's new design over that of a well-established Web server. The next version of IIS (which ships with Microsoft Corp.'s Whistler project) uses several ideas introduced by Tux, including the kernel-space design.
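For readers who want the ratios behind those claims, the short calculation below uses only the three throughput figures reported in this article. It assumes the IIS figure, quoted in requests per second, is directly comparable with the transactions-per-second figures for Tux and Apache.

```python
# Throughput figures as reported in the article (per second).
throughput = {"Tux 2.0": 12792, "IIS 5.0": 5137, "Apache": 4602}


def speedup(fast, slow):
    """Return how many times faster `fast` is than `slow`, to two decimals."""
    return round(throughput[fast] / throughput[slow], 2)


print(speedup("Tux 2.0", "Apache"))   # 2.78
print(speedup("Tux 2.0", "IIS 5.0"))  # 2.49
```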
IBM's AIX has included a kernel-space Web cache (although not a kernel-space Web server) since 1999, so this in-kernel trend is starting to sweep across the industry.
In terms of system implementation, the explosive performance of Tux and Tux-like Web servers should allow IT managers to build faster and more scalable Web server farms using fewer servers and processors, which in turn should free up corporate resources to buy bigger and better application and database servers.
2.4 kernel outsprints 2.2
In this test, we also wanted to help quantify the many scalability and performance changes in the Linux 2.4 kernel. It's very clear from our results that Linux 2.4, whether running Tux or Apache, is a far faster platform than Linux 2.2 was.
As mentioned, Tux's internal architecture is designed specifically for high performance, but that design is only one of five factors critical to its top-notch performance, according to Tux's primary author, Ingo Molnar, kernel development/systems engineer at Red Hat, in Berlin.
The other four areas are all features of the Linux 2.4 kernel and will speed up any Linux server application, not just Tux: zero-copy TCP/IP networking, interrupt and process CPU affinity, per-CPU kernel memory resources (called slab caches) and wake-one scheduling. (Some features, including zero-copy networking, require server application changes before they can be used.)
Molnar also credits the big development effort to tune the 2.4 kernel for SMP (symmetric multiprocessing) systems. "Getting good SMP scalability in a kernel is a process of many, many smaller steps and careful profiling," he said. "The 2.4 kernel's main goal was to achieve these enterprise scalability levels."
Zero-copy networking provides a way for a network card driver to access the data it's been asked to send directly from the kernel's disk cache or user-space memory buffer. Previously, the kernel had to copy data from the disk cache to a separate network buffer before sending it.
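From an application's point of view, the usual way to ask for this behavior is the sendfile system call, which hands the kernel a file and a socket instead of reading the file into a user-space buffer first. The sketch below uses Python's `socket.sendfile()`, which relies on `os.sendfile` where the platform provides it and otherwise falls back to ordinary sends. Whether every copy is actually avoided depends on the kernel and the network driver. The helper names `send_file` and `demo` are assumptions for the example, and a local socket pair stands in for a real client connection.

```python
import os
import socket
import tempfile


def send_file(sock, path):
    """Send the whole file at `path` over `sock`; return the byte count."""
    with open(path, "rb") as f:
        # No read() into a Python buffer: the kernel moves the data itself
        # when os.sendfile is available on this platform.
        return sock.sendfile(f)


def demo():
    """Send a small temporary file across a local socket pair."""
    payload = b"<html>hello</html>" * 100
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(payload)
    sender, receiver = socket.socketpair()
    try:
        sent = send_file(sender, tmp.name)
        sender.close()  # signal end of stream to the receiver
        received = b""
        while True:
            chunk = receiver.recv(4096)
            if not chunk:
                break
            received += chunk
    finally:
        sender.close()
        receiver.close()
        os.unlink(tmp.name)
    return sent, received == payload


if __name__ == "__main__":
    print(demo())
```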
Affinity features associate system objects such as a running process or an interrupt with particular CPUs to get more from each CPU's cache.
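The idea of process affinity can be tried from user space on a current Linux system. The sketch below uses Python's `os.sched_setaffinity`, a later user-space interface rather than the in-kernel 2.4 mechanism described above, and it only illustrates the concept of tying a process to one CPU. The function name `pin_current_process` is an assumption, and the call is unavailable on some platforms, so the sketch checks for it first.

```python
import os


def pin_current_process(cpu):
    """Restrict the calling process to one CPU; return the resulting CPU set.

    Returns None on platforms that do not provide os.sched_setaffinity.
    """
    if not hasattr(os, "sched_setaffinity"):
        return None
    os.sched_setaffinity(0, {cpu})  # pid 0 means the calling process
    return os.sched_getaffinity(0)


if __name__ == "__main__":
    if hasattr(os, "sched_getaffinity"):
        allowed = os.sched_getaffinity(0)
        print(pin_current_process(min(allowed)))
        os.sched_setaffinity(0, allowed)  # restore the original CPU set
```

A process that stays on one CPU is more likely to find its working data still in that CPU's cache the next time it runs.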
Wake-one scheduling is a major change that improves the efficiency of multiprocess server applications.
In Linux 2.2, every process that is sleeping while it waits for external input (such as network traffic) is woken up when that input arrives. Because only one process needs to handle the request, the rest, lacking data, immediately go back to sleep. In Linux 2.4, only one process is woken up, saving CPU cycles.
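The difference can be demonstrated with a small user-space model. In the sketch below, threads stand in for server processes and a condition variable stands in for the kernel's wait queue, so it illustrates the behavior rather than the kernel code; `count_wakeups` and its parameters are names invented for the example. Each request is released only after every worker has gone back to sleep, which keeps the count repeatable.

```python
import threading
import time


def count_wakeups(num_workers, num_requests, wake_all):
    """Count how often sleeping workers are woken while serving the requests."""
    cond = threading.Condition()
    pending = []  # requests not yet handled
    state = {"asleep": 0, "wakeups": 0, "done": False}

    def worker():
        with cond:
            while True:
                if pending:
                    pending.pop()  # this worker gets the request
                elif state["done"]:
                    return
                else:
                    state["asleep"] += 1
                    cond.wait()  # sleep until notified
                    if not state["done"]:
                        state["wakeups"] += 1

    def wait_until_all_asleep():
        while True:
            with cond:
                if state["asleep"] == num_workers and not pending:
                    return
            time.sleep(0.001)

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()

    for request in range(num_requests):
        wait_until_all_asleep()
        with cond:
            pending.append(request)
            if wake_all:
                state["asleep"] = 0  # wake every sleeper (the 2.2 behavior)
                cond.notify_all()
            else:
                state["asleep"] -= 1  # wake exactly one (the 2.4 behavior)
                cond.notify(1)

    wait_until_all_asleep()
    with cond:
        state["done"] = True  # shutdown wakeups are not counted
        cond.notify_all()
    for t in threads:
        t.join()
    return state["wakeups"]


if __name__ == "__main__":
    print("wake-all:", count_wakeups(8, 10, wake_all=True))
    print("wake-one:", count_wakeups(8, 10, wake_all=False))
```

With eight workers and 10 requests, the wake-all run records 80 wakeups and the wake-one run records 10, so seven of every eight wakeups in the first case do no useful work.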