SGI on Wednesday is announcing that its optimized Linux operating system, Advanced Linux Environment, with the latest version of the company's ProPack 2.4 software suite, can now scale to 256 processors running on a single Linux kernel.
Officials with the Mountain View, Calif., company said they expect to be able to scale the OS to 512 processors by the end of the year.
"The traditional wisdom around Linux, and really over the last year and a half, is that it really only scales to 16 [processors]," said Jason Pettit, product line manager for SGI's Altix 3000 systems.
SGI engineers have been able to scale the open-source operating system by removing some bottlenecks in the Linux kernel and identifying areas where performance can be improved, Pettit said.
Now SGI's Altix systems can scale up to 256 64-bit Itanium 2 processors from Intel Corp. with a single system image of the Linux kernel, he said. Key to that capability is ProPack 2.4, which among other features includes asynchronous I/O, a serial ATA driver from Vitesse Semiconductor Corp. and the ability to boot partitioned systems without a disk. There are also a new message-passing toolkit and science and math libraries.
The systems are targeted at high-performance computing environments and research and development departments in large corporations, Pettit said. Businesses now can reduce their time to market for new products, and researchers can decrease the time needed for discoveries, he said.
Altix systems scaling to 256 processors are available immediately.
Keys to the Altix systems are the NUMAflex shared memory architecture, which enables greater scalability of the systems, and the NUMAlink interconnect fabric. Those technologies will enable SGI to offer supercluster configurations with the Altix 3000 servers running from four to 1,024 processors by May.
Beverly Bernard, SGI's Linux product manager, said the company initially expected to grow its Linux OS from 64 to 128 processors by now, with 256-processor scalability coming later in the year. The 64-processor capability was announced in January 2003. However, as SGI engineers tested the scalability, "they weren't running into any issues. It was running beautifully."
Bob Ciotti, lead for the Terascale Application Group at NASA's Ames Research Center, said running large systems with a single system image, or SSI, is important to obtaining the manageability, reliability and low latency his researchers need to perform the work they do.
"They are a little bit easier to use administratively on a day-to-day basis," said Ciotti, in Moffett Field, Calif. "It makes our life—from an operational aspect of the system—run more efficiently, and programming of the machines a little easier."
It's more manageable to debug a single system than to work on multiple systems tied together in a cluster.
A disadvantage of an SSI is that if there is a problem within the system, the whole server has to be taken down to fix it. Given that situation, researchers have to determine what is an acceptable time between downtimes. In Ames' case, that is two weeks, and the Altix system has been running at about four weeks between downtimes.
"It's sort of a trade," Ciotti said. "When does it get to the point where we can't tolerate [the amount of downtime]? We're not there yet."
Ames currently is using a prototype 512-processor Altix system for two projects: one for computational fluid dynamics relating to NASA's Shuttle program, and the other for ocean modeling and data simulation work as part of the ECCO, or Estimating the Circulation and Climate of the Ocean, program. Ciotti said he envisions the day when systems will have as many as 4,000 processors running on a single or only a few system images.
(Editor's Note: This story has been updated since its original posting to include comments from NASA's Bob Ciotti.)