Scaling the TeraGrid

By Darryl K. Taft  |  Posted 2005-06-27

The TeraGrid, a multiyear effort to build and deploy the world's largest, most comprehensive distributed infrastructure for open scientific research, was deployed in conjunction with SDSC's commercial partners IBM, Intel and Qwest Communications International Inc. Other corporate partners included Myricom Inc., Sun Microsystems Inc. and Oracle Corp.

In addition to SDSC, TeraGrid users include Argonne National Laboratory; the Center for Advanced Computing Research at the California Institute of Technology; the National Center for Supercomputing Applications at the University of Illinois, Urbana-Champaign; Oak Ridge National Laboratory; the Pittsburgh Supercomputing Center; and the Texas Advanced Computing Center at the University of Texas at Austin.

To build the TeraGrid, IBM Global Services deployed clusters of eServer Linux systems at the initial sites of what was then known as the Distributed Terascale Facility—at SDSC, Caltech, NCSA and Argonne—in the third quarter of 2002. The servers featured Intel's Itanium processors, grid-enabling middleware and Myricom's Myrinet interconnect for interprocessor communication.

The system can store more than 600TB of data, or the equivalent of 146 million full-length novels. A substantial portion of the grids storage infrastructure will be enabled by IBM TotalStorage products and technologies.
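The article's comparison implies a specific figure for the size of a "full-length novel." As a quick sanity check (my arithmetic, not a number from the article), dividing the total capacity by the novel count gives roughly 4MB per novel:

```python
# Working out the per-novel figure implied by the article's comparison:
# 600TB of storage equated to 146 million full-length novels.
total_bytes = 600e12          # 600 terabytes (decimal TB)
novels = 146e6                # 146 million novels
bytes_per_novel = total_bytes / novels
print(f"~{bytes_per_novel / 1e6:.1f} MB per novel")  # ~4.1 MB
```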

The Linux clusters are linked via a 40Gbps Qwest network, creating a single computing system able to process 13.6 teraflops. A teraflop is a trillion floating-point calculations per second, and the TeraGrid system is more than a thousand times faster than IBM's Deep Blue supercomputer, which defeated chess champion Garry Kasparov in 1997, IBM officials said.
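The "thousand times faster" claim is easy to check with back-of-the-envelope arithmetic. Note that Deep Blue's commonly cited peak of roughly 11.4 gigaflops is an assumption on my part; the article gives only the TeraGrid figure:

```python
# Sanity-checking the speed comparison in the article.
teragrid_flops = 13.6e12   # 13.6 teraflops, from the article
deep_blue_flops = 11.4e9   # assumed peak for Deep Blue (1997), not from the article
ratio = teragrid_flops / deep_blue_flops
print(f"TeraGrid is roughly {ratio:,.0f}x faster")  # roughly 1,193x
```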

To date, the National Science Foundation has spent about $100 million on the TeraGrid, SDSC officials said. But SDSC's contribution is not insignificant: The center leads the TeraGrid data and knowledge management effort by deploying its data-intensive IBM Linux cluster to the grid. In addition, a portion of SDSC's 10-teraflops DataStar supercomputer is assigned to the TeraGrid. And tape archives that support IBM's HPSS (High Performance Storage System) and Sun's Sun SAM-QFS have a storage capacity of 6 petabytes and currently store 1 petabyte of data. A next-generation Sun high-end server—the Sun Enterprise E15K—helps provide data services, SDSC officials said.

The all-important middleware that makes the grid environment possible includes Globus Toolkit, which provides single sign-on for users through Grid Security Infrastructure, Papadopoulos said. Also in use is SRB (Storage Resource Broker) software. "It allows you to look at storage that is physically distributed as if it were in one place," Papadopoulos said.
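The core idea behind a storage resource broker—addressing data by logical name while the broker resolves the physical location—can be sketched in a few lines. This is a toy illustration of the concept only; the class, method names and paths below are hypothetical and bear no relation to the actual SRB API:

```python
# Toy sketch of the "one logical view over distributed storage" idea.
# Clients use logical names; the broker maps them to physical locations.

class ToyStorageBroker:
    def __init__(self):
        # logical name -> (site, physical path)
        self._catalog = {}

    def register(self, logical_name, site, physical_path):
        """Record where a logically named dataset physically lives."""
        self._catalog[logical_name] = (site, physical_path)

    def resolve(self, logical_name):
        """Return the (site, path) backing a logical name."""
        return self._catalog[logical_name]


broker = ToyStorageBroker()
broker.register("/teragrid/climate/run42", "SDSC", "/hpss/arch/run42.dat")
broker.register("/teragrid/climate/run43", "NCSA", "/qfs/data/run43.dat")

# The caller never needs to know which site holds the data.
site, path = broker.resolve("/teragrid/climate/run42")
print(site, path)  # SDSC /hpss/arch/run42.dat
```

A real broker adds authentication, replication and transfer on top of this lookup, but the catalog mapping is the heart of the abstraction.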

As grid is still an emerging area, tools and middleware have not always been readily available, so researchers have had to build their own, Berman said. This led to open-source efforts to provide tool sets to help developers build grid-enabled software. SDSC took the lead in developing the SRB technology as well as the Rocks clustering tool kit.

Rocks allows people to build clusters easily, Papadopoulos said. "It's an open-source project, and its motivation is to help scientists in university labs deploy scalable computing with no more thought than deploying a workstation," he said.


Darryl K. Taft covers the development tools and developer-related issues beat from his office in Baltimore. He has more than 10 years of experience in the business and is always looking for the next scoop. Taft is a member of the Association for Computing Machinery (ACM) and was named 'one of the most active middleware reporters in the world' by The Middleware Co. He also has his own card in the 'Who's Who in Enterprise Java' deck.
