Ask a handful of technology managers what puts the “grid” in grid computing, and you'll get a handful of answers. Corporate enterprises are beginning to adopt grid computing to help solve compute-intensive problems such as crash simulation, financial modeling and new algorithm development. Likewise, research institutions, many facilitated by the San Diego Supercomputer Center, have been using grids to explore the galaxy, predict earthquakes and help cure diseases.
Grids can be as simple as a server cluster at a single company or as massive as the globally distributed grid systems that SDSC has built. It all depends on what the grid is for. So, what's a grid?
“The way I look at grids is what their benefit is: allowing folks to access remote resources that they normally wouldn't be able to,” said Philip Papadopoulos, program director for grid and cluster computing at SDSC, based at the University of California at San Diego.
“The easiest way to think about it is, you have a computational platform in one place and you have data that's sitting in another place. It may be across campus, or it may be across the country. And you need to be able to put those two things together,” Papadopoulos said.
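In code, that "compute here, data there" pattern reduces to staging the remote data next to the compute resource before running the job. The short Python sketch below is only an illustration of that idea; the dataset URL, scratch directory and analyze() step are hypothetical stand-ins, not part of any SDSC system.

```python
import urllib.request
from pathlib import Path

# Hypothetical remote dataset and local scratch space -- stand-ins for the
# "data in one place, compute in another" pattern Papadopoulos describes.
REMOTE_DATASET = "https://data.example.org/simulation/run42.csv"  # assumed URL
SCRATCH = Path("scratch")


def stage_in(url: str, dest: Path) -> Path:
    """Copy the remote data set to local storage before computing on it."""
    dest.mkdir(parents=True, exist_ok=True)
    local_copy = dest / url.rsplit("/", 1)[-1]
    urllib.request.urlretrieve(url, str(local_copy))
    return local_copy


def analyze(path: Path) -> int:
    """Placeholder compute step: count the data rows in the staged file."""
    with path.open() as f:
        return sum(1 for _ in f) - 1  # subtract the header line


if __name__ == "__main__":
    staged = stage_in(REMOTE_DATASET, SCRATCH)
    print(f"{staged.name}: {analyze(staged)} rows")
```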
The more data that's used, the larger and more geographically dispersed the grid will be; but in the world of grid computing, distance really doesn't matter, said SDSC Director Francine Berman.
“Say you want to get from Washington to San Diego,” said Berman, who has performed academic research on grids for the past two decades in the areas of programming environments, adaptive middleware, scheduling and performance prediction. “You're going to use some combination of your own legs, your car, taxicabs, shuttle buses, airplanes, etc., and all of those have to be functional, not just in and of themselves, but you have to coordinate them together so you can get from point A to point B. The whole process of coordinating technology resources by integrated software is the idea behind grid computing.”
Although they are many times larger than what a single company could put together on its own, the SDSC grids, by Papadopoulos' and Berman's definitions, could serve as prototypes for any company looking to tap into virtually limitless computing power.
The largest of the grids is the TeraGrid project, consisting of several Intel Corp. Itanium-based clusters, said Papadopoulos. As director of SDSC's Advanced Cyberinfrastructure Laboratory, Papadopoulos is involved in building networking, managing open-source development projects and overseeing management of the grid. “[The TeraGrid includes] a large IBM Power 4-based machine, a 10-teraflops machine. Connected into our grid infrastructure is 500TB of disk space. We can house large data sets for the [scientific] community and for direct access from our large machines. People can get to that data either through our large machines or over the network,” he said.
The IBM computer Papadopoulos referred to is an eServer Blue Gene system with 1,024 compute nodes and 128 I/O nodes. Each node consists of two PowerPC processors running at 700MHz and sharing 512MB of memory. The system's top speed is 7.7 teraflops.
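From the per-node figures just quoted, simple arithmetic gives the system's aggregate resources (these totals are derived here rather than stated by SDSC):

\[
1{,}024 \ \text{nodes} \times 2 \ \text{processors} = 2{,}048 \ \text{processors},
\qquad
1{,}024 \ \text{nodes} \times 512\,\text{MB} = 512\,\text{GB of aggregate memory}
\]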
To facilitate the work of its scientists and researchers, SDSC runs about 800 computers on its machine room floor, Papadopoulos said. All have software that can tie them to a grid.
“We have several clusters that are national-scale research facilities but are more tuned to individuals,” Papadopoulos said. “We have 128-node and 200-node clusters [that are] very common; 50-node clusters are also common.”
In May, SDSC increased its online disk storage to 1.1 petabytes. This is in addition to the more than 6 petabytes of tape storage and 4.2 terabytes of IBM DataStar supercomputer memory that SDSC already has. A petabyte of stored data is the equivalent of text information that would fill the Library of Congress more than eight times over, SDSC officials said.
In the Beginning
The history of grid computing at SDSC can be traced to 1985, when the National Science Foundation decided to start a supercomputing centers program to make supercomputers available to academic researchers.
“Before then, if you wanted to use supercomputers, you really needed to be in defense or in the Department of Energy and doing classified work,” Papadopoulos said. “That kind of computing power was not available to the masses.”
That program existed for 12 years, and in 1997 a new program—the Partnership for Advanced Computational Infrastructure—was started, aided by the growth of broadband, Papadopoulos said. “Part of the reason it started was that networks went from [56K-bps] networks that interconnected the centers in 1985 to, in 1994, the BBNS [BroadBand Networking Services] at 45M bps. In 1997, the centers were connected at 155M bps. It was enough of a change for a new program to be started. The supercomputer centers no longer had to act as islands.”
By 2001, network throughput increased from 155M bps to 655M bps. SDSC's TeraGrid project was then introduced with a 40G-bit backplane network.
Today, all SDSC research is funded by grants and awards. However, in its early days, the center was allowed to resell about 10 percent of its spare cycles in an effort to raise more funds for research.
Scaling the TeraGrid
The TeraGrid, a multiyear effort to build and deploy the world's largest, most comprehensive distributed infrastructure for open scientific research, was developed in conjunction with SDSC's commercial partners IBM, Intel and Qwest Communications International Inc. Other corporate partners included Myricom Inc., Sun Microsystems Inc. and Oracle Corp.
In addition to SDSC, TeraGrid users include Argonne National Laboratory; the Center for Advanced Computing Research at the California Institute of Technology; the National Center for Supercomputing Applications at the University of Illinois, Urbana-Champaign; Oak Ridge National Laboratory; the Pittsburgh Supercomputing Center; and the Texas Advanced Computing Center at the University of Texas at Austin.
To build the TeraGrid, IBM Global Services deployed clusters of eServer Linux systems at the initial sites of what was then known as the Distributed Terascale Facility—at SDSC, Caltech, NCSA and Argonne—in the third quarter of 2002. The servers featured Intel's Itanium processors, grid-enabling middleware and Myricom's Myrinet interconnect for enabling interprocessor communication.
The system can store more than 600TB of data, or the equivalent of 146 million full-length novels. A substantial portion of the grid's storage infrastructure will be enabled by IBM TotalStorage products and technologies.
The Linux clusters are linked via a 40G-bps Qwest network, creating a single computing system able to process 13.6 teraflops. A teraflop is a trillion floating-point calculations per second, and the TeraGrid system is more than a thousand times faster than IBM's Deep Blue supercomputer, which defeated chess champion Garry Kasparov in 1997, IBM officials said.
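As a rough check on that comparison (assuming the commonly cited peak of about 11.4 gigaflops for Deep Blue, a figure not given above):

\[
\frac{13.6 \times 10^{12} \ \text{flops}}{11.4 \times 10^{9} \ \text{flops}} \approx 1{,}200
\]

which is consistent with "more than a thousand times faster."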
To date, the National Science Foundation has spent about $100 million on the TeraGrid, SDSC officials said. But SDSC's contribution is not insignificant: The center leads the TeraGrid data and knowledge management effort by deploying its data-intensive IBM Linux cluster to the grid. In addition, a portion of SDSC's 10-teraflops DataStar supercomputer is assigned to the TeraGrid. And tape archives that support IBM's HPSS (High Performance Storage System) and Sun's SAM-QFS have a storage capacity of 6 petabytes and currently store 1 petabyte of data. A next-generation Sun high-end server—the Sun Enterprise E15K—helps provide data services, SDSC officials said.
The all-important middleware that makes the grid environment possible includes Globus Toolkit, which provides single sign-on for users through Grid Security Infrastructure, Papadopoulos said. Also in use is SRB (Storage Resource Broker) software. “It allows you to look at storage that is physically distributed as if it were in one place,” Papadopoulos said.
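From a user's perspective, those two pieces of middleware boil down to "sign on once, then treat distributed storage as one logical place." The sketch below is a hedged illustration of that workflow: it assumes the Globus grid-proxy-init command and the SRB Scommands (Sinit, Sput, Sls, Sexit) are installed and configured on the local machine; the file and collection names are hypothetical, and exact flags may differ between releases.

```python
import subprocess

# Hypothetical file and SRB collection names, used purely for illustration.
LOCAL_FILE = "results/output.dat"
SRB_COLLECTION = "/home/user.sdsc/project"  # assumed collection path


def run(cmd: list[str]) -> None:
    """Run a middleware command and fail loudly if it returns an error."""
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # 1. Single sign-on: create a short-lived GSI proxy credential that the
    #    other grid tools pick up automatically.
    run(["grid-proxy-init", "-valid", "12:00"])

    # 2. Connect to the Storage Resource Broker, copy a local file into a
    #    logical collection, then list it back -- the physical location of
    #    the data stays hidden behind the collection name.
    run(["Sinit"])
    run(["Sput", LOCAL_FILE, SRB_COLLECTION])
    run(["Sls", SRB_COLLECTION])
    run(["Sexit"])
```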
Because grid computing is still an emerging area, tools and middleware have not always been readily available, so researchers have had to build their own, Berman said. This led to open-source efforts to provide tool sets to help developers build grid-enabled software. SDSC took the lead in developing the SRB technology as well as the Rocks clustering tool kit.
Rocks allows people to build clusters easily, Papadopoulos said. “It's an open-source project, and its motivation is to help scientists in university labs deploy scalable computing with no more thought than deploying a workstation,” he said.
Putting the Grid to Work
With the grid in place, it's up to developers and researchers to put it to work.
“Innovative ideas are most easily implemented as middleware, as in Globus,” said Phil Andrews, program director for high-end computing at SDSC. “If they become successful, they naturally migrate down toward infrastructure, where their reliability, transparency for the users and efficiency are all optimized. Those which don't make the journey will disappear, while the successful ones become standard pieces of the computational environment.
“Rather than trying to hang on to them, continually adding features, the developers need to wish them bon voyage and turn to developing the next innovative ideas. One facility we're trying to harden at SDSC, in combination with IBM, is the Global File System approach, with their GPFS [General Parallel File System] as the present implementation,” Andrews said.
“While much of the grid technology thought, science and people are coming out of the university environment, the demands of grid infrastructures in universities and in commercial enterprises are very different,” said Larry Tabb, president of The Tabb Group, a Westboro, Mass., market research company.
“University grids tend to facilitate the sharing of compute infrastructures between and among many universities, while commercial grids tend to leverage the infrastructure within the firm and do not connect to the outside,” Tabb said.
“There are also much more sophisticated needs for commercial grids that are trying to solve problems that require a real- or near-real-time response, while university grids are much less interested in real time and much more interested in harnessing massive quantities of compute cycles,” he said.
Grid technology's immediate future includes the infusion of Web services. “I think the evolution of grid software will be fairly rapid,” Papadopoulos said. “From a technical point of view, the main grid infrastructure is just making this transition to a Web services-based infrastructure, which means that there's Web services plus some other things.”
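Papadopoulos is describing a direction rather than a specific API, so the following is a purely hypothetical sketch of what a Web services-style interaction with a grid might look like: an XML (SOAP-style) job-submission request posted over HTTP. The endpoint URL and message schema are invented for illustration and do not correspond to any real TeraGrid service.

```python
import urllib.request

# Hypothetical endpoint for a Web services-based job submission interface.
ENDPOINT = "https://grid.example.org/services/JobSubmission"  # assumed URL

# A SOAP-style envelope; the job-description element names are illustrative only.
ENVELOPE = """<?xml version="1.0" encoding="UTF-8"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <SubmitJob>
      <Executable>/usr/local/bin/simulate</Executable>
      <Arguments>--steps 1000</Arguments>
      <Count>64</Count>
    </SubmitJob>
  </soap:Body>
</soap:Envelope>"""


def submit() -> str:
    """POST the job description and return the service's XML response."""
    request = urllib.request.Request(
        ENDPOINT,
        data=ENVELOPE.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
    )
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    print(submit())
```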
In addition, the hardware infrastructure for grids will continue to grow, boosting performance and computing potential, Papadopoulos said.
Berman said the grid community has come a long way.
“There are real challenges in terms of security, in terms of programming environments for grids [and] in terms of policy when you cross different administrative domains or national boundaries,” she said. “In terms of modeling the performance of grid environments, you have to model networks and computers and data storage and the kind of dynamic interaction they have. We still have a ways to go, but we've come a long way with grid computing.”