Ask a handful of technology managers what puts the "grid" in grid computing, and you'll get a handful of answers. Corporate enterprises are beginning to adopt grid computing to help solve compute-intensive problems in areas such as crash simulation, financial modeling and algorithm development. Likewise, research institutions, many facilitated by the San Diego Supercomputer Center, have been using grids to explore the galaxy, predict earthquakes and help cure diseases.
Grids can be as simple as a server cluster at a single company or as massive as the globally distributed grid systems that SDSC has built. It all depends on what the grid is for. So, what's a grid?
"The way I look at grids is what their benefit is: allowing folks to access remote resources that they normally wouldnt be able to," said Philip Papadopoulos, program director for grid and cluster computing at SDSC, based at the University of California at San Diego.
"The easiest way to think about it is, you have a computational platform in one place and you have data thats sitting in another place. It may be across campus, or it may be across the country. And you need to be able to put those two things together," Papadopoulos said.
The more data that's used, the larger and more geographically dispersed the grid will be, but in the world of grid, distance really doesn't matter, said SDSC Director Francine Berman.
"Say you want to get from Washington to San Diego," said Berman, who has performed academic research on grids for the past two decades in the areas of programming environments, adaptive middleware, scheduling and performance prediction. "Youre going to use some combination of your own legs, your car, taxicabs, shuttle buses, airplanes, etc., and all of those have to be functional, not just in and of themselves, but you have to coordinate them together so you can get from point A to point B. The whole process of coordinating technology resources by integrated software is the idea behind grid computing."
Although they are many times larger than what a single company could put together on its own, the SDSC grids, by Papadopoulos' and Berman's definitions, could serve as prototypes for any company looking to tap into virtually limitless computing power.
The largest of the grids is the TeraGrid project, consisting of several Intel Corp. Itanium-based clusters, said Papadopoulos. As director of SDSC's Advanced Cyberinfrastructure Laboratory, Papadopoulos is involved in building networking, managing open-source development projects and overseeing management of the grid. "[The TeraGrid includes] a large IBM Power4-based machine, a 10-teraflops machine. Connected into our grid infrastructure is 500TB of disk space. We can house large data sets for the [scientific] community and for direct access from our large machines. People can get to that data either through our large machines or over the network," he said.
The IBM computer Papadopoulos referred to is an eServer Blue Gene system with 1,024 compute nodes and 128 I/O nodes. Each node consists of two PowerPC processors running at 700MHz and sharing 512MB of memory. The system's top speed is 7.7 teraflops.
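Those per-node figures imply some aggregate totals the article doesn't spell out. A back-of-the-envelope sketch, using only the node counts and per-node specs quoted above (not SDSC's own accounting):

```python
# Aggregate figures for the Blue Gene system described above:
# 1,024 compute nodes, each with two PowerPC processors and 512MB of shared memory.
compute_nodes = 1024
cores_per_node = 2
mem_per_node_mb = 512

total_cores = compute_nodes * cores_per_node          # 2,048 processors in all
total_mem_gb = compute_nodes * mem_per_node_mb // 1024  # 512GB of aggregate memory

print(f"{total_cores} processors, {total_mem_gb}GB total memory")
```

So the machine's 7.7 teraflops come from roughly 2,000 processors sharing about half a terabyte of memory in total.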
To facilitate the work of its scientists and researchers, SDSC runs about 800 computers on its machine room floor, Papadopoulos said. All have software that can tie them to a grid.
"We have several clusters that are national-scale research facilities but are more tuned to individuals," Papadopoulos said. "We have 128-node and 200-node clusters [that are] very common; 50-node clusters are also common."
In May, SDSC increased its online disk storage to 1.1 petabytes. This is in addition to the more than 6 petabytes of tape storage and 4.2 terabytes of IBM DataStar supercomputer memory that SDSC already has. A petabyte of stored data is the equivalent of text information that would fill the Library of Congress more than eight times over, SDSC officials said.