What's in store
Serving both its own community at the University of California, San Diego, and members of a far-flung networked community of other research sites, the SDSC wears at least two hats as both a research site and a 24-by-7 production facility. As the center takes on new roles, its massive storage systems are becoming correspondingly heterogeneous, and eWEEK Labs thinks enterprise sites can learn from the SDSC's explorations and experience.
"We have about 500TB of disk," the SDSC's Moore said. Moore had to raise his voice to speak above the howl of the air conditioning in the vast machine room that houses such behemoths as the DataStar, a massive IBM pSeries installation that this month completed a doubling of its processor count to 2,048 IBM Power4+ CPUs.
Data storage density at the center is growing along with processing power: "We just purchased another 500TB of Serial ATA disk, which is much cheaper per byte than Fibre Channel," Moore said. The traditional challenge of supercomputing, to capture and analyze vast data sets resulting from massive simulations, still demands Fibre Channel's performance, said Moore.
A data library, meanwhile, can go from low-activity archive to high-intensity processing activity in a short time. A natural disaster such as Hurricane Katrina, for example, can create a huge spike in demand for data on meteorological patterns or earthwork failure behaviors, said Anke Kamrath, the SDSC's division director of user services and development. Supercomputer-speed data access will therefore continue to be a critical and growing need.
An earthquake simulation, for example, may represent a 40TB or 50TB output set, said Kamrath, adding that snapshots 200TB in size are not out of the question in future studies. And that's what leads to dilemmas of collection versus use. "It takes 200 hours to move that kind of thing across the network; for disaster response, that's a long time," said Kamrath.
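To put Kamrath's figure in perspective, the arithmetic can be sketched in a few lines of Python. This is only a back-of-the-envelope illustration, not anything the SDSC described: it assumes decimal terabytes (10^12 bytes) and a steady transfer rate, and the function names are hypothetical.

```python
def implied_throughput(terabytes, hours):
    """Return (megabytes per second, gigabits per second) for a transfer."""
    total_bytes = terabytes * 1e12       # assumption: decimal terabytes
    seconds = hours * 3600
    bytes_per_second = total_bytes / seconds
    return bytes_per_second / 1e6, bytes_per_second * 8 / 1e9


def transfer_hours(terabytes, gigabits_per_second):
    """Hours needed to move a data set at a sustained rate."""
    bytes_per_second = gigabits_per_second * 1e9 / 8
    return terabytes * 1e12 / bytes_per_second / 3600


# The 200TB snapshot moved in 200 hours, as quoted in the article.
mb_s, gbit_s = implied_throughput(200, 200)
print(f"{mb_s:.0f} MB/s, {gbit_s:.2f} Gbit/s")  # prints "278 MB/s, 2.22 Gbit/s"

# At that same sustained rate, a 50TB earthquake output set takes about 50 hours.
print(f"{transfer_hours(50, gbit_s):.0f} hours")
```

Under those assumptions, 200TB in 200 hours works out to a sustained rate of roughly 2.2 gigabits per second, which suggests why even a 50TB output set is a multiday transfer.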
To move massive data collections, she added, "you have to tune the FTP parameters to the file sizes. We can push data harder than anyone's ever pushed it, but it takes heroic efforts to get anything like the published capability of a connection." Kamrath said she wondered, therefore, if some enterprise sites may be kidding themselves about the actual value of the vast data farms they've built. "You hear the stories about Wal-Mart storing all their user data; there's hard problems to solve before you can really use all that."
Closing that gap is the mission of the SDSC's Natasha Balac, group lead for data applications. Balac is working with researchers who want to host their data collections at the SDSC's National Science Foundation-funded Data Central facility, which opened its digital doorway last month. "A lot of the data is flat files," Balac said, and that creates huge problems of usability. Balac is working with the owners of large data collections to devise better architectures for future efficient use; enterprise sites should likewise be thinking now about the implications of vastly expanding data flows.