Grid Technical Challenges Daunting

Promises performance and cost savings, but, oh, the details...

It was with a sense of déjà vu that we examined this latest new thing, grid computing.

An aging player still hoping to jump from the minors to the majors, grid computing is a new name for an idea that's been researched by computer scientists since at least the 1970s.

While grid computing is, at its heart, distributed computing by another name, it's a new concept in one important way: an ambition to tackle interorganizational or even planetwide grids, rather than primarily departmentwide or campuswide projects. This growth in scope makes broad hardware support and shared communications standards even more critical than before.

There are also many parallels between historical distributed computing research results and grid computing.

The two most important lessons right now for early adopters are to stick with specialized computing tasks (see chart for the kinds of jobs that work well with grid computing) and to expect a maturation period of several years before this technology is more broadly applicable.

Making grid computing work for a large class of computing problems is a difficult challenge that has been researched for decades and will be for decades to come.

Moore's Law is what makes the idea of grid computing steadily more tantalizing. The overall processing power of entry-level CPUs continues to double and double again while prices stay constant, and given typical corporate usage patterns, most of these CPUs are idle most of the time. Moreover, there are more systems running on organizational networks than ever before. Gartner Inc. estimates that 113 million PCs were sold in 2000, adding to the hundreds of millions of systems already deployed.

Finally, these systems are interconnected by ever faster and more reliable networks.

Being able to take better advantage of the hardware and network assets an organization has is a compelling return-on-investment argument to make to a board of directors.

While grid computing isn't necessarily cheap (Sun Microsystems Inc.'s Sun Grid Engine Enterprise Edition starts at $20,000 for deployments of up to 50 CPUs, for example, and hidden costs such as power and cooling also rise), it can still be very cost-effective.

Actual deployment data is now emerging that shows grid computing can provide big performance gains for applications other than scientific computing and image rendering.

Aerospace company Pratt & Whitney, a division of United Technologies Corp., uses Platform Computing Inc.'s Platform LSF grid software to handle computer-aided simulations of space propulsion systems and aircraft engines during design and development. The East Hartford, Conn., company also uses the software to allocate resources across workstations when employees run desktop applications that require additional processing power.

"Using grid technologies lowers our development costs significantly because of the ability to harness idle power to get jobs done significantly faster," said Peter Bradley, associate fellow for high-intensity computing at Pratt & Whitney. "We've been doing grid computing for so long that it's baked into our business plan. We couldn't live without it at this point."

Despite the lure of taking advantage of spare CPU and network resources, major hurdles lie ahead for grid computing. The two most important ones are the development of programming standards (specifically, standard APIs to grid-enable applications) and interoperability standards (standard grid communication and management protocols, so different grid implementations can connect).

Although a few distributed systems try to do clustering in a way invisible to standard applications, most grid computing packages require applications to be rewritten to use vendor-specific APIs.
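To make that rewriting concrete, the sketch below contrasts a plain local computation with the same work routed through a scheduler object. Every name in it (`GridScheduler`, `submit`, `gather`, `simulate`) is invented for illustration, and the "grid" is faked with a local thread pool; real packages such as Platform LSF or Sun Grid Engine each expose their own, incompatible equivalents of these calls.

```python
# Hypothetical sketch of "grid-enabling" an application: instead of running
# tasks in a local loop, the code hands each one to a scheduler whose API is
# vendor-specific. GridScheduler here is a toy stand-in that simulates the
# grid with a local thread pool; all names are invented for illustration.
from concurrent.futures import ThreadPoolExecutor

def simulate(design_point):
    # Placeholder for an expensive engineering simulation.
    return design_point ** 2

class GridScheduler:
    """Toy stand-in for a vendor grid API; real ones queue jobs on remote nodes."""
    def __init__(self, workers=4):
        self._pool = ThreadPoolExecutor(max_workers=workers)

    def submit(self, fn, arg):
        return self._pool.submit(fn, arg)     # real grid: dispatch to an idle node

    def gather(self, futures):
        return [f.result() for f in futures]  # block until every job reports back

if __name__ == "__main__":
    scheduler = GridScheduler()
    jobs = [scheduler.submit(simulate, p) for p in range(8)]
    print(scheduler.gather(jobs))  # squares of 0 through 7
```

The point of the sketch is the coupling: the application's structure is now organized around one vendor's submit/gather calls, which is exactly why standard grid APIs matter.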

There is some good news on this front, as the Globus Project's Globus Toolkit has emerged as the leading grid computing tool set. (See interview with Globus Project co-leader Ian Foster, Page 37.) The project is backed by IBM, Compaq Computer Corp. (now part of Hewlett-Packard Co.), Sun and Microsoft Corp., as well as supercomputer players such as Hitachi Ltd., NEC Corp., Fujitsu Ltd., Cray Inc. and Silicon Graphics Inc.

Globus Toolkit 2.0 shipped in November, and the first commercially supported version has already been released by Platform Computing (several others are in development by project backers). This effort should produce interoperable grid software from a variety of vendors.

The Globus Project is now developing its next-generation standard, OGSA (Open Grid Services Architecture), which will form the basis of Globus Toolkit 3.0, expected next year.

OGSA incorporates XML data transfer and emerging Web services standards into grid computing, something that should give grid computing and Globus Toolkit a boost.
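To give a feel for what XML data transfer buys a grid, the standard-library snippet below marshals a service call into XML that any conforming peer, in any language, can decode. It uses XML-RPC, an older and simpler relative of the SOAP-based Web services OGSA draws on, not OGSA itself; the method name `grid.add` is made up for the example.

```python
# A taste of XML-encoded service calls, the kind of plumbing OGSA builds on.
# XML-RPC (Python stdlib) is a simpler relative of SOAP-based Web services;
# the method name "grid.add" is invented for the example.
import xmlrpc.client

# Marshal a call to a hypothetical grid service method into an XML payload.
payload = xmlrpc.client.dumps((3, 4), methodname="grid.add")
print(payload)  # e.g. <?xml version='1.0'?> <methodCall>...

# Any XML-RPC peer, in any language, can decode it back.
params, method = xmlrpc.client.loads(payload)
print(method, params)  # grid.add (3, 4)
```

Because the wire format is plain XML rather than a vendor binary protocol, a Java server and a C# client can interoperate, which is precisely the property the Globus preview release demonstrates.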

The first technology preview release was posted May 17 and is available for download. The server is written in Java, but the download includes a client written in Microsoft's C# language to demonstrate client-side interoperability.

Besides standard protocols, secure communication, strong authentication, shared data formats (XML will play a major role here), resource governance, usage costing and chargeback options, failure handling, and distributed administration are all important to grid computings future.
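To make one of those requirements concrete, the sketch below shows the simplest form of failure handling a grid broker needs: re-running a task when the node it landed on dies. The names (`run_with_retries`, the flaky task) are invented for illustration, and node loss is simulated with an exception.

```python
# Illustrative sketch of grid-style failure handling: a node can vanish at
# any time, so a broker re-runs the task (elsewhere, on a real grid) before
# surfacing the failure. All names are invented for illustration.
def run_with_retries(task, max_attempts=3):
    for attempt in range(1, max_attempts + 1):
        try:
            return task()
        except RuntimeError:            # real brokers also catch node timeouts
            if attempt == max_attempts:
                raise                   # out of attempts: report failure upstream

def flaky_task_factory(failures_before_success):
    """Build a task that fails a set number of times, mimicking lost nodes."""
    state = {"calls": 0}
    def task():
        state["calls"] += 1
        if state["calls"] <= failures_before_success:
            raise RuntimeError("node lost")
        return "done"
    return task

if __name__ == "__main__":
    print(run_with_retries(flaky_task_factory(2)))  # succeeds on third attempt
```

Production schedulers layer much more on top (checkpointing, rescheduling to different nodes, reporting), but retry-until-exhausted is the core contract an application submitter relies on.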

However, grid computing has already proved itself a creative way to do more with less, and if companies can tap it without heavy redevelopment costs, and in ways their key technology suppliers support, it has an important role ahead.

West Coast Technical Director Timothy Dyck can be reached at