Grid Computing in the Enterprise

By Peter Coffee  |  Posted 2004-02-09

Grid computing is an overnight success that has been almost four decades in the making.

Last month's announcement of the WS-Resource framework, which enables grid resource management with standard Web services protocols, completes a convergence that began with the 1965 introduction of the first multiprocessor computer. Libraries full of bleeding-edge research have since paved the way for grids, developing parallel processing schemes to solve exotic and high-value problems.

Today's confluence of commodity components, burgeoning bandwidth and open-source systems software fills in the rest of the picture. Taken together, these make the enterprise case for grid computing: the connection of heterogeneous computing nodes using self-administering software that makes the nodes function as a single virtual system.

The last few years of eWEEK Labs' reviews have tracked a steady course for key component technologies toward their present state of being ready for prime time. Grids have become a compellingly cost-effective means of delivering almost any conceivable combination of massively parallel computing capability, incremental application scalability and enterprise-class fault tolerance.

But grids now face a barrier of perception that is perhaps even more challenging than previous barriers of technology, with many mainstream enterprise professionals doubting grids' applicability to their everyday tasks.

"Unless you have specialized needs like special-effects computing, heavy-duty financial market analysis, weather prediction or any other previously supercomputer-intensive field, I can't imagine that you would need or want the complexity of grid computing," said Jorge Abellas-Martin, an eWEEK Corporate Partner and CIO of the advertising agency Arnold Worldwide.

Plummeting prices of multiprocessor servers make Abellas-Martin and others look askance at the cost of connecting and coordinating separate machines. They wonder if that cost exceeds the savings they can realistically hope to achieve by improving their processor-utilization ratios through a grid's adaptive allocation of workload.

However, what grids offer to these rightly skeptical enterprise users is the ability to let compute power flow to wherever it's needed, instead of being statically allocated by the capital spending of particular business units. The enterprise data center is well on its way to becoming a supplier of service rather than a custodian of hardware, as eWEEK Labs will explore next week in Part 2 of this special report. Grids are a key enabling technology, and their foundations are the subject of this week's stories.

Abellas-Martin and others correctly note that certain types of problems have become closely associated with massively parallel machines. These types of problems are "embarrassingly parallel," in the words of Shahin Khan, Sun Microsystems Inc.'s vice president of high-performance and technical computing, because they're so obviously parallel in nature that "it's embarrassing if you can't figure out how to do it."

Life sciences problems such as protein folding are the current frontier of compute-intensive efforts, consuming multiprocessing power in enormous quantities with the aid of largely self-deploying multiprocessing support packages such as the San Diego Supercomputer Center's Linux-based Rocks.

eWEEK Labs met late last year with the team that assembled a world-class Rocks-based supercomputer in a matter of hours on a conference exhibit floor. Far from requiring a wave of underpaid graduate students to assemble over a period of months, a computer grid, in eWEEK Labs' observation, can now be deployed on enterprise time scales with affordable human resources.

Supercomputing on Sale

It made sense in previous decades to tackle highly parallel problems with equally specialized hardware, such as vector processing supercomputers, when the high cost of developing, building and programming those systems could readily be justified by far greater cost savings.

What now drives grid computing, however, is the cost-push of a commodity technology, rather than the demand-pull of problems worth solving at almost any price.

In particular, there's a compelling price/performance proposition in building a grid of high-density x86 blade servers running Linux-based operating systems. The bang for the buck, in terms of raw computing power, can pay for a lot of work toward abstracting the complexity of such platforms to create a suitable tool for solving enterprise problems. Whole new layers of software are entering the market as off-the-shelf solutions to this need. Oracle Corp.'s Real Application Clusters technology, with its Cache Fusion architecture, is one example of this approach. Globus Toolkit, an open-source "service factory" framework providing state maintenance and discovery tools, is another.

"There are quite a number of technologies that have come together over the last three or four years," said Oracle Vice President of Technology Marketing Bob Shimp. In addition to the mass-market economies of x86 blades and a broadening spectrum of interconnection price/performance choice points, Shimp said, "Linux is a key technology that has matured in the last couple of years. It's not only a low-cost server operating system, but more interesting is that it was never designed as a general-purpose desktop product or server product, and so it doesn't have as many layers. It's very tight and very fast."

eWEEK Labs' reviews of Linux-based offerings, ranging from PDAs to enterprise appliances, confirm Shimp's positive assessment.

To a lesser extent, grids are also being demand-pulled into enterprise applications as the characteristics of business data converge with the strengths of what was formerly considered scientific computing.

"The purpose of data is insight," said Sun's Khan. "A hundred rows of data, you can look at; 200 rows, you can graph; but 5 million rows need to be mined or visualized using what used to be exotic high-performance computing methods."

The volumes of data that are involved in e-business Web site operations, customer relationship management application suites or top-tier financial services offerings all lend themselves to Khan's categorization and to affordable approaches that employ grid architectures.

One spectacular success of PC-derived components, harnessed in parallel by open-source-based software, is the widely used Google search engine, a massive cluster comprising more than 15,000 commodity-class PCs (as described in an IEEE Computer Society paper published last year). The Google application has been designed to take advantage of these affordable building blocks, with different queries running on different processors and with a partitioned index that lets even a single query run on multiple processors.

Because the average Google query consumes, according to the same IEEE paper, tens of billions of CPU cycles while examining hundreds of megabytes of data, these design considerations make a crucial difference. They make it possible for Google to pursue the lowest possible ratio of price to performance and not, in the manner of past supercomputer efforts, peak processor performance regardless of cost.
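The partitioning scheme described in that paper can be illustrated with a toy sketch: the document collection is split into disjoint slices, each slice gets its own small inverted index, and one query fans out to every slice in parallel before the partial results are merged. (The class and function names below are invented for illustration, not drawn from Google's actual system.)

```python
from concurrent.futures import ThreadPoolExecutor

# Toy document-partitioned ("sharded") inverted index. Each shard covers
# a disjoint slice of the collection, so a single query can be evaluated
# on every shard concurrently and the partial hit sets merged.

class IndexShard:
    def __init__(self, docs):
        # docs: {doc_id: text}; build a tiny inverted index for this slice
        self.postings = {}
        for doc_id, text in docs.items():
            for word in set(text.lower().split()):
                self.postings.setdefault(word, set()).add(doc_id)

    def search(self, term):
        return self.postings.get(term.lower(), set())

def grid_search(shards, term):
    # Fan the same query out to every shard in parallel, then merge.
    with ThreadPoolExecutor() as pool:
        partials = pool.map(lambda s: s.search(term), shards)
    hits = set()
    for partial in partials:
        hits |= partial
    return hits

shards = [
    IndexShard({1: "grid computing on commodity parts"}),
    IndexShard({2: "vector supercomputers", 3: "commodity blade servers"}),
]
print(sorted(grid_search(shards, "commodity")))  # -> [1, 3]
```

Because each shard holds only a fraction of the index, adding commodity machines grows both capacity and per-query parallelism, which is precisely the price/performance lever the article describes.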

That distinction is the value proposition of cluster configurations, which have gained much ground with the advent of Beowulf clusters that harness dedicated PC-class machines under free-software operating systems such as Linux or FreeBSD.

First evaluated by eWEEK Labs several years ago, Beowulf technology is now widely supported by tools, training and practical experience that feed the talent pool for more-ambitious grid deployments.

That's not to understate, however, the vital distinction between a cluster (such as Google or a Beowulf installation) and a grid. A cluster uses affordable building blocks but does so in a known and typically homogeneous configuration of identical or at least quite similar units under a single controlling authority. A grid is a more dynamic and usually heterogeneous system, virtualizing resources that may be quite different in character and capabilities and that may even come and go without warning as their availability changes over time.

A November 2002 IBM Redpaper report, "Fundamentals of Grid Computing," states, "The goal is to create the illusion of a simple yet large, powerful and self-managing virtual computer out of a collection of connected heterogeneous systems sharing various combinations of resources." The paper continues, "The standardization of communications between heterogeneous systems created the Internet explosion. The emerging standardization for sharing resources, along with the availability of higher bandwidth, are driving a possibly equally large evolutionary step in grid computing."

What makes a grid?

Heterogeneity, flexibility and reliability set grids apart from supercomputers, server farms, clusters and peer-to-peer schemes

  • Distributed control: Not as centralized as a cluster, but not as laissez-faire as peer-to-peer

  • Dynamic configuration: Machines can arrive and leave

  • Adaptive discovery: Problems find needed resources rather than being built for a specific configuration

  • Quality-of-service maintenance: Workloads are balanced, with attention to task priority, not just best-effort rules

  • Bandwidth diversity: Processing-intensive applications can tolerate lower-speed data movement

  • Processor diversity: Different architectures, such as vector processors, can be matched to different problems

To take that step, a system should ideally function with a minimal amount of centralized administration, unlike a cluster or a server farm, where single-point control is the norm. A grid must use general-purpose protocols, unlike an application-specific system such as SETI@home, and it must afford the quality of service demanded by line-of-business applications, for example through automatic load balancing across resources that come and go. Opportunistic systems such as most peer-to-peer implementations, in contrast, take whatever power they can get but lack higher-level protocols for allocating power to tasks.

It's the wrong question, however, to ask whether a grid is the right model for any particular task. "The question," suggested Sun's Khan, "is how you look at the packaging of the compute capability to fit the latency and bandwidth requirements of the application." Some problems, he said, demand a large shared memory that is best provided by a multiprocessor server with the fastest possible interconnections. Other problems, especially those with relatively large amounts of processing compared with data movement, Khan suggested, are better handled on a cluster or grid using low-cost compute nodes with affordable Ethernet connections.
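Khan's fit test reduces to back-of-envelope arithmetic: a workload suits cheap grid nodes when the compute it carries dwarfs the data it must move. The figures below are illustrative assumptions, not measurements.

```python
# Back-of-envelope compute-vs-communication check for grid suitability.
# All hardware numbers here are assumed round figures for illustration.

def transfer_seconds(data_bytes, link_bits_per_sec):
    return data_bytes * 8 / link_bits_per_sec

def compute_seconds(flops_needed, node_flops):
    return flops_needed / node_flops

# A chunk of work: 1 GB of input data, 10 trillion floating-point ops.
data = 1e9
work = 1e13
gigabit = 1e9        # commodity Ethernet link, bits/sec (assumed)
node = 1e9           # 1 GFLOPS commodity node (assumed)

t_move = transfer_seconds(data, gigabit)    # 8 seconds on the wire
t_crunch = compute_seconds(work, node)      # 10,000 seconds of compute
print(t_crunch / t_move)  # ratio far above 1: cheap interconnect is fine
```

When the ratio is large, Ethernet-connected commodity nodes win on price/performance; when it approaches 1, the problem belongs on a shared-memory multiprocessor with fast interconnects, exactly the packaging question Khan poses.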

A modern computer system, Khan said, represents a balance between optimality for one task and flexibility for many tasks, rather than imposing a discrete choice of one approach versus another.

Sun Chairman and CEO Scott McNealy reinforced that point when invited to comment for this article: "The world has embraced a vision of network computing based on industry standards and open interfaces. As a result, the adoption of grid computing is growing by leaps and bounds as IT professionals seek ways to get serious computing power from low-cost components."

In the same way that Web services attract enterprise attention with their vision of applications on demand, grids offer an increasingly standardized way of delivering the corresponding computational resources on demand. In the same way that the Internet has become the platform of choice for connecting heterogeneous producers and consumers of data, the standards-based grid is the target of efforts aimed at operating on that data in an organized, fault-tolerant manner.

With that combination of independent research and ROI-centered enterprise attention, eWEEK Labs projects that the barriers to grid use in any application will steadily erode, while the economics of the model will continue to improve.

Technology Editor Peter Coffee can be contacted at
