Supercomputing on Sale

By Peter Coffee  |  Posted 2004-02-09

It's made sense in previous decades to tackle highly parallel problems with equally specialized hardware, such as vector processing supercomputers, when the high cost of developing, building and programming those systems could readily be justified by far greater cost savings.

What now drives grid computing, however, is the cost-push of a commodity technology, rather than the demand-pull of problems worth solving at almost any price.

In particular, there's a compelling price/performance proposition in building a grid of high-density x86 blade servers running Linux-based operating systems. The bang for the buck, in terms of raw computing power, can pay for a lot of work toward abstracting the complexity of such platforms to create a suitable tool for solving enterprise problems. Whole new layers of software are entering the market as off-the-shelf solutions to this need. Oracle Corp.'s Real Application Clusters technology, with its Cache Fusion architecture, is one example of this approach. Globus Toolkit, an open-source "service factory" framework providing state maintenance and discovery tools, is another.

"There are quite a number of technologies that have come together over the last three or four years," said Oracle Vice President of Technology Marketing Bob Shimp. In addition to the mass-market economies of x86 blades and a broadening spectrum of interconnection price/performance choice points, Shimp said, "Linux is a key technology that has matured in the last couple of years. It's not only a low-cost server operating system, but more interesting is that it was never designed as a general-purpose desktop product or server product, and so it doesn't have as many layers. It's very tight and very fast."

eWEEK Labs' reviews of Linux-based offerings, ranging from PDAs to enterprise appliances, confirm Shimp's positive assessment.

To a lesser extent, grids are also being demand-pulled into enterprise applications as the characteristics of business data converge with the strengths of what was formerly considered scientific computing.

"The purpose of data is insight," said Sun's Khan. "A hundred rows of data, you can look at; 200 rows, you can graph; but 5 million rows need to be mined or visualized using what used to be exotic high-performance computing methods."

The volumes of data that are involved in e-business Web site operations, customer relationship management application suites or top-tier financial services offerings all lend themselves to Khan's categorization and to affordable approaches that employ grid architectures.

One spectacular success of PC-derived components, harnessed in parallel by open-source-based software, is the widely used Google search engine, a massive cluster comprising more than 15,000 commodity-class PCs (as described in an IEEE Computer Society paper published last year). The Google application has been designed to take advantage of these affordable building blocks, with different queries running on different processors and with a partitioned index that lets even a single query run on multiple processors.
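The idea of a partitioned index can be sketched in a few lines of Python. This is only an illustration of the general scatter-gather pattern, not Google's actual design: the function names, the modulo partitioning rule and the sample documents are all assumptions made for the example, and worker threads stand in for separate machines.

```python
from concurrent.futures import ThreadPoolExecutor


def build_shards(docs, num_shards):
    """Split documents across shards and build one inverted index per shard."""
    shards = [dict() for _ in range(num_shards)]
    for doc_id, text in docs.items():
        shard = shards[doc_id % num_shards]  # hypothetical partitioning rule
        for word in set(text.lower().split()):
            shard.setdefault(word, set()).add(doc_id)
    return shards


def search_shard(shard, terms):
    """Return the ids of documents in this shard that contain every term."""
    postings = [shard.get(term, set()) for term in terms]
    return set.intersection(*postings) if postings else set()


def search(shards, query):
    """Send one query to all shards in parallel, then merge the partial results."""
    terms = query.lower().split()
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = list(pool.map(search_shard, shards, [terms] * len(shards)))
    return sorted(set().union(*partials))


# Made-up sample data for illustration only.
DOCS = {
    0: "grid computing on linux blades",
    1: "linux cluster computing",
    2: "vector supercomputer",
    3: "grid of linux servers",
}
SHARDS = build_shards(DOCS, num_shards=2)
RESULT = search(SHARDS, "linux grid")  # documents 0 and 3 match both terms
```

Each shard answers for its own slice of the documents, so adding shards spreads the work of a single query across more inexpensive machines.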

Because the average Google query consumes, according to the same IEEE paper, tens of billions of CPU cycles while examining hundreds of megabytes of data, these design considerations make a crucial difference. They make it possible for Google to pursue the lowest possible ratio of price to performance and not, in the manner of past supercomputer efforts, peak processor performance regardless of cost.

That distinction is the value proposition of cluster configurations, which have gained much ground with the advent of Beowulf clusters that harness dedicated PC-class machines under free-software operating systems such as Linux or FreeBSD.

First evaluated by eWEEK Labs several years ago, Beowulf technology is now widely supported by tools, training and practical experience that feed the talent pool for more-ambitious grid deployments.

That's not to understate, however, the vital distinction between a cluster (such as Google or a Beowulf installation) and a grid. A cluster uses affordable building blocks but does so in a known and typically homogeneous configuration of identical or at least quite similar units under a single controlling authority. A grid is a more dynamic and usually heterogeneous system, virtualizing resources that may be quite different in character and capabilities and that may even come and go without warning as their availability changes over time.

A November 2002 IBM RedPaper report, "Fundamentals of Grid Computing," states, "The goal is to create the illusion of a simple yet large, powerful and self-managing virtual computer out of a collection of connected heterogeneous systems sharing various combinations of resources." The paper continues, "The standardization of communications between heterogeneous systems created the Internet explosion. The emerging standardization for sharing resources, along with the availability of higher bandwidth, are driving a possibly equally large evolutionary step in grid computing."

What makes a grid?

Heterogeneity, flexibility and reliability set grids apart from supercomputers, server farms, clusters and peer-to-peer schemes

  • Distributed control: Not as centralized as a cluster, but not as laissez-faire as peer-to-peer
  • Dynamic configuration: Machines can arrive and leave
  • Adaptive discovery: Problems find needed resources rather than being built for a specific configuration
  • Quality-of-service maintenance: Workloads are balanced, with attention to task priority, not just best-effort rules
  • Bandwidth diversity: Processing-intensive applications can tolerate lower-speed data movement
  • Processor diversity: Different architectures, like vector processors, can be matched to different problems
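Several of these traits can be seen together in a toy scheduler. The Python sketch below is a simplified illustration, not the behavior of Globus Toolkit or any other product; the class names, node names and task names are all hypothetical. Nodes join and leave at will, each task is matched to a node of the architecture it needs, and urgent tasks are placed first.

```python
import heapq
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    arch: str        # for example "x86" or "vector"
    free_slots: int  # how many more tasks this node can accept


@dataclass(order=True)
class Task:
    priority: int                     # lower number means more urgent
    name: str = field(compare=False)
    arch: str = field(compare=False)  # architecture the task requires


class Grid:
    def __init__(self):
        self.nodes = {}
        self.queue = []

    def join(self, node):
        """Dynamic configuration: a machine arrives."""
        self.nodes[node.name] = node

    def leave(self, name):
        """Dynamic configuration: a machine departs."""
        self.nodes.pop(name, None)

    def submit(self, task):
        heapq.heappush(self.queue, task)

    def discover(self, arch):
        """Adaptive discovery: pick the matching node with the most free slots."""
        candidates = [n for n in self.nodes.values()
                      if n.arch == arch and n.free_slots > 0]
        return max(candidates, key=lambda n: n.free_slots, default=None)

    def schedule(self):
        """Place tasks in priority order; tasks with no suitable node keep waiting."""
        placed, waiting = [], []
        while self.queue:
            task = heapq.heappop(self.queue)
            node = self.discover(task.arch)
            if node is None:
                waiting.append(task)
            else:
                node.free_slots -= 1
                placed.append((task.name, node.name))
        for task in waiting:
            heapq.heappush(self.queue, task)
        return placed


grid = Grid()
grid.join(Node("blade-1", "x86", free_slots=1))
grid.join(Node("vec-1", "vector", free_slots=1))
grid.submit(Task(2, "report", "x86"))
grid.submit(Task(1, "billing", "x86"))
grid.submit(Task(3, "simulation", "vector"))
first_pass = grid.schedule()   # "report" waits: the only x86 slot went to "billing"

grid.leave("blade-1")          # one machine goes away...
grid.join(Node("blade-2", "x86", free_slots=2))  # ...and another arrives
second_pass = grid.schedule()  # "report" now finds a home on "blade-2"
```

A production grid would also have to cope with a node departing while a task is still running on it; the sketch leaves that out to keep the matching logic visible.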

To take that step, a system should ideally function with a minimal amount of centralized administration, unlike a cluster or a server farm, where single-point control is the norm; a grid must use general-purpose protocols, unlike an application-specific system such as SETI@home; it must also afford the quality of service demanded for line-of-business applications, for example by automatic load balancing across resources that come and go. Opportunistic systems such as most peer-to-peer implementations, in contrast, take whatever power they can get but lack higher-level protocols for allocating power to tasks.

It's the wrong question, however, to ask whether a grid is the right model for any particular task. "The question," suggested Sun's Khan, "is how you look at the packaging of the compute capability to fit the latency and bandwidth requirements of the application." Some problems, he said, demand a large shared memory that is best provided by a multiprocessor server with the fastest possible interconnections. Other problems, especially those with relatively large amounts of processing compared with data movement, Khan suggests, are better handled on a cluster or grid using low-cost compute nodes with affordable Ethernet connections.
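A back-of-the-envelope model makes the trade-off concrete. The Python sketch below is an assumption-laden illustration rather than anything Khan or Sun supplied: it simply adds an evenly divided compute time to the time needed to move the data over one link, and every figure in it (processor counts, speeds, bandwidths, job sizes) is invented for the example.

```python
def run_time(work_ops, data_bytes, nodes, ops_per_sec, bytes_per_sec):
    """Estimated seconds: compute shared evenly across nodes, plus data transfer."""
    compute = work_ops / (nodes * ops_per_sec)
    transfer = data_bytes / bytes_per_sec
    return compute + transfer


# Two hypothetical ways of packaging compute capability.
SERVER = dict(nodes=8, ops_per_sec=1e9, bytes_per_sec=1e10)  # fast interconnect
GRID = dict(nodes=64, ops_per_sec=1e9, bytes_per_sec=1e8)    # commodity Ethernet

# Job A: a great deal of processing on a modest amount of data.
a_server = run_time(1e13, 1e9, **SERVER)
a_grid = run_time(1e13, 1e9, **GRID)

# Job B: light processing on a very large amount of data.
b_server = run_time(1e11, 1e12, **SERVER)
b_grid = run_time(1e11, 1e12, **GRID)

# Under these assumed figures the grid finishes Job A sooner,
# while the shared-memory server finishes Job B sooner.
best_for_a = "grid" if a_grid < a_server else "server"
best_for_b = "grid" if b_grid < b_server else "server"
```

The model ignores latency, contention and any overlap of computation with communication, so it indicates only the direction of the trade-off, not its size.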

A modern computer system, says Khan, represents a balance between optimality for one task and flexibility for many tasks, rather than imposing a discrete choice of one approach versus another.

Sun Chairman and CEO Scott McNealy reinforced that point when invited to comment for this article: "The world has embraced a vision of network computing based on industry standards and open interfaces. As a result, the adoption of grid computing is growing by leaps and bounds as IT professionals seek ways to get serious computing power from low-cost components."

In the same way that Web services attract enterprise attention with their vision of applications on demand, grids offer an increasingly standardized way of delivering the corresponding computational resources on demand. In the same way that the Internet has become the platform of choice for connecting heterogeneous producers and consumers of data, the standards-based grid is the target of efforts aimed at operating on that data in an organized, fault-tolerant manner.

With that focus of independent research and ROI-centered enterprise attention, eWEEK Labs projects that the barriers to grid use in any application will steadily erode, while the economics of the model will continue to improve.

Technology Editor Peter Coffee can be contacted at

Peter Coffee is Director of Platform Research at, where he serves as a liaison with the developer community to define the opportunity and clarify developers' technical requirements on the company's evolving Apex Platform. Peter previously spent 18 years with eWEEK (formerly PC Week), the national news magazine of enterprise technology practice, where he reviewed software development tools and methods and wrote regular columns on emerging technologies and professional community issues. Before he began writing full-time in 1989, Peter spent eleven years in technical and management positions at Exxon and The Aerospace Corporation, including management of the latter company's first desktop computing planning team and applied research in applications of artificial intelligence techniques. He holds an engineering degree from MIT and an MBA from Pepperdine University; he has held teaching appointments in computer science, business analytics and information systems management at Pepperdine, UCLA, and Chapman College.
