Experienced aviators warn novice pilots, “the problem with a multiengine airplane is that sometimes you need them all.” During takeoff, for example, failure of even a single engine is a high-risk situation—but with more engines, there is a greater chance of at least one failure.
Distributed computing systems, such as grid computing, involve a similar paradox: The more resources the system has, the greater the number of points where the system can fail or degrade—and the harder the task of ensuring adequate performance in all situations, without unacceptable overhead.
A computing grid faces four new tasks, in addition to whatever problems it was built to solve. The grid must discover and allocate resources as their availability comes and goes; it must protect long-distance interactions against intrusion, interception and disruption; it must monitor network status; and it must initiate and manage communications among the processing nodes to make each one's needs known to the others. There is no single optimal approach to any of these tasks but rather a family of possible solutions that match up with different types of problems.
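To make that division of labor concrete, here is a minimal Python sketch (the names are hypothetical, not drawn from any particular grid toolkit) expressing the four tasks as the interfaces a middleware layer might expose.

```python
# Hypothetical interfaces for the four grid-level tasks described above.
from abc import ABC, abstractmethod

class ResourceBroker(ABC):
    @abstractmethod
    def discover(self) -> list:
        """Find nodes as their availability comes and goes."""

    @abstractmethod
    def allocate(self, job, nodes: list) -> dict:
        """Map units of work onto currently available nodes."""

class SecureChannel(ABC):
    @abstractmethod
    def send(self, node, payload: bytes) -> None:
        """Carry traffic protected against intrusion and interception."""

class Monitor(ABC):
    @abstractmethod
    def status(self, node) -> dict:
        """Report network and node health."""

class Coordinator(ABC):
    @abstractmethod
    def exchange(self, node, needs: dict) -> dict:
        """Make each node's needs known to the others."""
```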
Delays in communication between widely separated nodes fall into two groups. Fundamental is the speed-of-light limit: A node at one location cannot possibly become aware of a change in state at another location in less than the straight-line, speed-of-light propagation time of almost exactly 1 nanosecond per foot of separation.
That sounds good until it's compared, for example, with modern local memory access times of, at most, a few tens of nanoseconds. Tightly coupled applications, such as simulation or process control, are therefore disadvantaged in distributed environments “until science discovers a method of communication that is not limited by the speed of light,” as Aerospace Corp. scientists Craig Lee and James Stepanek wrote in their paper published in April 2001 (which can be accessed via www.eweek.com/links).
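To see how quickly distance dominates, the back-of-the-envelope calculation below (illustrative figures, not measurements) compares the speed-of-light lower bound with a nominal 50-nanosecond local memory access.

```python
# Lower bound on one-way, speed-of-light delay versus local memory access.
SPEED_OF_LIGHT_NS_PER_FOOT = 1.0167   # ~1 ns per foot in vacuum
MEMORY_ACCESS_NS = 50                 # "a few tens of nanoseconds"

def min_propagation_ns(feet: float) -> float:
    """Minimum one-way latency over a straight-line path, in nanoseconds."""
    return feet * SPEED_OF_LIGHT_NS_PER_FOOT

for miles in (1, 100, 3000):          # campus, regional, transcontinental
    feet = miles * 5280
    delay = min_propagation_ns(feet)
    print(f"{miles:>5} mi: >= {delay:,.0f} ns "
          f"({delay / MEMORY_ACCESS_NS:,.0f}x a local memory access)")
```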
There are problem decomposition techniques that aren't as badly handicapped by the speed of light: for example, Monte Carlo simulation, or the kind of data parceling strategies made famous by the SETI@Home project, which distributes sets of radio telescope data for intelligent-life detection by screen saver software. When problems lend themselves to this approach, they often don't need frequent synchronization and therefore aren't severely hampered by distance.
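The sketch below illustrates the style of decomposition: a Monte Carlo estimate of pi split into independent work units that need no coordination until the final tally. A local process pool stands in here for remote grid nodes.

```python
# Embarrassingly parallel Monte Carlo: each work unit runs independently
# and only the final counts are combined.
import random
from multiprocessing import Pool

def work_unit(args):
    seed, samples = args
    rng = random.Random(seed)
    return sum(1 for _ in range(samples)
               if rng.random() ** 2 + rng.random() ** 2 <= 1.0)

if __name__ == "__main__":
    units = [(seed, 100_000) for seed in range(32)]   # 32 independent parcels
    with Pool() as pool:
        total_hits = sum(pool.map(work_unit, units))
    total_samples = sum(n for _, n in units)
    print("pi ~=", 4 * total_hits / total_samples)
```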
What does affect the latter class of problem, though, is the limited bandwidth of networks and network interfaces. In the paper cited above, Lee and Stepanek plot recent progress and find network access bandwidth, as determined by available interface cards, doubling every 2.5 years, ominously lagging the 1.5-year doubling time of processor performance, assuming continued Moore's Law improvement, which many project as likely through 2010.
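The arithmetic of those doubling times is easy to run out, as in the sketch below (the 1.5- and 2.5-year figures come from Lee and Stepanek; the compounding itself is merely illustrative).

```python
# Compound the two doubling times to see how the gap widens.
def growth_factor(years: float, doubling_time_years: float) -> float:
    return 2 ** (years / doubling_time_years)

for years in (5, 10):
    cpu = growth_factor(years, 1.5)       # processor performance
    nic = growth_factor(years, 2.5)       # network interface bandwidth
    print(f"after {years} years: CPU x{cpu:.0f}, NIC x{nic:.0f}, "
          f"gap x{cpu / nic:.1f}")
```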
With processor speed outpacing the ability of interface cards to move data onto and off of the grid, it follows that some processing power will be best employed in boosting information content per bit: for example, by continuing the refinement of data compression algorithms using techniques such as the wavelet transforms in the JPEG 2000 standard.
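A minimal sketch of that trade, using the zlib module from Python's standard library on a synthetic, highly redundant payload: more compression effort buys fewer bytes on the wire at the cost of CPU time.

```python
# Trade CPU time for bytes on the wire at increasing compression levels.
import time
import zlib

payload = b"node_id=42;temp=21.5;status=OK;" * 4096   # ~128 KB of redundant telemetry

for level in (1, 6, 9):                               # fast ... thorough
    start = time.perf_counter()
    packed = zlib.compress(payload, level)
    elapsed_ms = (time.perf_counter() - start) * 1000
    print(f"level {level}: {len(payload)} -> {len(packed)} bytes "
          f"({len(packed) / len(payload):.1%}) in {elapsed_ms:.1f} ms")
```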
Data compression developments such as these are offset, however—perhaps to devastating effect—by the growth of data overhead entailed in the use of XML syntax to make data more self-disclosing than it is in application-specific binary data structures. There's a difficult trade-off to be made between ad hoc availability of data for unanticipated uses and efficient, cost-effective packaging of data.
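The overhead is easy to quantify even on a toy record, as below: the same synthetic sensor reading encoded as self-describing XML and as a fixed binary structure whose layout both ends must already know.

```python
# Size of one reading as self-describing XML versus a fixed binary struct.
import struct

node_id, timestamp, value = 42, 1_000_000_000, 21.5

xml = (f"<reading><node>{node_id}</node>"
       f"<time>{timestamp}</time>"
       f"<value>{value}</value></reading>").encode()

binary = struct.pack("!IIf", node_id, timestamp, value)   # 12 bytes, schema implied

print(len(xml), "bytes as XML vs", len(binary), "bytes as binary")
```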
Sad to say, a great deal of processing power may also be consumed by the calculations needed to implement data integrity and security measures, such as encryption and authentication of messages sent and received. Grid computing, in an open environment such as an IP network, invites attempts both to read the mail between the nodes and to analyze the patterns of traffic for what they might reveal about concentrations of valuable information.
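The sketch below gives a feel for the per-message cost of just one such measure, message authentication with HMAC-SHA256 from Python's standard library (the key and payload are synthetic; a real deployment would also need key distribution and, likely, encryption).

```python
# Measure the CPU cost of authenticating a stream of 64 KB messages.
import hashlib
import hmac
import time

key = b"shared-secret-key"            # placeholder key for illustration only
payload = b"x" * 64 * 1024            # 64 KB message body

rounds = 1000
start = time.perf_counter()
for _ in range(rounds):
    tag = hmac.new(key, payload, hashlib.sha256).digest()
elapsed = time.perf_counter() - start

print(f"{rounds} messages authenticated in {elapsed:.3f} s "
      f"({rounds * len(payload) / elapsed / 1e6:.0f} MB/s)")
```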
If the network and the computer are one and the same, it follows that the network—an inherently exposed asset—is increasingly the locus of IT value. Enterprise IT architects and service providers will have to learn to protect it without crippling its hoped-for performance gains.