Technically speaking, grid computing enables programs to be spread out over multiple computers via a network so that massive jobs can be done as efficiently as possible.
Does that sound a lot like a traditional cluster to you? If it does, you're right: grid computing is essentially clustering writ large.
There are, however, several important differences. Instead of multiple processors bound together by a system bus or by a high-speed fabric such as iSCSI or Fibre Channel, a grid's computers can be thousands of miles apart and tied together with conventional Internet networking technologies such as OC-3 (Optical Carrier-3, a 155.52 Mbps network technology) or even a lowly T1 (1.544 Mbps).
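To get a feel for what those link speeds mean in practice, here is a rough back-of-the-envelope sketch. It assumes a hypothetical 1 GB job input, uses the nominal line rates (155.52 Mbps for OC-3 and the standard 1.544 Mbps for T1), and ignores protocol overhead and latency, so real transfers would be slower. The function and variable names are illustrative only.

```python
# Nominal line rates in bits per second.
LINK_RATES_BPS = {
    "OC-3": 155.52e6,
    "T1": 1.544e6,
}


def transfer_seconds(payload_bytes: int, link: str) -> float:
    """Idealized time to move payload_bytes over a link.

    Assumption: the full nominal rate is available and there is no
    protocol overhead or latency.
    """
    return payload_bytes * 8 / LINK_RATES_BPS[link]


if __name__ == "__main__":
    one_gigabyte = 10**9  # hypothetical job input size
    for link in LINK_RATES_BPS:
        seconds = transfer_seconds(one_gigabyte, link)
        print(f"{link}: about {seconds:,.0f} seconds")
```

Under these assumptions the OC-3 moves the data in under a minute, while the T1 needs well over an hour, which is why grid jobs are usually split so that each machine needs only a small slice of the data.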
In addition, the systems in a cluster tend to all be the same. For example, the clusters I cut my teeth on in the early '80s, DEC VAX-11/785 minicomputers running VMS 4.1, had identical hardware.
Today, it's much the same. Typical Linux-based Beowulf clusters, for example, are built inexpensively from commodity hardware, such as Pentium chips, bound together with standard network technologies like Fast Ethernet.
Usually people use clusters for one of two purposes: HA (high availability) for greater reliability or HPC (high-performance computing) for faster processing. Indeed, according to the latest Top500 directory of top supercomputers, most of the fastest supercomputers, including the current leader, IBM's BlueGene/L, are actually clusters.
While a grid may have the same goals of HA and HPC, its component systems do not have to share the same architecture or operating system. For example, with United Devices Inc.'s Grid MP Enterprise, users can run grid applications across heterogeneous systems running 32-bit Windows and Linux on x86, AIX on POWER, and Solaris on SPARC.
Again, in some ways, this may sound like old hat to you. Distributed computing projects, such as SETI@home, have long enabled users running everything from OS/2 to HP-UX to Windows 95 to tackle small parts of huge jobs.
With SETI@home and similar projects, however, the machines are dedicated to a single task. In a grid, resources can be shared dynamically to address multiple problems.
"The goal is to create the illusion of a simple yet large and powerful, self-managing, virtual computer out of a large collection of connected heterogeneous systems sharing various combinations of resources," Viktors Berstis, an IBM software engineer, said in the IBM Redbook Fundamentals of Grid Computing (PDF file).
To make this happen, a grid uses a program that works in concert with the different operating systems to coordinate the efforts of the grid's machines. Typically, the program enforces a set of standards and protocols that establish how each system shares its resources. IBM, for example, uses those of the OGSA (Open Grid Services Architecture).
The heart of a grid is its job scheduler. With a scheduler, the system allocates resources to the various jobs. A system with good scalability will divide up jobs among a grid's systems so that its computers will be used efficiently and won't sit around idle.
There are several ways to do this. One is to assign jobs by "scavenging." With this approach, idle machines signal the scheduler that they're available for more work. In another approach, "reservations," systems are preassigned to a schedule for efficient workflow. In practice, some grids use both, combined with dynamic resource allocation.
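To make the scavenging idea concrete, here is a minimal sketch of how such a scheduler might hand out work. It is a toy model, not how any particular grid product does it: the class name, method name, and machine labels are all hypothetical, and a real scheduler would also deal with networking, failures, and priorities.

```python
from collections import deque


class ScavengingScheduler:
    """Toy scheduler: idle machines announce themselves and get the next queued job."""

    def __init__(self, jobs):
        self.pending = deque(jobs)  # jobs still waiting for a machine
        self.assignments = {}       # machine name -> list of jobs it received

    def report_idle(self, machine):
        """Called when a machine signals it is idle.

        Returns the next job, or None if no work is waiting.
        """
        if not self.pending:
            return None
        job = self.pending.popleft()
        self.assignments.setdefault(machine, []).append(job)
        return job


if __name__ == "__main__":
    scheduler = ScavengingScheduler(["job-1", "job-2", "job-3"])
    # Hypothetical heterogeneous machines checking in as they become idle.
    for machine in ["linux-x86", "aix-power", "solaris-sparc", "linux-x86"]:
        print(machine, "->", scheduler.report_idle(machine))
```

The key design point is that the machines pull work when they have spare cycles, rather than the scheduler pushing work onto machines that may be busy.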
With dynamic resource allocation, more systems are brought in to deal with a problem as the workload increases. For example, if a credit card system starts to be overwhelmed during the holiday rush, more systems can be called in to make certain that the charges keep flowing.
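The holiday-rush example boils down to a simple sizing rule, sketched below. The numbers are invented for illustration: the sketch assumes each system can handle a fixed number of transactions per second and that the grid always keeps a minimum number of systems online. A production allocator would also weigh cost, startup time, and how quickly the load is changing.

```python
import math


def systems_needed(load_per_second: float,
                   capacity_per_system: float,
                   minimum: int = 1) -> int:
    """Return how many systems are needed to absorb the current load.

    Assumption: every system handles the same fixed capacity_per_system.
    """
    if capacity_per_system <= 0:
        raise ValueError("capacity_per_system must be positive")
    return max(minimum, math.ceil(load_per_second / capacity_per_system))


if __name__ == "__main__":
    # Hypothetical figures: 500 transactions/s per system, at least 2 systems online.
    for load in (900, 4200):
        print(f"{load} transactions/s -> {systems_needed(load, 500, minimum=2)} systems")
```

With these made-up figures, a normal load of 900 transactions per second is covered by the two-system minimum, while a rush of 4,200 per second calls for nine systems.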