Application developers and PC buyers should take note of Intel Corp.'s plans, disclosed earlier this month, for hyperthreading support in forthcoming mobile versions of the Pentium 4. This will bring multiprocessing capabilities (if only in the virtual sense of single processors that mimic dual-core designs) to almost every tier of the IT stack. Even personal productivity and communication applications will be potential beneficiaries of significant performance gains.
This development will be a mixed blessing, however, posing both opportunity and challenge for developers and complicating the task of making meaningful performance comparisons for buyers of any device more substantial than a high-end PDA.
Processor architects know that the move to multiprocessing is a matter of “when,” not “if.” At the top tier of computing, as of last month, almost a third of the world's fastest 500 systems were cluster configurations, up from less than one-fifth six months earlier. The economics are becoming more compelling all the way down to desktops and even portable machines.
In the general case of processing complex groups of instructions, where any output may depend on any input, the size of the processor chip, and the speed-of-light transit time across it, determine the worst-case scenario for getting information from one stage of computation to another. The area of a chip (a rough proxy for its processing power) is proportional to the square of its linear dimension. This means that a chip that's scaled up to have four times the area for processing units, cache and other hardware will typically be burdened by twice the time lag in moving information from one point to another.
A processor designer might have a choice between a “2-by-2” arrangement of processor cores, each able to perform a unit of work in a certain time; or the use of an entire chip's area for a single, more complex processor with four times the fundamental resources but twice the internal communication delays. Assuming that tasks can be efficiently parallelized, the multiprocessor design might be up to twice as fast, since both designs command the same total resources but only the single large processor pays the doubled delay.
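The arithmetic behind that estimate can be sketched in a few lines of Python. This is a toy model under stated assumptions, not a description of any real chip: it assumes that worst-case communication delay grows with a chip's linear dimension, that both designs have the same total resources, and that the work parallelizes perfectly. The function names are purely illustrative.

```python
import math


def cross_chip_delay(area_scale: float) -> float:
    """Relative worst-case transit time for a chip whose area is scaled
    by area_scale. Delay tracks the linear dimension, i.e. sqrt(area)."""
    return math.sqrt(area_scale)


def multicore_advantage(area_scale: float) -> float:
    """Toy estimate of how much faster a grid of small cores could be than
    one large core built on the same total area. Assumes (hypothetically)
    that only the large core pays the longer cross-chip delay."""
    return cross_chip_delay(area_scale) / cross_chip_delay(1.0)


if __name__ == "__main__":
    print(cross_chip_delay(4.0))     # four times the area: 2.0 times the delay
    print(multicore_advantage(4.0))  # best case for the 2-by-2 design: 2.0
```

The result is an upper bound; any coordination overhead among the four cores eats into it.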
Since even casual users are beginning to have multitasking workloads, such as using local productivity tools while also monitoring network services, the relatively simple partitioning of separate and concurrent tasks across multiple processors brings the first tier of multiprocessing benefits within easy reach. All that's needed is the operating system's support for such configurations.
Beyond that low-hanging fruit, application developers outside the data center will face the same, more complex challenge that's long been on the whiteboards of their server-centered colleagues: the puzzle of how to make efficient use of concurrent processing power within a single task without giving up most of its benefits to the clutter and overhead of process coordination. Established algorithms, such as calculating a series of terms and then adding up the results, exhibit two classes of problem, the “side effect” and the “serial bottleneck.”
The side-effect problem results from successive steps in a process altering the values of variables along the way, making it impossible to perform calculations in parallel because their inputs depend on each other's results. Exotic programming languages have devised ingenious solutions, such as the data structure called a “future” in Butterfly LISP: a value placeholder that can be passed to the next stage of a calculation for processing even while its value is still being determined.
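The idea of a future survives in mainstream languages. As a rough illustration only, here is a sketch using the Future objects in Python's standard concurrent.futures module. These are not the Butterfly LISP construct: a Python caller must ask for the value explicitly with result(), and threads in the standard CPython interpreter will not speed up pure arithmetic like this. The series chosen (1/n^2) and the function names are assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def term(n: int) -> float:
    """One term of the series 1/n^2. It reads and writes no shared
    variables, so terms can be computed in any order without side effects."""
    return 1.0 / (n * n)


def series_sum(count: int) -> float:
    """Compute each term as a separate task, then add up the results."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        # submit() returns at once with a Future: a placeholder for a
        # value that may still be in the middle of being computed.
        pending = [pool.submit(term, n) for n in range(1, count + 1)]
        # result() waits only at the point where the value is needed.
        return sum(f.result() for f in pending)


if __name__ == "__main__":
    print(series_sum(1000))
```

Because term() touches no shared state, the side-effect problem does not arise here; the final sum(), however, is still a serial step.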
The other problem is that of serial bottlenecks, where results determined in parallel must at some point be brought together. If 20 percent of a calculation is dominated by serial steps, then even an infinite degree of parallelism in the other steps can never produce more than a factor-of-five improvement in overall speed. Algorithms must be rethought to avoid such roadblocks.
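That ceiling follows from the relationship commonly known as Amdahl's law, which the short Python sketch below makes concrete. The function name and the sample worker counts are illustrative choices, and the model assumes the parallel portion divides evenly among workers with no coordination overhead.

```python
def amdahl_speedup(serial_fraction: float, workers: float) -> float:
    """Overall speedup when serial_fraction of the work cannot be
    parallelized and the remainder is split evenly across workers."""
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    parallel_fraction = 1.0 - serial_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


if __name__ == "__main__":
    # With 20 percent serial work, the speedup creeps toward 5 but
    # never exceeds it, however many workers are added.
    for n in (2, 4, 16, 1024):
        print(n, round(amdahl_speedup(0.2, n), 2))
    print("limit:", amdahl_speedup(0.2, float("inf")))
```

Even a modest serial fraction dominates quickly: in this model, four workers deliver a speedup of only 2.5 on a task that is 20 percent serial.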
Developers surveying their multiprocessing opportunities are confronting the costly, frustrating paradox that computer software—mere strings of bits—has become more difficult to replace than the metal-and-plastic hardware that it controls. An enterprise may turn over most of its population of desktop computers, and even its servers, over a period of only three to five years, but the basic structure of the software that drives those machines can easily be three times that age.
Even handheld devices are moving into the realm of multiprocessing because of rapid advances in wireless connectivity, becoming the user-facing nodes of distributed systems and collaborating with other devices as new protocols gain momentum.
The pursuit of performance must therefore dislodge the inertia of long-standing programming models, as well as surmount the barriers of hardware complexity, cost and fundamental physics. The most obstinate burden is the one that falls on software developers, and on the designers of the tools that developers use, to write comprehensible software for multiprocessor machines.
Technology Editor Peter Coffee can be reached at [email protected].