Dual-core processors are proliferating through PC product lines, including newly Windows-capable Macintosh systems. April has also seen the open-source release of Sun Microsystems' SPARC T1 processor design, with up to four threads running on each of up to eight cores.
Challenges that were once in the domain of supercomputing now present themselves to enterprise architects: Multithreaded cores, multicore processors and multiprocessor grids challenge software writers to rethink tasks for parallel processing.
Processor producers know that if they don't help software developers achieve nearly N-fold speedup from an N-core chip ("linear speedup," as it's called for short), then there won't be an attractive return on corporate investment in systems that use these complex processor designs.
Linear speedup is not achievable in most tasks: If an N-core processor merely fetches successive blocks of instructions and performs each block concurrently, bad things happen, because it's common for the input to one instruction to be the output from the one before.
Any approach that performs more than one instruction during a single clock cycle must detect such sequential dependencies—and hold off on executing instructions whose input is not yet available.
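To make that dependency check concrete, here is a toy sketch in Python. It is not how any real processor or compiler does the job: the `Instr` record, the register names and the `schedule_in_order` function are all hypothetical, and the model looks only at the simplest hazard, an instruction reading a value written earlier in the same cycle.

```python
from typing import NamedTuple


class Instr(NamedTuple):
    """Hypothetical instruction: one destination register, zero or more sources."""
    dest: str
    sources: tuple[str, ...]


def schedule_in_order(program, width):
    """Group instructions into cycles, in program order, at most `width` per cycle.

    An instruction is held for the next cycle if it reads a register that
    was written earlier in the current cycle (a read-after-write dependency).
    """
    cycles = []
    current = []
    written = set()
    for instr in program:
        depends = any(src in written for src in instr.sources)
        if len(current) == width or depends:
            # Close the current cycle and start a new one.
            cycles.append(current)
            current, written = [], set()
        current.append(instr)
        written.add(instr.dest)
    if current:
        cycles.append(current)
    return cycles


# Four instructions with no dependencies, and four in which each needs the one before.
independent = [Instr("a", ()), Instr("b", ()), Instr("c", ()), Instr("d", ())]
chain = [Instr("a", ()), Instr("b", ("a",)), Instr("c", ("b",)), Instr("d", ("c",))]

print(len(schedule_in_order(independent, 2)))  # 2 cycles: both slots used each time
print(len(schedule_in_order(chain, 2)))        # 4 cycles: the second slot never helps
```

With two issue slots, the independent instructions finish in half the cycles, while the dependent chain runs no faster than it would on a single slot.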
In practice, it's more common to see power-law speedups with an exponent around 0.7, where two cores run a real task about 60 percent faster than one (2 to the power 0.7 is about 1.6) and four cores run only about 2.6 times as fast as one (4 to the 0.7). Some tasks, such as image processing, have exponents close to 1, while other tasks with strong sequential dependencies show exponents more like 0.3. At the latter degree of parallelization, 32 processors would run less than three times as fast as one.
In a high-end chip maker's nightmare of such diminishing returns, buyers would have every reason to favor simple and mature designs built at razor-thin profit margins by any number of aggressive competitors.
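The arithmetic is easy to check. The short Python sketch below simply raises the core count to the exponent; the function name `power_law_speedup` is an illustrative choice, and the rule of thumb itself is an empirical approximation, not a law.

```python
def power_law_speedup(cores: int, exponent: float) -> float:
    """Estimated speedup over one core under a power-law rule of thumb."""
    return cores ** exponent


# Core counts and exponents taken from the figures quoted above.
for cores, exponent in [(2, 0.7), (4, 0.7), (32, 0.3), (32, 1.0)]:
    speedup = power_law_speedup(cores, exponent)
    print(f"{cores:>2} cores, exponent {exponent}: {speedup:.2f}x")
```

This prints roughly 1.62x, 2.64x, 2.83x and 32.00x. The last line is the linear case, a reminder of how much the chip makers stand to lose when the exponent slips.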
Complex processors do have a payback proposition, paving a path toward more compact, less power-hungry and therefore less heat-generating server installations—but only if the speedup is there to pay for costly development and state-of-the-art fabrication.
It might seem as if inefficient code would lead to buyers needing more processors to perform a given task, and that this would be just fine with processor vendors, but this cynical reasoning overlooks the competitive environment just described—one in which vendors need to take the lead in wringing maximum ROI from their own technology.
It's also clear that cost-effective computing doesn't shrink technology demand. Rather, by pushing previously marginal applications over the threshold of being well worth doing, more cost-effectiveness makes IT vendors more money, not less.
It's therefore no surprise that a company like Intel produces not just chips but also sophisticated tools for optimizing the chips' performance. As far back as the debut of Intel's first Pentium processors in March 1993, when the chip's two concurrent pipelines posed real challenges to developers, I've found Intel's VTune Performance Analyzer a real eye-opener into what's actually happening inside a CPU.
The Windows version of VTune 8.0, released in February, includes full Vista and .Net support: “It can take you down to source code or assembly code, or go up to the thread level or the lock level and diagnose correctness errors with locks,” said Intel Development Products Division Director James Reinders, when we talked about the product in March.
Albert Einstein famously said time is what keeps everything from happening all at once; his collaborator John Wheeler is less well-known for adding, “and space is what keeps everything from happening to me.”
Software developers may well wish that they could seek refuge in Wheeler's space, because Einstein's time is no longer on their side: Making things happen all at once is now their job, with tools like VTune their best hope of getting it done.
Peter Coffee can be reached at firstname.lastname@example.org.