CPU Power Push

In the early days of desktop computing, general-purpose microprocessors opened the door to affordable single-user systems. The limited power of those early CPUs meant that almost every advance in chip speed, or expansion of address space, yielded a substantial improvement in visible and relevant performance. Every year, announcements at the Microprocessor Forum told the industry which new chips would propel the next wave of PC and server sales.

At this year’s Microprocessor Forum, in San Jose, Calif., conference founder Michael Slater ruefully observed that two decades of Moore’s Law progress have put the microchip industry, perversely, in the position of having more power to sell than most PC buyers can use. “Raw compute horsepower,” said Slater in remarks at the end of the forum’s first day, “is the easy part. Yes, faster processors are enabling components, but the computer science of next-generation usability is years behind the hardware.”

The result, Slater warned, is that “there will not be volume demand at the leading edge,” or at least not the kind of demand that historically greeted landmark microchips, such as Intel Corp.’s 386, 486 and Pentium, with hordes of eager buyers. The resulting challenge for microprocessor designers, he said, will increasingly be to focus on “solutions, not horsepower.”

Slater’s comments suggest a shift of power, from technology providers to technology buyers, in dictating the direction and pace of future hardware evolution. The alternative paths available to enterprise IT buyers were displayed in forum presentations from key players Intel; Advanced Micro Devices Inc.; Motorola Inc.; and Centaur Technology, a subsidiary of Via Technologies Inc., with each company staking out a different corner of the microchip arena.

Intel, with the resources of what Slater called “a capital-rich manufacturing machine,” came to San Jose with an array of elaborate offerings.

For mobile applications, Intel stated a new design goal of maximizing performance within a defined power envelope. This is in contrast to the company’s former approach of producing mobile processors as, essentially, trickle-down byproducts of the manufacturing-process improvements made to each successive generation of its peak-performance designs.

When power consumption moves to the head of the list of design criteria, interesting opportunities arise.

For example, a circuit element performing a logical AND operation will produce a “logical low” output if either of its two inputs is low, and a conventional chip design would not make any special effort to hold both inputs low in the most common case since there is no difference in the logical effect.

There is, however, a factor-of-5 difference in the leakage current, as described by Mooly Eden, general manager of Intel’s Israel Design Center. When the Level 2 cache for Intel’s forthcoming Banias family was designed with this in mind, Eden said, more than a full watt of power consumption was saved. (For more on Banias, see story.)
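The logical side of that argument can be sketched in a few lines of Python. The sketch only enumerates the truth table of an AND element; it does not model leakage, and the factor-of-5 figure is Eden’s, not something the code computes. The function name `and_gate` is an illustrative choice.

```python
from itertools import product


def and_gate(a: int, b: int) -> int:
    """Logical AND of two one-bit inputs."""
    return a & b


# Collect every input pair that yields a logical low output.
low_states = [
    (a, b) for a, b in product((0, 1), repeat=2) if and_gate(a, b) == 0
]

# Three input pairs are logically interchangeable, so a power-aware
# design is free to prefer whichever of them leaks least.
print(low_states)
```

Because all three low-output states are equivalent to the rest of the circuit, choosing among them costs nothing in function, which is what makes the leakage difference worth designing around.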

This illustrates the opportunities available to Intel, and therefore to portable PC buyers, to pursue the goal of “all-day,” unplugged computing without giving up the performance of desktop machines.

Intel’s refocused design will run x86 instructions directly, in the same manner as a Pentium III-M or other mobile processors popular in present high-end laptops. This is in contrast to the indirect approach taken by chips such as Transmeta Corp.’s Crusoe, which achieves low power consumption by radically simplifying core logic circuits to execute VLIW (very long instruction word) native operations.

Transmeta achieves the x86 compatibility that the market demands by loading, on system startup, an application called Code Morphing that dynamically translates mainstream PC software sequences into Crusoe chip instructions. The rest of the software foundation, including BIOS and other code such as the operating system, then runs on top of that “morphing” layer.

More than just a simple translator, the Transmeta software attempts to make sophisticated trade-offs among speed, consumption of on-chip resources and power demands. It attempts to identify rarely used instructions that can be translated once and then discarded until their next occurrence; it also attempts to cache frequently used sequences in pre-translated form.
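The general idea of discarding cold translations and keeping hot ones can be illustrated with a toy sketch. This is not Transmeta’s algorithm: the class name `TranslationCache`, the `HOT_THRESHOLD` cutoff and the `toy_translate` stand-in are all assumptions made for illustration.

```python
from collections import Counter

HOT_THRESHOLD = 3  # assumed cutoff for illustration; not a Transmeta figure


class TranslationCache:
    """Toy translate-on-demand layer that keeps only hot blocks."""

    def __init__(self, translate):
        self.translate = translate  # function: source block -> native code
        self.seen = Counter()       # how often each uncached block was requested
        self.cache = {}             # pre-translated hot blocks
        self.translations = 0       # how many times translate() actually ran

    def fetch(self, block):
        # Hot path: reuse the stored translation.
        if block in self.cache:
            return self.cache[block]
        # Cold path: translate now, and keep the result only once the
        # block has proved itself frequently used.
        self.seen[block] += 1
        native = self.translate(block)
        self.translations += 1
        if self.seen[block] >= HOT_THRESHOLD:
            self.cache[block] = native
        return native


def toy_translate(block: str) -> str:
    """Stand-in for real x86-to-native translation."""
    return f"native({block})"


tc = TranslationCache(toy_translate)
for _ in range(10):
    tc.fetch("loop_body")   # repetitive work becomes cached
tc.fetch("init_once")       # rare work is translated and discarded
```

In this sketch the repeated block is translated only until it crosses the threshold, while the one-off block never occupies cache space.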

For highly repetitive operations, such as image processing or other media-oriented tasks that involve high processor workloads, Transmeta can offer designers an excellent match between form and function. The same approach is also effective in many server roles, which has ironically placed Transmeta’s portability solution at the heart of several high-density blade server offerings (where cooling is high on the list of challenges).

But “one size does not fit all,” asserted Robert Yung, Intel’s chief technology officer for enterprise processors. Even while the company rethinks its approach to x86 design for mobile markets, it is still moving forward with the completely different instruction set of its IA-64 family in the Itanium 2 and follow-on designs.

Everything about the IA-64 chips is oversized: 328 on-chip storage registers, 50-bit (petabyte-capable) physical address space and probably 10MB of L2 cache before the end of the decade, Yung projected.
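As a quick check of the “petabyte-capable” label, the arithmetic can be sketched as follows, assuming byte addressing and binary units (1PB taken as 1024^5 bytes). The helper name `addressable_bytes` is illustrative.

```python
PHYSICAL_ADDRESS_BITS = 50  # figure quoted for IA-64 in the text
PIB = 1024 ** 5             # one petabyte in binary units (assumption)


def addressable_bytes(bits: int) -> int:
    """Bytes reachable with the given number of address bits,
    assuming each address names one byte."""
    return 2 ** bits


ia64_bytes = addressable_bytes(PHYSICAL_ADDRESS_BITS)
print(ia64_bytes // PIB)  # whole petabytes addressable
```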

Cache already represents 78 percent of the transistor count for the current “McKinley” Itanium core and will grow to 88 percent in the follow-on “Madison” design. This is a signal to enterprise buyers that getting data to and from the processing point is increasingly the dominant problem, rather than expediting computation itself. That imperative will resonate through every element of enterprise IT design and should drive IT architects toward solutions (such as Web services) that place computation as close as possible to the points where data originates or where users’ questions arise.

In the opposite corner is Via/Centaur, represented by forum regular Glenn Henry, president of Centaur Technology. “People really want x86,” said Henry in a private session with eWeek Labs. “There’s so much infrastructure of tools, skills and peripherals; if there’s an x86 in a space, it will win.”

Henry’s goal, though, is to channel that momentum toward new applications, “things that hundreds of millions of people want to do,” he said.

In Henry’s view, that means building appliances such as set-top boxes that don’t need cooling fans, or PCs that sell for at most a few hundred dollars, such as the Lindows-based, Centaur-powered Microtel Computer Systems units now selling for as little as $200 at Wal-Mart stores.

“It’s not that Pentium 4 is the wrong processor,” Henry said, “it’s just that its volume is inherently limited. Intel’s parts don’t enable new applications.”

To prove the point that a compact, fanless, but fully capable PC is deliverable today, Henry showed eWeek Labs a 17-centimeter-square module (“a real board, you can buy this for $130 at Fry’s,” he said) that was built around a Centaur processor and that draws only about 8 watts, “even with every connectivity port known to man. We wanted people to be able to use this for anything,” Henry explained.

“Low cost is more important than megahertz,” Henry told his forum audience. “Low power is more important than megahertz. We’re low cost, low power and fast enough.”

64 Bits on Desktop

Evidently agreeing with Henry on the primacy of the x86 instruction set is AMD, which appeared at this year’s forum with the first hard performance numbers for its next-generation x86-compatible chips. These chips will offer a 64-bit extension of the 32-bit instruction set that debuted in 386-class microprocessors.

Directly rebutting Intel’s Yung was AMD Vice President and CTO Fred Weber, another forum regular, who said, “We really do believe that one size does fit all: not the same processor but the same instruction set.”

Despite years of angels-on-pinheads debate about RISC versus CISC architectures, Weber asserted, the choice of instruction set should be driven by “compatibility, not performance.”

Holding up a deck of memory cards, Weber asked the forum audience, “When this is 4GB of RAM, why wouldn’t you want a 64-bit system on your desktop?”

Weber added that there’s a bonus even for running 32-bit applications in a 64-bit address space.

On a 32-bit platform, he noted, the operating system consumes a substantial fraction of the single 4GB address space that could otherwise be used entirely by the application. “Memory-bound 32-bit applications can have their own full 4GB in our compatibility mode, so they get a one-time memory boost” without being redesigned for the 64-bit platform, he said.
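The size of that boost depends on how much of the 32-bit address space the operating system reserves, which Weber did not quantify. The sketch below assumes a 2GB reservation, a common default on 32-bit operating systems of the period; `ASSUMED_OS_RESERVE` and `app_space` are illustrative names, not AMD figures.

```python
GIB = 1024 ** 3
ADDRESS_SPACE_32 = 2 ** 32        # the single 4GB space of a 32-bit platform
ASSUMED_OS_RESERVE = 2 * GIB      # assumption: OS keeps half for itself


def app_space(total: int, os_reserve: int) -> int:
    """Address space left to the application after the OS takes its share."""
    return total - os_reserve


# On a 32-bit platform the application shares the space with the OS.
native32 = app_space(ADDRESS_SPACE_32, ASSUMED_OS_RESERVE)
# In the compatibility mode Weber describes, the application gets all 4GB.
compat = app_space(ADDRESS_SPACE_32, 0)

print(native32 // GIB, compat // GIB)
```

Under that assumption, the application’s usable address space doubles without any change to its code.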

Just as desktop applications quickly took advantage of memory beyond the 1MB limit of the Intel 286 and earlier chips as the 386 became mainstream, it’s plausible that enterprise applications involving large databases and media-oriented applications involving rich data streams will quickly seize upon the power that Weber describes, as soon as it becomes economical to buy and easy to access with improved platforms and tools.

A major portion of Weber’s forum presentation was devoted to the performance, and especially the scalability to multiprocessor designs, of the nonproprietary HyperTransport protocol to be used by forthcoming AMD chips. In a quad-processor design, Weber calculated, each processor core could enjoy local memory access at 3.9GB per second, or the four cores could share common memory at 2.8GB per second.

With the high-bandwidth connectivity of AMD’s 64-bit Hammer, Weber said, “four-way becomes the norm.”

Attendees at past forum conferences must surely have noted the resemblance between Weber’s talk and presentations by IBM concerning that company’s Power4 processor, which also features a bandwidth-rich design that’s well-suited to video and other data-intensive applications.

Nor is Intel ignoring this concern: The Itanium 2 is already on its way, according to Intel’s Yung, into configurations with as many as 512 processors.

Clearly, the interconnection schemes among processors are vying with their internal sophistication for top-of-mind status among prospective buyers.

The fourth corner of the arena is ably defended by Motorola’s ColdFire core, a synthesizable chunk of intellectual property that’s readily tailored and embedded into both standard and custom products.

New at this year’s forum was discussion of the forthcoming Version 5, a superscalar design that should typically achieve at least a third more processing per clock cycle than the predecessor Version 4 design.

ColdFire’s heritage is the 68000 instruction set, probably second only to x86 for ubiquity of programming tools and skills, and a more modern instruction set design from its initial conception.

ColdFire instructions can be 16, 32 or 48 bits in length, but that variety isn’t the burden that it was once claimed to be by advocates of uniform-length RISC instructions.

As described by ColdFire Chief Architect Joe Circello, a hardware-resident table produces a vector of operations for each instruction during an early decode stage. This enables, for example, quick determination of whether instructions are interdependent and therefore unsuited to concurrent execution.
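A rough software analogy may help show why a decode table makes the dependency check cheap. The sketch below is not ColdFire’s actual table or pairing logic: the `DecodeEntry` fields, the three 68000-style example instructions and the conservative `can_pair` rule are all assumptions chosen for illustration.

```python
from typing import NamedTuple


class DecodeEntry(NamedTuple):
    """What an early decode stage might record per instruction (hypothetical)."""
    length_bits: int       # 16, 32 or 48
    reads: frozenset       # registers the instruction reads
    writes: frozenset      # registers the instruction writes


# Hypothetical table keyed by whole instruction text for simplicity.
DECODE_TABLE = {
    "move.l d0,d1": DecodeEntry(16, frozenset({"d0"}), frozenset({"d1"})),
    "add.l d1,d2": DecodeEntry(16, frozenset({"d1", "d2"}), frozenset({"d2"})),
    "addi.l #imm,d3": DecodeEntry(48, frozenset({"d3"}), frozenset({"d3"})),
}


def can_pair(first: str, second: str) -> bool:
    """Conservatively decide whether two adjacent instructions are
    independent: no register written by one is read or written by the other."""
    a, b = DECODE_TABLE[first], DECODE_TABLE[second]
    read_after_write = a.writes & b.reads
    write_after_write = a.writes & b.writes
    write_after_read = a.reads & b.writes
    return not (read_after_write or write_after_write or write_after_read)


print(can_pair("move.l d0,d1", "add.l d1,d2"))     # second reads d1 written by first
print(can_pair("move.l d0,d1", "addi.l #imm,d3"))  # no shared registers
```

Once each instruction has been reduced to a uniform record like this, its encoded length no longer matters to the comparison, which is the point Circello makes about variable-length instructions.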

Because the ColdFire design avoids any need for hand tuning to different semiconductor process technologies, it offers designers a short time to market with a highly tailored solution, Circello said. “You want a memory management unit? Flip a switch [during design generation, and] it gets synthesized in. Floating-point hardware? Likewise.”

Motorola’s product groups already take advantage of this flexibility in developing standard parts, as well as offering it for use in custom solutions as enterprises move more advanced technology into the field. With handheld or networked devices distributing intelligence to many points in the manufacturing process and supply chain, such tailored solutions will move higher up on enterprise IT agendas.

Competitive Advantage

Although the forum included many other presentations, the enterprise IT buyer can get a good sense of the competitive landscape by comparing the four discussed here.

Desktop systems enjoy a spectrum of choices, ranging from the almost-outrageous power of the Itanium 2 (combined with the technical risk of a completely new software base) to the “fast enough” pragmatism of Via/Centaur’s compact and cool-running designs. AMD, with its x86 compatibility and 64-bit extensibility, offers an attractive middle ground.

Embedded custom solutions can take advantage of the x86 or 68000 skills base in power-thrifty hardware assembled from standard parts or custom-built to precise requirements.

Hardware choices should be quickly made, however, so that software can receive a much-needed lion’s share of system designers’ attention.

“Our typical customer,” said Circello, “has 10 firmware designers for every hardware designer.”

That ratio should signal to enterprise IT that competitive advantage will come more from developing unique intellectual property, enabled by any of several viable hardware choices, than from agonizing over which bit of silicon should anchor that innovation.

Technology Editor Peter Coffee can be reached at peter_coffee@ziffdavis.com.

Shopping for Silicon

Desktop and portable processors

High clock rate is a cost, not a benefit; it drives up costs of everything else in the machine.
Application benchmarks tell the story that matters.
Compact, quiet systems give back desk space, reduce workplace noise levels and cut air conditioning costs. Small-footprint desktop and high-function laptop systems should dominate the next wave of desktop buys, likely by the end of next year.

Server processors

Bandwidth, not processor power, is the engineering challenge. IBM and AMD are addressing this directly; Intel is trying to solve it with lots of cache.
Ask about scalability of multiprocessor systems: Is there a near-term road map to four-way and beyond?
Chip builders are looking for things to do with more on-chip transistors. Application accelerators for compute-intensive tasks, such as encryption, will characterize next-generation server designs.

Related Stories:

Intel Banks on Banias Mobile Chips
Chips Power Killer Apps for Handhelds
Data Bandwidth: Memory Should Take Pride of Place
