IBM: Power5 Chip to Tap Threading

The company said simultaneous multithreading will give the processor a 40 percent performance gain over the Power4 with only a 24 percent rise in per-core die size.

PALO ALTO, Calif. - A clever implementation of simultaneous multithreading will allow IBM Corp.'s Power5 processor to increase performance by 40 percent compared with the Power4, while increasing the per-core die size by only about 24 percent.
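To put the two figures side by side, the short calculation below divides the claimed performance gain by the claimed growth in per-core die size. It is only a back-of-the-envelope sketch: the 40 percent and 24 percent values come from IBM's presentation, while the "performance per unit of die area" ratio and the function name `perf_per_area` are illustrative assumptions, not a metric IBM quoted.

```python
def perf_per_area(perf_gain, area_growth):
    """Relative performance per unit of die area versus the baseline chip.

    perf_gain and area_growth are fractions (0.40 means +40 percent).
    A result above 1.0 means performance grew faster than die area.
    """
    return (1 + perf_gain) / (1 + area_growth)


# Figures quoted for the Power5 relative to the Power4 (per core).
ratio = perf_per_area(0.40, 0.24)
print(f"Performance per unit area vs. Power4: {ratio:.3f}")
```

On those numbers, 1.40 divided by 1.24 comes to roughly 1.13, or about 13 percent more performance for each unit of per-core die area.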

Simultaneous multithreading, which Intel Corp. has marketed as "Hyper-Threading," allows the processor to operate on whichever instruction stream, or "thread," demands immediate attention.

Since Armonk, N.Y.-based IBM's Power4 implemented two processor cores per chip, and the Power5 adds a second thread to each core, the Power5 will present four virtual cores to the operating system (two physical cores and two virtual ones), said Ron Kalla, a system designer for IBM, in a Wednesday presentation at the Hot Chips conference here.

The Power5 chip is on track to ship in 2004, Kalla said. Initially, IBM will fabricate the chip using 130-nm process technologies, copper interconnects and silicon-on-insulator technology. In subsequent generations, IBM will shrink the die by manufacturing the chip on a 90-nm process, Kalla said.

Kalla declined to offer hard details on the chip's clock speed, cache sizes, cost, power, or other product-specific characteristics. Some of those details will be revealed at the Microprocessor Forum in October, according to Peter Glaskowsky, editor of The Microprocessor Report and an analyst at In-Stat/MDR, which is hosting the chip confab.

Kalla characterized the Power5 as an extension to the Power4 architecture, with additions tacked on to support the additional thread. For example, a thread bit was added to most addressing buses to handle the additional thread. A second program counter was added after functions are called from the instruction cache, with a tag added to each instruction as it is decoded to indicate which thread the instruction belongs to, Kalla said.
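The tagging idea can be illustrated with a toy model. The sketch below keeps one program counter per thread and stamps each fetched instruction with a one-bit thread tag. It is not IBM's design: the round-robin fetch order, the `TaggedInstruction` type and the `fetch_and_tag` function are all assumptions made for illustration, since Kalla did not describe how the Power5 chooses which thread to fetch from.

```python
from dataclasses import dataclass


@dataclass
class TaggedInstruction:
    thread_id: int  # one-bit tag: 0 or 1
    address: int    # index into that thread's instruction stream
    opcode: str


def fetch_and_tag(streams, cycles):
    """Fetch one instruction per cycle from two streams and tag each one.

    Assumption: threads take turns (round-robin). If the thread whose
    turn it is has run out of instructions, the other thread is used.
    """
    pcs = [0, 0]  # one program counter per thread
    tagged = []
    for cycle in range(cycles):
        tid = cycle % 2
        if pcs[tid] >= len(streams[tid]):
            tid ^= 1  # fall back to the other thread
        if pcs[tid] >= len(streams[tid]):
            break  # both streams are exhausted
        tagged.append(TaggedInstruction(tid, pcs[tid], streams[tid][pcs[tid]]))
        pcs[tid] += 1
    return tagged


result = fetch_and_tag([["add", "load"], ["mul"]], cycles=4)
for instruction in result:
    print(instruction)
```

Because every instruction carries its thread tag, the shared units downstream can tell the two streams apart without being duplicated.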

While the Power4 has 80 physical registers, 120 registers will be available to programmers using the Power5, Kalla said. All registers, caches and instruction units can be shared between the two threads.

The operating system can also assign up to eight thread priority levels to each thread, Kalla said. If both threads are idle (Priority 0), the operating system can turn the processor off to conserve power, he said. Most of the microarchitectural units within the chip are power-managed, although Kalla declined to say what effect the power management would have.
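The priority scheme can be pictured with a small model. Only two things here come from the presentation: there are eight priority levels, and Priority 0 on both threads lets the processor power down. Everything else is a hypothetical sketch, including the 0-to-7 numbering, the `plan_cycle_shares` name and the proportional split of processor attention, which Kalla did not describe.

```python
def plan_cycle_shares(priority_a, priority_b):
    """Return how a core might divide its attention between two threads.

    Assumption: priorities run from 0 (idle) to 7, and attention is
    split in proportion to priority. The real Power5 policy may differ.
    """
    for priority in (priority_a, priority_b):
        if not 0 <= priority <= 7:
            raise ValueError("priority must be between 0 and 7")
    if priority_a == 0 and priority_b == 0:
        return "power-save"  # both threads idle: core can be switched off
    total = priority_a + priority_b
    return (priority_a / total, priority_b / total)


print(plan_cycle_shares(0, 0))
print(plan_cycle_shares(4, 4))
print(plan_cycle_shares(6, 2))
```

Under this model, equal priorities split the core evenly, and a thread at Priority 0 cedes the whole core to its partner.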

But disabling the multithreading option has an important side effect: it can improve performance, Kalla said. Turning off a thread gives the single remaining thread access to all 120 registers, affording the Power5 a significant instruction-per-clock (IPC) advantage compared with the Power4. IBM architected the Power5 to allow the operating system to turn the Power5's SMT capabilities on and off to maximize performance, Kalla said.

According to Kalla, IBM has booted the chip in its labs on the AIX, Linux and OS/400 operating systems.