Small Coprocessor Promises Big Server Boost

With an output of up to 25.6 gigaflops per processor, ClearSpeed's CS301 can help reduce clusters of x86-based servers to a chassis or two.

Startup ClearSpeed Technology on Tuesday is expected to announce a coprocessor for x86-based servers that the company said could help shrink massive clustered systems down into a server chassis or two.

ClearSpeed's 32-bit CS301 coprocessor runs at only 200MHz but outputs up to 25.6 gigaflops per processor. The company's chief designers envision the chip perched on a PCI daughtercard, assisting the main CPU with computation-intensive parallel tasks, such as those used in the biotechnology and scientific communities.

"We're seeing an unbelievable amount of compute-intensive operations today, and we're seeing a dramatic increase in compute requirements, between what's capable of being processed today and what's required to be processed today," said Mike Calise, president of ClearSpeed, which has offices in Los Gatos, Calif., and Bristol, U.K.

The chip will be disclosed at the Microprocessor Forum in San Jose, Calif.

The coprocessor is relatively small: 41 million transistors take up 72 square mm on an IBM 0.13-micron silicon-on-insulator process. Its processing power comes from an array of 64 processing elements organized across the surface of the chip. Instead of being fed by an individual cache, each element contains its own register file and program-execution memory.

"We could be doing 64 different proteins or drug molecules processed simultaneously at that level," said Simon Macintosh-Smith, the director of architecture at the company. The ClearSpeed CS301 calculates which operations can be parallelized, rather than forcing a compiler to do the work, as Intel's Itanium chips do.

The chip's 25.6-gigaflop output is more than double the 12 gigaflops produced by a 3.0GHz Pentium 4. By comparison, the National Center for Supercomputing Applications was using a 1,512-processor, 660-gigaflop SGI Origin2000 array up until November 2002. Running the chip at a low clock speed also means that the CS301 consumes very little power: only 2.5 watts, meaning that the chip produces about 10 gigaflops per watt, compared with just 0.1 gigaflop per watt for the Pentium 4.
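The ratios behind these comparisons can be checked with a few lines of arithmetic. The sketch below uses only the figures quoted in the article; the function names are illustrative, and the per-element result assumes the peak rate is spread evenly across all 64 processing elements, which the company has not stated.

```python
# Figures quoted in the article (peak values, not measured benchmarks).
CS301_PEAK_GFLOPS = 25.6
CS301_CLOCK_MHZ = 200
CS301_NUM_ELEMENTS = 64
CS301_POWER_WATTS = 2.5
PENTIUM4_GFLOPS = 12.0


def gflops_per_watt(peak_gflops, watts):
    """Peak throughput divided by power draw."""
    return peak_gflops / watts


def flops_per_element_per_cycle(peak_gflops, clock_mhz, num_elements):
    """Operations each element must retire per clock to reach the peak.

    Assumes the peak is divided evenly among the elements.
    """
    total_flops_per_second = peak_gflops * 1e9
    element_cycles_per_second = clock_mhz * 1e6 * num_elements
    return total_flops_per_second / element_cycles_per_second


efficiency = gflops_per_watt(CS301_PEAK_GFLOPS, CS301_POWER_WATTS)
per_element = flops_per_element_per_cycle(
    CS301_PEAK_GFLOPS, CS301_CLOCK_MHZ, CS301_NUM_ELEMENTS
)
speedup = CS301_PEAK_GFLOPS / PENTIUM4_GFLOPS

print(f"{efficiency:.2f} gigaflops per watt")
print(f"{per_element:.1f} floating-point operations per element per cycle")
print(f"{speedup:.2f}x the quoted Pentium 4 figure")
```

On the quoted numbers this works out to roughly 10.2 gigaflops per watt, two floating-point operations per element per clock cycle, and a little over twice the Pentium 4's output.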

By 2004, the company said, a CS301-assisted cabinet of Opteron or Itanium processors could generate 48 teraflops, more than the 36 teraflops currently produced by Japan's Earth Simulator, an estimated $350 million investment that required its own building.
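For a sense of scale, a back-of-the-envelope estimate shows how many coprocessors that projection implies. This is a rough sketch, not a company figure: it assumes the entire 48 teraflops comes from CS301 chips running at their quoted peak, ignores any contribution from the host processors, and counts only the coprocessors' own power draw.

```python
# Figures quoted in the article.
TARGET_TERAFLOPS = 48.0
CS301_PEAK_GFLOPS = 25.6
CS301_POWER_WATTS = 2.5


def chips_for_target(target_teraflops, chip_gflops):
    """Chips needed if every chip sustains its peak rate (an assumption)."""
    return target_teraflops * 1000.0 / chip_gflops


chip_count = chips_for_target(TARGET_TERAFLOPS, CS301_PEAK_GFLOPS)
coprocessor_watts = chip_count * CS301_POWER_WATTS

print(f"about {chip_count:.0f} chips")
print(f"about {coprocessor_watts / 1000.0:.1f} kilowatts for the coprocessors alone")
```

Under those assumptions the projection corresponds to about 1,875 chips drawing under 5 kilowatts in total, before counting host CPUs, memory, or cooling.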

The CS301 will sample in the fourth quarter, executives said.
