Nvidia is continuing its efforts to push GPUs deeper into mainstream computing with the unveiling of its upcoming line of graphics processors that will be built on its new “Fermi” architecture.
Nvidia officials announced the Tesla 20 series Nov. 16 at the Supercomputing show in Portland, Ore.
Designed for parallel computing workloads, the Fermi-based Tesla 20 series of GPUs will offer the performance of traditional CPU-based clusters at one-tenth the cost and one-twentieth the power, according to company officials.
Nvidia has fueled the push to make GPUs more general-purpose, particularly in computing-intensive fields such as HPC (high-performance computing). The shift began in 2006 with Nvidia’s introduction of the CUDA processor architecture, said Sumit Gupta, senior product manager for Nvidia’s Tesla business.
“Until two years ago, the talk from OEMs was about CPUs,” Gupta said in an interview. “This is changing.”
CPU vendors Advanced Micro Devices and Intel are now looking to add greater graphics capabilities to their processors, and a host of OEMs, including Cray, Dell, Hewlett-Packard, NEC and SGI, are demonstrating Tesla GPUs at the Supercomputing show, Gupta said.
Appro is showcasing its HyperPower GPU performance clusters, which run both Nvidia Tesla GPUs and “Xeon” processors from Intel. Also at the show, Super Micro Computer is demonstrating its new 2U (3.5-inch) Twin servers equipped with two Tesla C1060 GPUs.
“Within the past two years, the whole face of the show has changed,” Gupta said.
The Tesla 20 series brings together a number of features officials say have never been offered on a single device before, including ECC (error-correcting code) memory, a multilevel cache hierarchy with Level 1 and Level 2 caches, support for the C++ programming language, support for up to 1TB of memory, concurrent kernel execution and fast context switching.
Other features include 64-bit virtual address space, system calls and recursive functions.
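Several of these features are programming-model changes rather than raw-speed ones. As a rough illustration of what "recursive functions" and C++ support mean in practice, the sketch below shows a device function calling itself from a kernel, something earlier GPU generations could not do. This is an illustrative example only, not code from Nvidia's announcement; it assumes an nvcc toolchain targeting Fermi-class hardware (compiled with `-arch=sm_20` or later).

```cuda
#include <cstdio>

// Device-side recursion: each thread computes a factorial by having
// the function call itself, a capability introduced with Fermi.
__device__ int factorial(int n) {
    return (n <= 1) ? 1 : n * factorial(n - 1);
}

__global__ void factorial_kernel(int *out) {
    int i = threadIdx.x;
    out[i] = factorial(i + 1);  // thread i computes (i+1)!
}

int main() {
    const int N = 4;
    int *d_out, h_out[N];
    cudaMalloc(&d_out, N * sizeof(int));
    factorial_kernel<<<1, N>>>(d_out);
    cudaMemcpy(h_out, d_out, N * sizeof(int), cudaMemcpyDeviceToHost);
    for (int i = 0; i < N; ++i)
        printf("%d! = %d\n", i + 1, h_out[i]);
    cudaFree(d_out);
    return 0;
}
```

On pre-Fermi parts, a recursive call like this would be rejected at compile time because those architectures lacked the per-thread call stack that makes recursion and true function calls possible.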
The new Tesla 20 series, which will be available in May 2010, will include the C2050 and C2070 GPU computing processors: single-GPU PCI Express Gen 2 cards for workstations with 3GB and 6GB of on-board memory, respectively, and performance of 520 to 630 gigaflops, or billions of floating-point operations per second.
The S2050 and S2070 GPU computing systems will offer four GPUs in a 1U chassis for cluster and data center deployments, with 12GB and 24GB of memory, respectively, and performance of 2.1 to 2.5 teraflops, or trillions of floating-point operations per second.