Nvidia is looking to expand its presence in the high-performance computing market with a second-generation graphics processor that offers 240 graphics processing cores and 1 teraflop of performance.
The graphics company is officially releasing its Tesla 10 series GPU (graphics processing unit) June 16 along with a new 1U (1.75-inch) rack-mount server designed for the increasingly competitive HPC market.
In 2007, Nvidia released the first of its Tesla processors for HPC as the company looked to expand its business beyond its traditional market of discrete graphics and chip sets for PCs. In order to create a community that would develop applications for a GPU-based HPC system, the company also developed a programming language dubbed CUDA (Compute Unified Device Architecture), which allows the GPU to be programmed like an x86 CPU. With the release of the new Tesla GPU, Nvidia is also releasing an early version of CUDA 2.0.
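To give a sense of what CUDA code looks like, the sketch below is a generic vector-addition example rather than anything from Nvidia's own materials; the kernel name, array size and launch configuration are illustrative assumptions. The kernel runs once per data element, with the GPU spreading those runs across its cores in parallel.

    // A generic vector-addition sketch: each GPU thread handles one element.
    // The kernel name, array size and launch configuration are illustrative.
    #include <cuda_runtime.h>
    #include <cstdio>
    #include <cstdlib>

    __global__ void vecAdd(const float *a, const float *b, float *c, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;   // global thread index
        if (i < n)
            c[i] = a[i] + b[i];
    }

    int main() {
        const int n = 1 << 20;
        size_t bytes = n * sizeof(float);
        float *ha = (float *)malloc(bytes);
        float *hb = (float *)malloc(bytes);
        float *hc = (float *)malloc(bytes);
        for (int i = 0; i < n; ++i) { ha[i] = 1.0f; hb[i] = 2.0f; }

        float *da, *db, *dc;
        cudaMalloc((void **)&da, bytes);
        cudaMalloc((void **)&db, bytes);
        cudaMalloc((void **)&dc, bytes);
        cudaMemcpy(da, ha, bytes, cudaMemcpyHostToDevice);
        cudaMemcpy(db, hb, bytes, cudaMemcpyHostToDevice);

        vecAdd<<<(n + 255) / 256, 256>>>(da, db, dc, n); // 4,096 blocks of 256 threads
        cudaMemcpy(hc, dc, bytes, cudaMemcpyDeviceToHost);

        printf("c[0] = %f\n", hc[0]);                    // expect 3.0
        cudaFree(da); cudaFree(db); cudaFree(dc);
        free(ha); free(hb); free(hc);
        return 0;
    }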
While most HPC is done with traditional microprocessors, Nvidia said it believes the GPU, which can break data apart and solve problems by working in parallel, represents a shift in how to offer more performance at the chip level for solving large, complex problems in fields ranging from financial services to oil and gas exploration.
Amitabh Varshney, a professor of computer science at the University of Maryland, is currently working to create applications that take advantage of HPC systems that use a combination of CPUs and GPUs. He wrote in an email that GPUs have opened up new avenues for students and others to think about how to write applications that exploit the possibilities of parallel computing.
“Over the next few years because of the wide and inexpensive availability of GPUs we might very well see a large number of young parallel programming hobbyists and visual computing enthusiasts who take to GPUs just because it is fun while being challenging,” Varshney wrote. “HPC is likely to benefit from a large pool of talented and interested enthusiasts. Another salutary impact of increased affordability of HPC through GPUs is likely to be the broadening of HPC’s target areas to a far richer suite of driving applications.”
Other IT companies are also using graphics technology to offer better performance for HPC systems and supercomputers. IBM used its own Cell processor as an accelerator in its newly installed Roadrunner supercomputer, while Intel and Advanced Micro Devices are each looking to develop chips that combine CPUs and GPUs on the same silicon die.
AMD's FireStream Offers 1 Teraflop of Performance
On June 16, AMD is also announcing a new graphics processor, the FireStream 9250, which is likewise geared toward HPC and offers 1 teraflop of performance.
“What we hear from a lot of our customers is that they have maxed out what they can do with a homogeneous cluster and we are now ready for another type of architecture to take over and take high-performance computing forward,” said Andy Keane, general manager of Nvidia’s GPU Computing business unit.
“A lot of the questions we get are not how to get things faster by a factor of four, but how do I get to 10X or 100X because these are the scale of problems, whether it is weather problems or design problems,” Keane added. “The way we talk about this is heterogeneous computing.”
Nvidia made several improvements over previous chips with its Tesla 10 processor, which contains 1.4 billion transistors and is manufactured on a 55-nanometer process. The processor is based on a new microarchitecture that allowed the company to place 240 cores on a single die, nearly double the number in the older Tesla 8 chip.
The Tesla chip is designed as a series of arrays. Engineers first created the basic graphics processing core and then duplicated it eight times; these eight cores are arranged around shared memory and an interface unit. There are 30 of these arrays, giving the chip its 240 cores.
The clock speed of the Tesla 10 series GPU ranges from 1.33GHz to 1.5GHz.
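That layout is visible to developers through the CUDA runtime's device query. As a rough sketch, not taken from Nvidia's documentation, a program can read the multiprocessor count and clock rate of the installed card like this:

    #include <cuda_runtime.h>
    #include <cstdio>

    int main() {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, 0);   // properties of the first GPU
        // A Tesla 10 series card would be expected to report 30 multiprocessors,
        // each holding 8 cores, for 240 cores in total; clockRate is in kHz.
        printf("%s: %d multiprocessors, clock %.2f GHz\n",
               prop.name, prop.multiProcessorCount, prop.clockRate / 1.0e6);
        return 0;
    }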
The company also increased the chip's peak performance from 500 gigaflops (500 billion calculations per second) to 1 teraflop (1 trillion calculations per second). Nvidia's engineers also increased the onboard memory from 1.5GB to 4GB.
Nvidia has also designed the new Tesla to handle double-precision computing, which uses 64-bit rather than 32-bit floating-point numbers to deliver greater numerical accuracy. With its new QS22 blade based on its Cell processor, IBM also offers a machine capable of handling double-precision computing.
While Nvidia is using graphics in HPC, the company said it believes the industry is moving toward a heterogeneous model that mixes the capabilities of CPUs and GPUs to handle complex tasks. With that in mind, the company is also offering its S1070 system, which uses four Tesla GPUs, has 16GB of memory and delivers 4 teraflops of performance.
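From the host server's point of view, the GPUs in such a box simply appear as additional CUDA devices. The following is a minimal sketch, written against the current CUDA runtime rather than Nvidia sample code, of how an application might enumerate them and hand each a share of the work; the loop structure and comments are assumptions:

    #include <cuda_runtime.h>
    #include <cstdio>

    int main() {
        int count = 0;
        cudaGetDeviceCount(&count);          // e.g. 4 when an S1070 is attached
        printf("found %d CUDA device(s)\n", count);
        for (int d = 0; d < count; ++d) {
            cudaSetDevice(d);                // direct subsequent calls to GPU d
            // ... allocate this device's buffers and launch its share of the work ...
        }
        return 0;
    }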
The system also has two second-generation PCI Express cards that connect it to standard servers based on x86 processors. Hewlett-Packard, Sun Microsystems and Dell all offer systems compatible with the Tesla-based product.
In order to ensure that developers and third-party software vendors begin developing applications for use with Tesla, Nvidia has also updated CUDA to work with a range of new operating systems, including 64-bit versions of Linux and Microsoft Windows XP, as well as Vista and the latest versions of the Mac OS.
The newer version of CUDA also allows developers to create applications that take advantage of the double-precision technology within the Tesla 10 series.
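As a hedged illustration rather than Nvidia sample code, a kernel that uses the new 64-bit support simply operates on double values; for this generation of hardware it would be compiled for compute capability 1.3 (nvcc -arch=sm_13). The function name, sizes and launch parameters below are assumptions:

    // Illustrative double-precision kernel; the name, sizes and data are assumptions.
    // Compile with: nvcc -arch=sm_13 daxpy.cu   (compute capability 1.3 adds 64-bit floats)
    #include <cuda_runtime.h>

    __global__ void daxpy(int n, double alpha, const double *x, double *y) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            y[i] = alpha * x[i] + y[i];      // arithmetic performed entirely in 64-bit
    }

    int main() {
        const int n = 1 << 20;
        double *x, *y;
        cudaMalloc((void **)&x, n * sizeof(double));
        cudaMalloc((void **)&y, n * sizeof(double));
        // ... copy real input data into x and y here ...
        daxpy<<<(n + 255) / 256, 256>>>(n, 2.0, x, y);
        cudaDeviceSynchronize();
        cudaFree(x);
        cudaFree(y);
        return 0;
    }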
By Nvidia’s account, there have been 70,000 downloads of the first version of the CUDA compiler. The goal now, Keane said, is to get more institutions and universities to teach the CUDA language to a new generation of programmers. In much the same way, Microsoft and Intel are working to ensure that more developers are trained to write code that takes advantage of parallel computing and multicore processor technology.
John Spooner, an analyst with Technology Business Research, said Nvidia and AMD are trying to address markets that are increasingly hungry for computing power. At the same time, the HPC field is changing, with graphics playing a much larger role in delivering the performance that enterprises such as oil and gas companies are looking for.
“The good part about what Nvidia is doing is that they, as well as AMD, will be ahead of the curve when it comes to designing graphics processors for high-performance computing,” Spooner said. “I think the more difficult part for Nvidia is that CUDA is a proprietary language and that tends to be a hard sell. They do need a programming model for their products, but there are some clear benefits to having an open-source model.”