AMD's FireStream Offers 1 Teraflop of Performance
On June 16, AMD is also announcing a new graphics processor, the FireStream 9250, which is likewise geared toward HPC and offers 1 teraflop of performance.
"What we hear from a lot of our customers is that they have maxed out what they can do with a homogeneous cluster and we are now ready for another type of architecture to take over and take high-performance computing forward," said Andy Keane, general manager of Nvidia's GPU Computing business unit. "A lot of the questions we get are not how to get things faster by a factor of four, but how do I get to 10X or 100X because these are the scale of problems, whether it is weather problems or design problems," Keane added. "The way we talk about this is heterogeneous computing."The Tesla chip is designed in a series of arrays. First, engineers created the basic graphics processing core and then duplicated that core eight times. These eight cores are then arranged around shared memory and an interface unit. There are 30 of these arrays, allowing Tesla to have 240 cores. The clock speed of the Tesla 10 series GPU ranges from 1.33GHz to 1.5GHz. The company also increased the performance of new chips from 500 gigaflops-500 billion calculations per second-to 1 teraflop or 1 trillion calculations per second. Nvidia's engineers also increased the onboard memory from 1.5GB to 4GB. Nvidia has also designed the new Tesla to handle double-precision computing, which doubles its ability to process data in terms of speed and quantity. With its new QS22 blade based on its Cell processor, IBM also offers a machine capable of handling double-precision computing. While Nvidia is using graphics in HPC, the company said it believes that the industry is moving toward a heterogeneous model that mixes the capabilities of both CPUs and GPUs to handle complex tasks. With that in mind, the company is also offering its S1070 system, which uses four Tesla GPUs, has 16GB of memory and gives 4 teraflops of performance. The system also has two second-generation PCI Express cards that can connect the system to standard servers based on x86 processors. 
Hewlett-Packard, Sun Microsystems and Dell all offer systems compatible with the Tesla-based product.
To encourage developers and third-party software vendors to build applications for Tesla, Nvidia has also updated CUDA to work with a range of new operating systems, including 64-bit versions of Linux and Microsoft Windows XP, as well as Vista and the latest versions of the Mac OS. The newer version of CUDA also lets developers create applications that take advantage of the double-precision technology in the Tesla 10 series.
By Nvidia's account, there have been 70,000 downloads of the first version of the CUDA compiler. The goal now, Keane said, is to get more institutions and universities to teach the CUDA language to a new generation of programmers. In much the same way, Microsoft and Intel are working to ensure that more developers are trained to write code that takes advantage of parallel computing and multicore processor technology.
John Spooner, an analyst with Technology Business Research, said Nvidia and AMD are trying to address markets that are demanding ever more computing power. At the same time, the HPC field is changing, with graphics processors playing a much larger role in delivering the computing performance that enterprises such as oil and gas companies are looking for.
"The good part about what Nvidia is doing is that they, as well as AMD, will be ahead of the curve when it comes to designing graphics processors for high-performance computing," Spooner said. "I think the more difficult part for Nvidia is that CUDA is a proprietary language and that tends to be a hard sell. They do need a programming model for their products, but there are some clear benefits to having an open-source model."
"What we hear from a lot of our customers is that they have maxed out what they can do with a homogeneous cluster and we are now ready for another type of architecture to take over and take high-performance computing forward," said Andy Keane, general manager of Nvidia's GPU Computing business unit. "A lot of the questions we get are not how to get things faster by a factor of four, but how do I get to 10X or 100X because these are the scale of problems, whether it is weather problems or design problems," Keane added. "The way we talk about this is heterogeneous computing."
Nvidia made several improvements over previous chips with its Tesla 10 processor, which contains 1.4 billion transistors and is manufactured on a 55-nanometer process. The processor is based on a new microarchitecture that allowed the company to place 240 cores on a single die, nearly double the number on the older Tesla 8 chip.