Nvidia is enhancing its CUDA platform to make it easier for developers to program for systems that feature GPU accelerators.
The most significant enhancement in CUDA 6 is support for unified memory, which means developers no longer need to track where data resides when writing code that uses the GPU, according to Nvidia officials. Instead, the memory-management capabilities within CUDA decide whether the data should sit in CPU memory or GPU memory.
This “dramatically simplifies” the work programmers have to do when creating code for GPU-accelerated systems, according to Sumit Gupta, general manager of Tesla Accelerated Computing products at Nvidia.
“This is something developers have been asking for years,” Gupta told eWEEK.
Before, data in CPU memory had to be moved to the GPU, where it was processed and then moved back to the CPU. Data had to be copied, creating multiple pools of data and forcing developers to write code that specified where the data went. That memory-management job, which until now sat with the programmer, can now be handled by the unified memory management capabilities within CUDA, Gupta said.
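The difference Gupta describes can be illustrated with a minimal sketch. It assumes a CUDA 6 or later toolkit and a GPU that supports managed memory; the kernel `scale` and the helpers `blocks_for`, `sum_explicit` and `sum_managed` are hypothetical names chosen for illustration and do not come from Nvidia's announcement. The first helper follows the older pattern of separate host and device buffers with explicit copies, while the second uses `cudaMallocManaged`, the CUDA runtime call for allocating managed memory, so that one pointer is usable from both CPU and GPU code.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Multiplies every element by `factor`, one thread per element.
__global__ void scale(float *data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Number of 256-thread blocks needed to cover n elements (n > 0 assumed).
static int blocks_for(int n) { return (n + 255) / 256; }

// Older pattern: two pools of data and explicit copies in each direction.
// Returns the sum of the scaled array, or -1 on a CUDA error.
float sum_explicit(int n, float factor) {
    std::vector<float> host(n, 1.0f);
    size_t bytes = host.size() * sizeof(float);
    float *dev = nullptr;
    if (cudaMalloc(&dev, bytes) != cudaSuccess) return -1.0f;
    cudaMemcpy(dev, host.data(), bytes, cudaMemcpyHostToDevice);  // CPU to GPU
    scale<<<blocks_for(n), 256>>>(dev, n, factor);
    // The blocking copy back also waits for the kernel to finish.
    cudaError_t err =
        cudaMemcpy(host.data(), dev, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dev);
    if (err != cudaSuccess) return -1.0f;
    float total = 0.0f;
    for (float v : host) total += v;
    return total;
}

// Unified memory pattern: one allocation, no cudaMemcpy calls.
// Returns the sum of the scaled array, or -1 on a CUDA error.
float sum_managed(int n, float factor) {
    float *data = nullptr;
    if (cudaMallocManaged(&data, n * sizeof(float)) != cudaSuccess) return -1.0f;
    for (int i = 0; i < n; ++i) data[i] = 1.0f;  // written by the CPU
    scale<<<blocks_for(n), 256>>>(data, n, factor);  // updated by the GPU
    // The CPU must wait for the kernel before touching the data again.
    cudaError_t err = cudaDeviceSynchronize();
    float total = -1.0f;
    if (err == cudaSuccess) {
        total = 0.0f;
        for (int i = 0; i < n; ++i) total += data[i];  // read by the CPU
    }
    cudaFree(data);
    return total;
}
```

Both helpers should produce the same sum for the same inputs. The managed version drops the second buffer and both copy calls, but it still needs the explicit synchronization before the CPU reads the results.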
Nvidia’s unified memory announcement on Nov. 14 comes the same week that Advanced Micro Devices officials announced similar capabilities within the company’s accelerated processing units (APUs), which are AMD chips that integrate the GPU with the CPU on the same piece of silicon. In announcing the impending shipment of the company’s low-power “Kaveri” APU, officials said the GPU and CPU on the chip will share the same memory via AMD’s hUMA (heterogeneous Uniform Memory Access) specification, enabling developers to create applications without having to worry about whether the code will use CPU or GPU memory. That decision is made within the APU.
The shared memory in the APU is a key part of AMD’s growing heterogeneous computing efforts.
The announcement also comes just before the kickoff for the SC ’13 supercomputing show in Denver, which begins Nov. 17. The show is an important one for Nvidia, especially given the growth in the use of GPU accelerators in supercomputers in recent years to improve the performance of these systems—particularly when handling highly parallel workloads—without increasing the power consumption.
Both Nvidia and AMD have been promoting the use of their GPUs for computing workloads, while Intel is promoting its x86-based Xeon Phi coprocessors, which offer as many as 60 cores, as an alternative. Fifty-four of the supercomputers on the most recent Top500 list of the world’s fastest systems, released in June, use accelerators or coprocessors, including 39 that use GPU accelerators from Nvidia, 11 using Xeon Phis and three using AMD’s ATI Radeon GPU accelerators.
The newest list will be released at the show in Denver.
Gupta said the growing number of supercomputers with GPU accelerators is an indication of the strong adoption of graphics technologies in computing, and has helped Nvidia rapidly expand its reach in the industry.
“We’re not just doing graphics, we’re doing computing,” he said.
Along with the unified memory support, Nvidia has also improved the libraries within CUDA 6, which can speed up applications on GPUs by as much as eight times, the company said. In addition, the redesigned BLAS and FFT GPU libraries can automatically scale performance across up to eight GPUs in a single node, delivering more than nine teraflops of performance per node and supporting workloads of up to 512GB.