Nvidia is releasing the latest version of its CUDA toolkit, which company officials say includes some of the platform’s most significant improvements to date, chief among them unified memory.
CUDA is designed to enable developers to create software for computing systems that leverage GPU accelerators. CUDA 6 includes support not only for unified memory but also for Nvidia’s new Tegra K1 system-on-a-chip (SoC) for mobile and embedded devices.
Mark Harris, chief technologist for GPU computing software at Nvidia, called the support for unified memory “one of the most dramatic programming model improvements in the history of the CUDA platform.” In a typical PC or cluster node, the memories of the CPU and GPU are separated by a PCI-Express bus, and developers are forced to treat them as distinct entities. Data shared between the CPU and GPU must be allocated in both memories and explicitly copied between them, which Harris said adds complexity to CUDA programs.
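The explicit-copy pattern Harris describes looks roughly like the following sketch. It is illustrative, not taken from Nvidia’s materials: the kernel name (`scale`) and sizes are assumptions, but the allocation and copy calls (`cudaMalloc`, `cudaMemcpy`) are the standard CUDA runtime API.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: multiply each element by a factor on the GPU.
__global__ void scale(float *data, float factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main() {
    const int n = 1 << 20;
    size_t bytes = n * sizeof(float);

    // Separate host (CPU) allocation.
    float *h_data = (float *)malloc(bytes);
    for (int i = 0; i < n; ++i) h_data[i] = 1.0f;

    // Separate device (GPU) allocation on the other side of the PCIe bus.
    float *d_data;
    cudaMalloc(&d_data, bytes);

    // The program must copy the data in, run the kernel, and copy it back.
    cudaMemcpy(d_data, h_data, bytes, cudaMemcpyHostToDevice);
    scale<<<(n + 255) / 256, 256>>>(d_data, 2.0f, n);
    cudaMemcpy(h_data, d_data, bytes, cudaMemcpyDeviceToHost);

    printf("h_data[0] = %f\n", h_data[0]);

    cudaFree(d_data);
    free(h_data);
    return 0;
}
```

Two allocations, two copies, and two pointers for what is logically one array: this is the bookkeeping Unified Memory is meant to eliminate.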
“Unified Memory creates a pool of managed memory shared between the CPU and GPU, bridging the CPU-GPU divide,” Harris wrote in a post on the company blog. “Managed memory is accessible to both the CPU and GPU using a single pointer. The key is that the system automatically migrates data allocated in Unified Memory between host and device so that it looks like CPU memory to code running on the CPU, and like GPU memory to code running on the GPU.”
Now, developers no longer need to be concerned about where the data resides when offloading work to the GPU, according to Nvidia. The memory-management capabilities within CUDA decide whether the data should live in CPU or GPU memory at any given moment. It’s similar to the capabilities Advanced Micro Devices engineers are putting into the company’s accelerated processing units (APUs), which also place graphics and central processing on the same chip. In upcoming APUs, the GPU and CPU will share the same memory via AMD’s hUMA (heterogeneous Uniform Memory Access) specification.
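With Unified Memory, the same workload shrinks to a single allocation and a single pointer. Again a sketch rather than Nvidia’s own sample code (the `scale` kernel is a hypothetical stand-in), but `cudaMallocManaged` is the managed-memory allocator CUDA 6 introduced:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Same hypothetical kernel as before.
__global__ void scale(float *data, float factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main() {
    const int n = 1 << 20;

    // One allocation in managed memory; the pointer is valid on both
    // the CPU and the GPU, and the system migrates the data as needed.
    float *data;
    cudaMallocManaged(&data, n * sizeof(float));

    for (int i = 0; i < n; ++i) data[i] = 1.0f;  // CPU writes directly

    scale<<<(n + 255) / 256, 256>>>(data, 2.0f, n);
    cudaDeviceSynchronize();  // wait for the GPU before the CPU reads

    printf("data[0] = %f\n", data[0]);  // CPU reads directly; no cudaMemcpy

    cudaFree(data);
    return 0;
}
```

Note the `cudaDeviceSynchronize()` call: the copies are gone, but the CPU still has to wait for the kernel to finish before touching managed data.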
Having CUDA 6 on the Tegra K1 SoC means that the platform is now on every Nvidia chip, Harris wrote. Introduced in January, the Tegra K1 includes 192 GPU cores and an ARM-based CPU featuring Nvidia’s 4-plus-1 power efficiency architecture, as well as a range of other capabilities.
“Parallel computing on every NVIDIA GPU has been a goal since the first release of CUDA,” Harris wrote. “CUDA 6 and the new Tegra K1 system on a chip (SoC) finally enable ‘CUDA Everywhere,’ with CUDA capability top to bottom from the smallest mobile processor to the most powerful Tesla K40 accelerator.”
The CUDA 6 toolkit is available for download now from Nvidia’s CUDA Zone site.