Nvidia Lays Out GPU Roadmap, Graphics Interconnect

At its GPU Technology Conference, Nvidia also unveils a developer kit featuring its Tegra K1 ARM-based SoC.

Nvidia this week added the upcoming Pascal graphics family to its GPU roadmap; the chips will include a new interconnect designed to speed communications between GPUs and CPUs in supercomputers.

The announcement was one of several that the company made during its annual GPU Technology Conference in San Jose, Calif. Along with Pascal, the company introduced a developer kit featuring Nvidia's Tegra K1 system-on-a-chip (SoC). Most of the announcements were made by CEO Jen-Hsun Huang during his keynote address March 25.

Pascal will replace the current Maxwell GPU architecture, and is expected to launch in 2016, according to Nvidia. Company officials reportedly said the Pascal chips will be faster and smaller than their predecessors.

A key technology that will come with the chip will be NVLink, a GPU interconnect developed with IBM that will enable GPUs and CPUs to share data at more than five times—and as high as 12 times—the rate they can now. Currently, the link runs at about 16 GB/s. With NVLink and its fatter pipe, that will jump to between 80 GB/s and 200 GB/s, according to Nvidia officials.
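Nvidia's five-to-12-times claim follows directly from those figures. A quick back-of-the-envelope check (the bandwidth numbers are the ones quoted above; the script itself is only illustrative):

```python
# Sanity-check Nvidia's NVLink speedup claim against the quoted
# bandwidth figures (illustrative arithmetic only).

PCIE_BANDWIDTH_GBS = 16.0    # current GPU-CPU link, ~16 GB/s
NVLINK_LOW_GBS = 80.0        # NVLink lower bound, 80 GB/s
NVLINK_HIGH_GBS = 200.0      # NVLink upper bound, 200 GB/s

low_speedup = NVLINK_LOW_GBS / PCIE_BANDWIDTH_GBS    # 5.0x
high_speedup = NVLINK_HIGH_GBS / PCIE_BANDWIDTH_GBS  # 12.5x

print(f"NVLink speedup over today's link: {low_speedup:.1f}x to {high_speedup:.1f}x")
```

The result lines up with the company's "more than five times—and as high as 12 times" framing.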

Currently, GPUs connect to x86-based chips through PCI Express (PCIe) interfaces, which limits how quickly a GPU can access CPU memory: PCIe bandwidth is four to five times lower than that of CPU memory systems. The NVLink interface will match the bandwidth of CPU memory systems, the company said.
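To see what that gap means in practice, consider moving a 1 GB working set between CPU and GPU. The buffer size here is a hypothetical workload; the link speeds are the ones the article quotes:

```python
# Time to move a 1 GB buffer across each link, using the quoted
# bandwidths (hypothetical workload, illustrative only).

BUFFER_GB = 1.0

for link, gbs in [("PCIe (~16 GB/s)", 16.0),
                  ("NVLink low (80 GB/s)", 80.0),
                  ("NVLink high (200 GB/s)", 200.0)]:
    ms = BUFFER_GB / gbs * 1000.0  # transfer time in milliseconds
    print(f"{link}: {ms:.1f} ms per 1 GB transfer")
```

At PCIe speeds, that single transfer takes 62.5 ms; at NVLink's upper bound it drops to 5 ms, which is the waiting time Nvidia says it wants to minimize.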

"NVLink technology unlocks the GPU's full potential by dramatically improving data movement between the CPU and GPU, minimizing the time that the GPU has to wait for data to be processed," Brian Kelleher, senior vice president of GPU engineering at Nvidia, said in a statement.

The capability could be a boon for supercomputer makers, many of whom already leverage GPUs as accelerators to increase system performance without driving up power consumption. Such GPU accelerators—from both Nvidia and Advanced Micro Devices—as well as Intel's x86-based Xeon Phi coprocessor, were used in 53 of the world's fastest 500 supercomputers, according to the Top500 list released in November 2013. Thirty-eight of those systems used Nvidia GPUs, 13 used Intel's Xeon Phi, and two used AMD's Radeon graphics technology.

Pascal also will feature stacked memory, a technology that takes multiple layers of DRAM components and integrates them vertically—in a stack—on the same package as the GPU.

"This lets GPUs get data from memory more quickly—boosting throughput and efficiency—allowing us to build more compact GPUs that put more power into smaller devices," Sumit Gupta, general manager of Nvidia's Tesla Accelerated Computing business unit, said in a post on the company blog. "The result: several times greater bandwidth, more than twice the memory capacity and quadrupled energy efficiency."

Combined, NVLink and stacked memory will be a benefit for developers, according to Denis Foley, senior director in Nvidia's GPU Architecture group.

"NVLink and stacked memory enable acceleration of a whole new class of applications," Foley said in a post on the company blog. "The large increase in GPU memory size and bandwidth provided by stacked memory will enable GPU applications to access a much larger working set of data at higher bandwidth, improving efficiency and computational throughput, and reducing the frequency of off-GPU transfers. Crafting and optimizing applications that can exploit the massive GPU memory bandwidth as well as the CPU [to] GPU and GPU [to] GPU bandwidth provided by NVLink will allow you to take the next steps towards exascale computing."
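Foley's point about reducing off-GPU transfers can be made concrete with a toy model: the number of staged host-to-device transfers an application needs falls as GPU memory grows. All sizes below are hypothetical, chosen only for illustration:

```python
import math

# Toy model: how many chunked host-to-device transfers a job needs
# when its working set exceeds GPU memory.
# All sizes are hypothetical, for illustration only.

def transfers_needed(working_set_gb, gpu_memory_gb):
    """Each chunk that fits in GPU memory requires one staged transfer."""
    return math.ceil(working_set_gb / gpu_memory_gb)

# A 48 GB working set on a 12 GB card needs 4 staged transfers;
# double the card's memory and the count halves.
print(transfers_needed(48, 12))  # 4
print(transfers_needed(48, 24))  # 2
```

More on-package memory means fewer of these round trips, which is the efficiency gain Foley describes.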

Nvidia officials also unveiled the Jetson TK1 Developer Kit, an embedded platform designed to give developers the hardware tools to create applications for a range of devices, from smartphones to game consoles. It also gives system developers hands-on experience with Nvidia's technology, in particular its Tegra K1 mobile processor, a 192-core ARM-based chip built on the company's Kepler architecture.

Nvidia rolled out the Tegra K1 in January at the 2014 Consumer Electronics Show.

The Jetson TK1 Developer Kit supports the CUDA 6.0 developer suite and comes with 2GB of memory and I/O connectors for USB 3.0, HDMI 1.4, Gigabit Ethernet, audio, SATA, miniPCIe and an SD card slot.

"Jetson TK1 fast tracks embedded computing into a future where machines interact and adapt to their environments in real time," Ian Buck, vice president of accelerated computing at Nvidia, said in a statement. "This platform enables developers to fully harness computer vision in handheld devices, bringing supercomputing capabilities to low-power devices."

The developer kit is available now starting at $192.