Nvidia Wants to Be the Inference Engine for AI Applications

The chip maker rolled out two new Pascal GPUs designed to compete with Intel processors in the area of deep learning known as "inference."


Nvidia officials are looking to grab a larger share of the ever-expanding artificial intelligence space.

At the company's GPU Technology Conference in Beijing this week, Nvidia CEO Jen-Hsun Huang unveiled two new GPUs that are specifically designed for the part of the deep learning process called "inference," an area where the work is usually processed on CPUs from Intel. The new GPUs—the Tesla P4 and P40—are part of a widening push by Nvidia in the artificial intelligence (AI) arena and the latest point of competition between the GPU maker and Intel in the emerging market.

AI refers to the creation of systems that can collect input (such as voice commands or images from the environments around them), process the data instantly and then react accordingly. It can be seen in such technologies as Siri on Apple iPhones and movie recommendation programs on Netflix, and will play an increasingly important role in other spaces, such as self-driving cars.

"AI is everywhere," Marc Hamilton, vice president of solutions architecture and engineering at Nvidia, told eWEEK. "We believe it's the most important computing technology in the industry right now."

Nvidia officials over the past couple of years have made AI and deep learning crucial parts of the vendor's strategy going forward, and Intel also is aggressively pursuing the emerging market. Central to the competition is the debate over whether GPUs or x86 CPUs are better suited for the workloads that must run for AI to fulfill its promise. Deep learning itself is a two-part process.

There are essentially two parts to machine learning: training, in which neural networks are taught such things as object identification, and inference, in which they use that training to recognize and process unknown inputs—for example, Siri understanding a user's question and then responding correctly. The neural networks used for training are large, and most training is done on Nvidia GPUs. The inference networks are smaller, and most of that work is done on CPUs from Intel.
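The two phases can be illustrated with a toy sketch in plain Python. The tiny logistic-regression "network," the made-up data and the learning rate here are all invented for illustration; real deep learning frameworks work the same way at a much larger scale:

```python
import math
import random

# --- Training phase: fit a tiny logistic-regression "network" ---
# Toy data: points above the line y = x are class 1, below are class 0.
random.seed(0)
data = [(random.random(), random.random()) for _ in range(200)]
labels = [1 if y > x else 0 for x, y in data]

w = [0.0, 0.0]   # weights, adjusted repeatedly during training
b = 0.0          # bias
lr = 0.5         # learning rate
for _ in range(500):
    for (x, y), t in zip(data, labels):
        z = w[0] * x + w[1] * y + b
        p = 1.0 / (1.0 + math.exp(-z))    # sigmoid activation
        # gradient of the log-loss, pushed back into the weights
        w[0] -= lr * (p - t) * x
        w[1] -= lr * (p - t) * y
        b -= lr * (p - t)

# --- Inference phase: apply the frozen weights to unseen input ---
def infer(x, y):
    z = w[0] * x + w[1] * y + b
    return 1 if z > 0 else 0

print(infer(0.2, 0.9))   # well above y = x, so class 1 expected
print(infer(0.9, 0.1))   # well below y = x, so class 0 expected
```

Note the asymmetry the article describes: the training loop does far more arithmetic (many passes over the whole dataset) than a single `infer` call, which is one small dot product—which is why the two phases have historically landed on different hardware.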

However, Nvidia officials are arguing now that the Tesla P4 and P40 deliver greater performance and efficiency than Intel chips for inference applications. Both are based on the vendor's 16-nanometer Pascal architecture and offer specialized inference instructions based on 8-bit (INT8) operations. According to Nvidia officials, the new GPUs offer four times faster response than the previous GPU offerings launched less than a year ago and 45 times better response than CPUs.

The P4, due out in November, is smaller than the P40 and is designed for data centers where energy efficiency is key. The GPU, which starts at 50 watts of power, fits into standard servers and low-power systems and, according to Nvidia, is 40 times more efficient than CPUs when running production inferencing workloads. A single server powered by a P4 GPU can replace 13 CPU-only systems for video inferencing workloads and deliver more than eight times the savings in total cost of ownership, which includes server and power costs.

The larger and more powerful P40, which is scheduled for release in October, provides 47 tera-operations per second (TOPS) of inference performance, and a server with eight P40 GPU accelerators can replace more than 140 CPU-based servers, saving organizations more than $650,000 in server acquisition costs.

The two new GPUs follow the massive P100, which Nvidia officials introduced in April; it packs 150 billion transistors and was built for data center and cloud environments.

Intel is pushing ahead with its own AI initiatives. The chip maker in recent months bought Nervana Systems and its machine learning technologies, and officials have boasted of the company's commitment to such open machine learning frameworks as Caffe and Theano. In addition, at the Intel Developer Forum last month, Intel officials talked about the planned release of a multi-core Xeon Phi chip dubbed "Knights Mill" with enhanced variable precision and flexible high-capacity memory that is aimed at AI workloads. Knights Mill is due out next year.

Nvidia also introduced new software tools for inferencing workloads. TensorRT is a software library designed to make it faster and more efficient to take trained neural networks and optimize them for reduced-precision INT8 operation. In addition, the DeepStream software developer kit (SDK) is aimed at video streams. The software enables servers powered by Pascal GPUs to simultaneously decode and analyze up to 93 high-definition video streams in real time, far more than the seven streams that can be handled with dual CPUs, officials said. This capability will be important for such applications as self-driving cars, interactive robots and ad placement, they said.
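The reduced-precision step that TensorRT performs can be sketched with a minimal symmetric-quantization example in Python. The weights are made up, and the scale-selection rule here (map the largest-magnitude weight to the int8 range) is a simplification of TensorRT's actual calibration, but it shows why INT8 helps: each trained 32-bit float becomes a 1-byte integer, and arithmetic on those integers is what the P4 and P40's INT8 instructions accelerate:

```python
# Symmetric linear quantization of trained float weights down to INT8.
weights = [0.82, -1.94, 0.003, 1.10, -0.57]   # made-up trained weights

# One scale factor maps the largest-magnitude weight onto [-127, 127].
scale = max(abs(w) for w in weights) / 127.0

def quantize(w):
    q = round(w / scale)
    return max(-127, min(127, q))              # clamp to the int8 range

def dequantize(q):
    return q * scale                           # approximate original float

q_weights = [quantize(w) for w in weights]
print(q_weights)                               # small integers, 1 byte each

# Round-trip error is bounded by half a quantization step,
# which is why accuracy loss from INT8 inference is usually small.
for w, q in zip(weights, q_weights):
    assert abs(w - dequantize(q)) <= scale / 2 + 1e-12
```

The trade-off is the one the article implies: precision is reduced slightly, but memory traffic drops 4x versus 32-bit floats and the GPU can issue many more INT8 operations per cycle.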