How NVIDIA Is Strengthening Its Position in Supercomputing

eWEEK NETWORKING ANALYSIS: The GPU market leader makes a number of announcements at International Supercomputing Conference 2020.

This week the digital version of the International Supercomputing Conference (ISC 2020) kicked off. As part of the event, GPU (graphics processing unit) market leader NVIDIA made a number of announcements that strengthen its position as a key player in the supercomputing industry. The scientific computing aspects of supercomputing have changed over the years, and they now require GPU acceleration. This new era of supercomputing has expanded well past the traditional use cases of modeling and simulation workloads and now includes artificial intelligence, analytics, edge streaming, visualization and more.

GPU Acceleration Is Critical to Advancements in Science

These applications need GPU acceleration to give scientists and researchers the speed required to accelerate the pace of work so we can solve some of the world’s biggest problems. This has never been more evident than in the battle to find a cure for COVID-19. Scientific computing platforms are playing a vital role in speeding up the research being done in this area. One example is Oxford Nanopore, which was able to sequence the virus genome in just seven hours using accelerated computing. This would have taken weeks or even months with CPU-only systems. AI is also playing an important role in the fight against COVID-19 by powering robots that deliver supplies or quickly measure people’s body temperatures.

A100 GPU Available in PCIe Form Factor

NVIDIA made several important announcements to continue the progress it’s made in supercomputing. The first is an entirely new GPU. While NVIDIA has been pushing forward in becoming more of a systems company, the GPU is at the core of everything it does, and NVIDIA has the broadest portfolio in the industry.

Last month at its digital GPU Technology Conference (GTC), the company announced its A100 GPU with the Ampere architecture. This was a huge leap in accelerated computing, as NVIDIA says this GPU provides up to 20x the performance of its Volta predecessor. At GTC, the GPU was available in the SXM4 form factor, which uses NVLink to connect multiple GPUs. At ISC 2020, the company is announcing a PCIe version of the A100. It has the same performance characteristics but is now available in a 250W PCIe form factor.

The PCIe format opens the door to a wide range of third parties to use the chip. Because of this, NVIDIA is also announcing that its A100 is now available from many of its server partners including Cisco, Dell, Fujitsu, HPE, Lenovo and others. The PCIe configuration makes it ideal for a wide variety of server designs that go into standard racks. The new form factor is best suited to workloads that a single GPU can handle, whereas the SXM design is intended for multi-GPU systems.

Selene Breaks Supercomputing Barriers

To highlight the possibilities of combining A100 with a fast network, NVIDIA announced Selene, its own industrial supercomputer, which it is claiming as one of the fastest and most energy-efficient systems in the U.S. The most recent TOP500 supercomputer list has not been announced yet, so it will be interesting to see where it ranks. But it does break the 20 GF (gigaflops)-per-watt energy-efficiency barrier. Selene is built on 2,240 A100 GPUs connected by 494 Mellanox Quantum 200G InfiniBand switches, giving it a 56 TB/s switch fabric. The supercomputer also contains 7 PB of all-flash storage.
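For readers who want to put those figures in context, here is a rough back-of-the-envelope sketch in Python. It is not NVIDIA's own calculation. The helper names are hypothetical, and the idea that each GPU gets one 200 Gb/s link is an assumption not stated in the announcement, although it does happen to reproduce the quoted fabric number.

```python
# Back-of-the-envelope checks on the Selene figures quoted in the article.
GPUS = 2_240                 # A100 GPUs in Selene (from the article)
LINK_GBPS = 200              # "200G" link speed, in gigabits per second
STORAGE_PB = 7               # all-flash storage (from the article)
GFLOPS_PER_WATT = 20         # efficiency barrier cited in the article


def fabric_tb_per_s(links: int, gbps_per_link: float) -> float:
    """Aggregate bandwidth in terabytes/s for `links` links of the given speed."""
    return links * gbps_per_link / 8 / 1_000   # Gb/s -> GB/s -> TB/s


def watts_for(petaflops: float, gflops_per_watt: float) -> float:
    """Power in watts needed to sustain `petaflops` at a given efficiency."""
    return petaflops * 1e6 / gflops_per_watt   # 1 petaflop = 1e6 gigaflops


# ASSUMPTION: one 200 Gb/s link per GPU (not stated in the article).
print(fabric_tb_per_s(GPUS, LINK_GBPS))        # 56.0 TB/s

# At 20 GF per watt, each petaflop of performance costs about 50 kW.
print(watts_for(1, GFLOPS_PER_WATT))           # 50000.0 W

# Flash storage per GPU, in terabytes (1 PB = 1,000 TB).
print(STORAGE_PB * 1_000 / GPUS)               # 3.125 TB
```

The second figure is the useful one for comparison: a system running below the 20 GF-per-watt mark needs more than 50 kW for every petaflop it delivers.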

It will be interesting to see what NVIDIA does with Selene. Typically, when it builds a system it becomes a reference design for its customers and partners, but this gives NVIDIA massive amounts of its own compute power that I expect it to use to fuel its own R&D. CEO Jensen Huang often talks about the importance of optimizing applications for data center-scale computing, and now the company has the platform to lead the way.

Apache Spark 3.0 Now Generally Available

Another ISC announcement involves GPU acceleration for Apache Spark, a widely deployed open-source analytics platform used by over half a million data scientists. Apache Spark 3.0 is now generally available. The new version enables a single analytics pipeline, from ingestion to data preparation to model training. In the past, these steps were done separately, creating several data pipelines. With the 3.0 version there is just one, which simplifies workflows and allows infrastructure to be consolidated.

Mellanox and NVIDIA Technology Speeds Up Cyber Analytics

Lastly, the company announced its UFM Cyber-AI intelligence and analytics platform, which is the first product that combines Mellanox and NVIDIA capabilities. The platform uses network telemetry from all parts of the network, including adapters, switches, cables and other components. This data is used to build a model that baselines data center operations. The model can then be used for a wide range of use cases, including finding threats and predicting when components will fail. For example, some supercomputing platforms have been hacked for bitcoin mining. The NVIDIA UFM Cyber-AI platform would spot that change in behavior immediately and could shut it down.
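The general idea of baselining telemetry and flagging departures from it can be illustrated with a minimal Python sketch. This is not how UFM Cyber-AI works internally, which the announcement does not describe. It swaps in a plain z-score test, and the function names, the threshold and the sample utilization readings are all hypothetical.

```python
from statistics import mean, stdev


def build_baseline(samples):
    """Summarize normal behavior as (mean, standard deviation)."""
    return mean(samples), stdev(samples)


def is_anomalous(value, baseline, threshold=3.0):
    """Flag a reading more than `threshold` standard deviations from the mean."""
    mu, sigma = baseline
    if sigma == 0:
        return value != mu
    return abs(value - mu) / sigma > threshold


# HYPOTHETICAL telemetry: port utilization (%) sampled during normal operation.
normal_utilization = [41.0, 39.5, 40.2, 42.1, 38.9, 40.7, 41.3, 39.8]
baseline = build_baseline(normal_utilization)

print(is_anomalous(40.5, baseline))   # False: within the usual range
print(is_anomalous(95.0, baseline))   # True: e.g. a sudden, sustained spike
```

A production system would model many signals at once and account for time of day and workload mix, but the principle is the same: learn what normal looks like, then alert on behavior that falls well outside it.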

As the world continues to change and we focus more on societal and health issues, supercomputing platforms will be leaned on to help solve these problems. The announcements made by NVIDIA are important in helping the company maintain its leadership position. But more importantly, they help scientists and researchers work faster and more efficiently, closing the gap between hope and reality.

Zeus Kerravala is an eWEEK regular contributor and the founder and principal analyst with ZK Research. He spent 10 years at Yankee Group and prior to that held a number of corporate IT positions.