On June 2, 1897, author Mark Twain sent a cable from London to the U.S. press after seeing that his obituary had been published in the New York Journal. It consisted of the now-famous line: “The reports of my death are greatly exaggerated.”
There is a networking parallel. If the network protocol InfiniBand were a person, it would be Mark Twain, because people have been predicting its death for years. I recall that about 15 years ago, startup Force10 (acquired by Dell in 2011) proclaimed that its new low-latency Ethernet switch would be the death knell of InfiniBand. Hardly.
Years later, this niche market is still alive and kicking. According to IDC, in 2019, the market for InfiniBand was a little over $200 million. While this is just a fraction of the overall Ethernet switch market, it’s certainly not dead. Given many of the trends in the data center today, I’m predicting InfiniBand will grow at a high-teens percentage rate through 2025.
To get a better understanding of what has kept InfiniBand relevant and why it will continue to be, I recently interviewed Gilad Shainer, senior vice president of InfiniBand Networking for NVIDIA, which recently acquired InfiniBand provider Mellanox.
Zeus: It seems like as long as there has been networking, there's been some speculation that Ethernet would kill off InfiniBand. Are you seeing a slowdown in interest in InfiniBand?
Gilad: The heart of a data center is the network that connects all the compute and storage elements together. In order to have these elements work together and form what we call a supercomputer—for research, cloud workloads or deep learning—the network must be highly efficient and extremely fast. InfiniBand is an industry standard technology that was (and continues to be) developed with the vision of forming a highly scalable, pure software-defined network (SDN).
Back in 2003, it connected one of the top three supercomputers in the world. According to the June 2020 TOP500 supercomputing list, InfiniBand now connects seven of the top 10 supercomputers in the world. InfiniBand has become the de facto standard for high-performance computing systems, is strongly adopted for deep-learning infrastructures, and is increasingly being used for hyperscale cloud data centers, such as Microsoft Azure. The performance, scalability, and efficiency advantages of InfiniBand continue to drive its strong and growing adoption, as it is the best technology for compute- and data-intensive applications.
We are not witnessing any slowdown for InfiniBand; in fact, we see just the opposite. We see customers who’ve previously used Ethernet or other network technologies now moving to InfiniBand to interconnect their new data centers and utilize InfiniBand’s strengths to enable faster analysis of data, which leads to accelerated time to market and more efficient usage of their IT spending.
Zeus: What are the advantages of InfiniBand?
Gilad: InfiniBand provides a number of key advantages. It is a full-transport offload network, which means that all the network operations are managed by the network and not by the CPU. It is the most efficient network protocol, which means it can transport more data with less overhead. InfiniBand also has much lower latency than Ethernet and, most importantly, it incorporates processing engines inside the network that accelerate data processing for deep learning and high-performance computing. These are key technology advantages for any compute- and data-intensive application. This is why InfiniBand has become the accepted standard for high-performance computing and for scientific and product simulations.
Zeus: Can you give me some examples of industries that have adopted InfiniBand?
Gilad: Most, if not all, automotive and airplane manufacturers use InfiniBand as part of their designs. Many bioscience companies use InfiniBand for their research networks, and we have been seeing a significant rise in their activities as part of the global fight against COVID-19. Oil and gas companies such as ENI, Total and BP utilize InfiniBand for exploration analysis and seismic modeling.
InfiniBand also accelerates many of the world’s leading government and research centers, such as Oak Ridge National Laboratory, Lawrence Livermore National Laboratory, Jülich Supercomputing Centre, NASA, National Institutes of Health and many others.
InfiniBand is also becoming the preferred connectivity technology for deep-learning systems. It is an integral part of NVIDIA’s DGX A100 and SuperPOD platforms, which are being used by Continental and AIST (Japan’s artificial intelligence cloud infrastructure). Microsoft Azure has adopted InfiniBand to accelerate its high-performance Azure cloud instances, and we have seen increased adoption in other cloud hyperscale platforms. Moreover, embedded platform companies have adopted InfiniBand due to its efficiency and cost-performance advantages. An example is Cadence, where InfiniBand is used for the company's Palladium silicon-emulation systems.
Zeus: Ethernet and InfiniBand seem to have similar speeds. Given the popularity of Ethernet, why hasn't it taken over?
Gilad: InfiniBand and Ethernet share similar physical network technologies, which really means similar serializer/deserializer (SerDes) elements that convert data between serial interfaces and parallel interfaces in both directions. Today, both InfiniBand and Ethernet use the same 50Gb/s SerDes technology, and therefore have the same network speed. InfiniBand typically packs four SerDes into a network adapter port or a switch port, yielding HDR 200Gb/s speed (the InfiniBand specification allows packing up to 12 SerDes together). In Ethernet, we see the same four 50Gb/s SerDes adapter port configuration (200G ports), and the Ethernet specification allows packing eight of these SerDes into an aggregation-level switch port (that is, for switch-to-switch communication only), yielding 400Gb/s. But it is the same data throughput per SerDes or per communication lane. This is not new; InfiniBand and Ethernet have shared the same data throughput for many years.
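The lane arithmetic in that answer can be written out as a short Python sketch. It assumes only the figures quoted in the interview (50Gb/s per SerDes lane; 4, 8 and 12 lanes per port); the function and constant names are hypothetical and chosen for illustration.

```python
# Illustrative arithmetic only: aggregate port speed = lanes x per-lane rate.
# The 50 Gb/s lane rate and the lane counts come from the interview text;
# the names below are hypothetical.

SERDES_GBPS = 50  # per-lane SerDes rate shared by InfiniBand and Ethernet


def port_speed_gbps(lanes: int, lane_gbps: int = SERDES_GBPS) -> int:
    """Return the aggregate speed of a port built from `lanes` SerDes lanes."""
    if lanes <= 0:
        raise ValueError("a port needs at least one lane")
    return lanes * lane_gbps


if __name__ == "__main__":
    configs = {
        "InfiniBand HDR port (4 lanes)": 4,
        "Ethernet adapter port (4 lanes)": 4,
        "Ethernet aggregation switch port (8 lanes)": 8,
        "InfiniBand maximum per specification (12 lanes)": 12,
    }
    for name, lanes in configs.items():
        print(f"{name}: {port_speed_gbps(lanes)} Gb/s")
```

The point the sketch makes is the one Shainer makes: the per-lane throughput is identical, and the headline port speed differs only in how many lanes are bundled together.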
Thanks to its pure software-defined network advantages, InfiniBand has repeatedly been the first to market with end-to-end deployments of the new network speed, which is really the important part. InfiniBand was first at 100G, first at 200G and will probably be first at 400G. Furthermore, you can connect multiple switch ports together in InfiniBand to achieve much higher throughput between the switches, and this can’t be done with Ethernet. It is also important to mention that the higher network efficiency, 3X better latency and efficient network-based data processing—the advantages of InfiniBand over any other network—are no less significant for data center connectivity. Both are useful, but the bottom line is that Ethernet has not taken over.
Zeus: InfiniBand also has a number of other features that make it superior in demanding environments. Can you go over some of those?
Gilad: InfiniBand technology is based on four main fundamentals:
The first fundamental is the design of a very smart endpoint, one that can execute and manage all of the network functions (unlike Ethernet) and therefore increase the CPU or GPU time that can be dedicated to the real applications. Since the endpoint is located near CPU/GPU memory, it can also manage memory operations in a very effective and efficient way; for example, via RDMA or GPUDirect RDMA.
The second fundamental is a switch network that is designed for scale. It is a pure software-defined network (SDN). InfiniBand switches do not require an embedded server within every switch appliance for managing the switch and running its operating system (as needed in the case of Ethernet switches). This makes InfiniBand a leading cost-performance network fabric compared to Ethernet or any other proprietary network. It also enables unique technology innovations such as In-Network Computing, for performing data calculations on the data as it is being transferred in the network. An important example is the Scalable Hierarchical Aggregation and Reduction Protocol (SHARP)™ technology, which has demonstrated great performance improvements for scientific and deep learning application frameworks.
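To make the idea of In-Network Computing more concrete, here is a minimal conceptual sketch in Python of a hierarchical reduction, in which each switch sums the data arriving from below before passing a single partial result upward. This is not the SHARP API or its wire protocol; the Host and Switch classes, the tree layout and the sample values are all assumptions made purely for illustration.

```python
# Conceptual sketch of hierarchical aggregation in a switch tree.
# NOT the SHARP API: Host, Switch and the topology are hypothetical.
from typing import List, Sequence, Union

Vector = List[float]


class Host:
    """An endpoint that contributes one vector to the reduction."""

    def __init__(self, data: Sequence[float]):
        self.data = list(data)

    def reduce(self) -> Vector:
        return list(self.data)


class Switch:
    """A switch that sums its children's vectors element by element."""

    def __init__(self, children: Sequence[Union["Switch", Host]]):
        self.children = list(children)

    def reduce(self) -> Vector:
        # Each child hands up one partial result; the switch combines them,
        # so only a single vector travels further up the tree.
        partials = [child.reduce() for child in self.children]
        return [sum(values) for values in zip(*partials)]


if __name__ == "__main__":
    leaf_a = Switch([Host([1, 2]), Host([3, 4])])
    leaf_b = Switch([Host([5, 6]), Host([7, 8])])
    root = Switch([leaf_a, leaf_b])
    print(root.reduce())  # element-wise sum across all four hosts
```

In this toy model the hosts never have to exchange data with one another; the summation happens as the data moves through the switches, which is the general pattern the interview describes.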
Centralized management is the third fundamental. One can manage, control and operate the InfiniBand network from a single place. One can design and build any sort of network topology and customize and optimize the data center network for its target applications. There is no need to create multiple and different switch boxes for the different parts of the network, and there is no need to deal with so many complex network algorithms. InfiniBand was created to improve performance on one side and reduce OPEX on the other side.
Last but not least, InfiniBand is a standard technology ensuring backward and forward compatibility and is open source with open APIs.
Zeus: Anything else our readers need to know?
Gilad: Our future depends on how quickly we can analyze the data we collect and how fast we can solve complex problems. These can be things like finding cures for diseases, simulating and predicting storms, designing safer cars, finding better energy sources and improving our homeland security. Artificial Intelligence and high-performance simulations require the fastest and lowest-latency network and the ability to pre-process data before it goes to the GPU or CPU. This mandates a network that is extremely reliable and resilient. These are the characteristics of InfiniBand. InfiniBand is a lossless fabric, which means it does not drop packets like other networks do. InfiniBand is the network that enables the next generation of data-center infrastructure and applications.
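One way a link can avoid dropping packets is credit-based flow control, the general mechanism InfiniBand links rely on: a sender transmits only when the receiver has advertised free buffer space. The Python sketch below is a simplified simulation of that idea under stated assumptions; the Sender and Receiver classes, the one-packet-per-step drain rate and the buffer size are hypothetical and do not model real InfiniBand hardware or any vendor API.

```python
# Simplified simulation of credit-based, lossless link-level flow control.
# All names and parameters are hypothetical; this is not a hardware model.
from collections import deque


class Receiver:
    def __init__(self, buffer_slots: int):
        self.capacity = buffer_slots
        self.buffer = deque()

    def accept(self, packet: int) -> None:
        # If the sender honors its credits, this overflow can never happen.
        if len(self.buffer) >= self.capacity:
            raise RuntimeError("buffer overflow: packet would be dropped")
        self.buffer.append(packet)

    def drain(self, n: int = 1) -> int:
        """Consume up to n packets and return how many credits were freed."""
        freed = 0
        while self.buffer and freed < n:
            self.buffer.popleft()
            freed += 1
        return freed


class Sender:
    def __init__(self, credits: int):
        self.credits = credits  # one credit = one free receiver buffer slot
        self.pending = deque()

    def try_send(self, receiver: Receiver) -> bool:
        if not self.pending or self.credits == 0:
            return False  # wait instead of sending into a full buffer
        receiver.accept(self.pending.popleft())
        self.credits -= 1
        return True


def simulate(num_packets: int, buffer_slots: int) -> int:
    """Push num_packets through the link; return how many were delivered."""
    rx = Receiver(buffer_slots)
    tx = Sender(credits=buffer_slots)
    tx.pending.extend(range(num_packets))
    delivered = 0
    while tx.pending or rx.buffer:
        while tx.try_send(rx):
            pass
        freed = rx.drain(1)   # receiver processes one packet per step
        tx.credits += freed   # and returns the credit to the sender
        delivered += freed
    return delivered


if __name__ == "__main__":
    print(simulate(num_packets=10, buffer_slots=3))
```

Because the sender pauses when it runs out of credits, every packet is eventually delivered and the overflow branch is never reached, even though the receiver's buffer is much smaller than the total traffic.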
It’s important to note that InfiniBand and Ethernet can be used together. InfiniBand-connected data centers can be easily connected to external Ethernet networks via InfiniBand-to-Ethernet low-latency gateways. InfiniBand also offers long-reach connectivity from a few tens of miles to thousands of miles, making it possible to connect remote data centers together.
Along with all the above advantages, InfiniBand also brings cost-performance advantages, is simpler to deploy, and is simpler to scale. InfiniBand today delivers technology that is one generation ahead and grants a competitive advantage to the people using it.
Zeus Kerravala is an eWEEK regular contributor and the founder and principal analyst with ZK Research. He spent 10 years at Yankee Group and prior to that held a number of corporate IT positions.