How IBM Is Changing Commercial High-Performance Computing

eWEEK HIGH-PERFORMANCE COMPUTING ANALYSIS: Big Blue, long a leader in creating some of the industry’s fastest and most powerful supercomputers, is developing new innovations that will make world-class HPC more accessible, effective and affordable than ever before.


That computing technologies are constantly evolving is indisputable, but an associated issue that is seldom discussed is that the terms we use to describe technology solutions are largely calcified. Personal computers (PCs) once denoted cumbersome desktop products that ran almost entirely on Intel silicon, used some version of Microsoft Windows and relied on packaged software applications.

Comparing those form factors to what is available today would be akin to setting a pterodactyl and hummingbird side by side. But consider also that modern personal computing devices span a wide variety of form factors, including smartphones, handheld devices, tablets, notebooks, two-in-ones, mobile workstations and a plethora of desktop systems that are massively more powerful than old-school PCs.

Users also have a wide range of operating systems (Windows, Mac OS, Chrome, Android, iOS and Linux variants) and user interfaces from which to choose. Most importantly, these devices can utilize tens of thousands of applications, millions of apps and tools, and countless cloud services and solutions that provide massive value to organizations and consumers alike.

The same can be said about nearly any IT platform, but it is particularly striking when it comes to supercomputing and high-performance computing (HPC), especially solutions developed by IBM. Why so? IBM, long a leader in creating some of the industry’s fastest and most powerful supercomputers, is developing new innovations that will make world-class HPC more accessible, effective and affordable than ever before. Let’s consider that more closely.

The Growing Benefits and Challenges of World-Class Performance

When supercomputers initially emerged more than half a century ago (the 1960 UNIVAC and 1961 IBM 7030 are considered to be among the first), systems utilized unique and proprietary technologies and were usually developed for specific research projects and purposes.

That model continued for a quarter century until the emergence of massively parallel computing grid and cluster designs. In 1993, Top500.org began publishing biannual updates listing the world’s 500 fastest supercomputers according to their LINPACK (floating point) benchmark performance.

Over time, supercomputers depending on proprietary, custom-built processors were largely supplanted by systems leveraging off-the-shelf CPUs and other components. In 2008, the IBM-designed “Roadrunner” (a hybrid system based on IBM PowerXCell 8i and AMD Opteron CPUs built for the Department of Energy’s Los Alamos Lab) became the first supercomputer to deliver more than a petaflop of compute performance. Hybrid or heterogeneous systems have since become increasingly common, including the IBM-built Summit, Sierra and Lassen systems, which harness IBM POWER9 CPUs and NVIDIA GPUs.

These and other Top500-ranked systems have proved invaluable for providing insights into numerous areas of scientific and commercial research that would otherwise have been hugely difficult or impossible to pursue. However, along with the substantial design and deployment investments these systems require, world-class supercomputers also consume massive amounts of electrical power.

The Supercomputer Fugaku at the RIKEN Center for Computational Science in Japan, which leads the new Top500 list released this week, requires more than 28 megawatts of electricity to run. Supercomputer Fugaku’s developers (Fujitsu and RIKEN) deserve credit for implementing a highly power-efficient design (the system is also ranked No. 9 on the latest Green500 list of leading energy-efficient supercomputers), but it still consumes enough electricity to power nearly 20,000 homes.

At a time when climate change is increasingly likely to impact and strain traditional energy sources, including hydroelectric facilities, the tradeoffs required to support increasingly powerful and power-hungry supercomputers offer much food for thought.

How IBM Is Changing Commercial HPC

What is IBM doing to address these and related issues? Following the new Top500.org list announcement, the company stated that it believes “the future of high-performance computing will require holistic systems that bring to bear the full power of hardware, software, networking, and additional tools like AI and Quantum to solve some of the world’s most pressing challenges. As client needs continue to evolve, AI, energy efficiency, and overall time to results are elements that are just as important, if not more so, than processing speed.”

Holistic system design, AI, energy efficiency and time to results are anything but new focus points for IBM. The Summit and Sierra supercomputers that ranked No. 1 and 2 on the Top500 list from November 2018 to November 2019 (the only time a single vendor has achieved concurrent Top500 leadership) were also listed in the Top 10 of the Green500 lists during that time. Both Summit and Sierra also support sophisticated AI-based data analysis enabled by NVIDIA Volta GV100 GPUs.

Working Smarter, Not Harder

IBM is also working to bring its supercomputing innovations to broader markets and commercial customers. In January, the company introduced the new Power System IC922, a purpose-built inference server designed to put AI models to work and help unlock business insights. In AI nomenclature, “inference” is the step beyond machine learning and neural network training, where an optimized system can apply AI to real-world tasks and problems.

IBM’s Power IC922 can support up to six NVIDIA T4 Tensor Core GPU Accelerators, and the system is also modular and scalable. These features allow IBM clients to flexibly leverage whatever levels of inference acceleration and system performance best suit their needs. Given its compact size, IBM’s Power IC922 can support AI inference wherever information is located, whether in a central on-premises facility or in a distributed data center.

IBM is also actively developing complementary new technologies that will have a major impact on commercial HPC and supercomputing. For example, engineers in the company’s High-Speed Bus Signal Integrity (HSB-SI) organization recently discussed the results of implementing IBM Bayesian optimization software, a machine learning tool developed by IBM Research, to reduce the number of simulations required to reach the optimal configuration for chip-to-chip communication.

The traditional “brute force” processes used to analyze chip-to-chip design channels are both engineering and simulation intensive and can take up to several days to arrive at an optimal combination. With this technology, the HSB-SI engineers were able to dramatically cut the amount of time required, achieve the same results and use fewer resources to get there.

Specifically, a 10-core system with IBM Bayesian optimization software reduced the job from nearly eight days of computing time to a mere 80 minutes. The team also experimented with a job that required results to be delivered in 100 minutes. A Bayesian optimization-enabled system with nine cores was able to complete that task, while utilizing brute force techniques required a system with 1,126 cores.
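To put those figures in perspective, the ratios can be worked out directly. The short sketch below uses only the numbers quoted above and assumes that "nearly eight days" means exactly eight days, so the time ratio is an upper-end approximation rather than a measured result. The variable and function names are illustrative only.

```python
# Rough arithmetic on the figures quoted in the article.
# Assumption: "nearly eight days" is treated as exactly 8 days.
BRUTE_FORCE_MINUTES = 8 * 24 * 60   # brute-force job on the 10-core system
BAYESIAN_MINUTES = 80               # same job with Bayesian optimization

BRUTE_FORCE_CORES = 1126            # cores needed to hit the 100-minute target
BAYESIAN_CORES = 9                  # cores needed with Bayesian optimization


def ratio(baseline: float, improved: float) -> float:
    """Return how many times larger the baseline is than the improved value."""
    return baseline / improved


time_speedup = ratio(BRUTE_FORCE_MINUTES, BAYESIAN_MINUTES)
core_reduction = ratio(BRUTE_FORCE_CORES, BAYESIAN_CORES)

print(f"Approximate time reduction: {time_speedup:.0f}x")
print(f"Approximate core reduction: {core_reduction:.0f}x")
```

Under that assumption, both comparisons point to a reduction of roughly two orders of magnitude, whether it is measured in computing time or in the number of cores needed to meet a fixed deadline.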

In other words, IBM Bayesian Optimization appears to qualify as a classic case of “working smarter, not harder.” Fundamentally, though, the fastest simulation is the one you don’t have to run.

Final Analysis

Like Roadrunner and many other IBM supercomputers, the Summit and Sierra systems had a great run at the peak of Top500.org rankings. But as I discussed in a previous Pund-IT Review, supercomputing leadership is a transitory occupation, at best. Computing innovation is a constant, and never resides in a single organization or country for very long.

As I noted before, what we mean when we talk about supercomputing and supercomputers has also changed substantially over the past six decades. To its credit, Top500.org has incorporated other measurements of the systems it analyzes, including the Green500 and HPCG500 lists. But that offers further evidence that LINPACK benchmarks of floating-point performance are merely one of many measurements that define contemporary supercomputing and HPC leadership.

A final issue worth considering is whether and how well vendors are delivering the HPC innovations they develop to customers other than government research labs and deep-pocketed enterprises. By that measurement, it is difficult to find a better example of vendor success than IBM. The company was the first to develop hybrid supercomputing solutions, which led directly to the DOE’s current IBM POWER9/NVIDIA-based and AI-enabled Summit, Sierra and Lassen systems.

With the introduction of its new hybrid Power IC922 inference servers, IBM is clearly offering a wealth of lessons learned and wisdom earned in supercomputing to broader commercial HPC markets. Similarly, I suspect that we’ll be hearing more about the new IBM Bayesian optimization technology in the months and years ahead. With these and other efforts and solutions, IBM is showing that “Let’s put smart to work” is far more than a mere marketing homily.

Charles King is a principal analyst at PUND-IT and a regular contributor to eWEEK.  © 2019 Pund-IT, Inc. All rights reserved.