The Department of Energy is handing out $425 million to tech vendors for supercomputing projects that will help continue the push toward exascale computing.
The DoE awarded $325 million to IBM, Nvidia and Mellanox Technologies to build two supercomputers that will be five to seven times as powerful as current systems when they become fully operational sometime in 2017. Another $100 million was given to Advanced Micro Devices, Intel, IBM, Cray and Nvidia as part of a program called FastForward 2, which is aimed at accelerating the development of extreme-scale computing technologies.
Such research into high-performance computing (HPC) technologies is important to everything from basic science and national defense to energy research and environmental studies, according to U.S. Energy Secretary Ernest Moniz, who announced the awards Nov. 14.
“High-performance computing is an essential component of the science and technology portfolio required to maintain U.S. competitiveness and ensure our economic and national security,” Moniz said in a statement.
The federal government continues to fund HPC research as the world moves closer to exascale computing, with systems 20 to 40 times faster than current supercomputers. Industry officials are aiming to reach exascale computing within the next six to eight years, according to Sumit Gupta, general manager of Tesla accelerated computing at Nvidia.
The two new supercomputers will be housed at the DoE’s Oak Ridge and Lawrence Livermore national laboratories. Oak Ridge currently runs the Titan supercomputer, a Cray system powered by AMD chips and Nvidia GPU accelerators that was the fastest supercomputer in the world two years ago. The new Oak Ridge system—named Summit—will offer five to 10 times the performance of Titan’s 27 petaflops, which would easily top the current list, led by China’s Tianhe-2 at 55 petaflops. Lawrence Livermore currently runs IBM’s Sequoia supercomputer, and its new system—Sierra—will be at least seven times faster, according to the DoE.
IBM officials said both new systems—which will come online sometime in 2017 or 2018—will have peak performances of more than 100 petaflops. They will take advantage of IBM’s work on what officials call “data centric” computing, where processing power is brought to where data resides, rather than having to constantly send it between the chip and storage, enabling faster speeds for analytics. Such speed is increasingly important in the era of big data.
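As a back-of-the-envelope check, using only figures quoted in this article (these are the vendors' projections, not official DoE measurements), the stated multiples of Titan's performance are consistent with IBM's 100-petaflop claim:

```python
# Back-of-envelope check of the performance figures cited in this article.
# All numbers are the article's projections, not official DoE measurements.

TITAN_PFLOPS = 27           # Titan's current performance, in petaflops
TIANHE2_PFLOPS = 55         # Tianhe-2, top of the current list
SUMMIT_SPEEDUP = (5, 10)    # Summit is projected at 5 to 10 times Titan

summit_range = [TITAN_PFLOPS * s for s in SUMMIT_SPEEDUP]
print(f"Projected Summit performance: {summit_range[0]}-{summit_range[1]} petaflops")
# Even the low end of the range clears both IBM's 100-petaflop figure
# and Tianhe-2's 55 petaflops.
```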
The systems also will leverage technologies from the OpenPower Foundation, which has opened up IBM’s Power processors to incorporate innovation from third parties. The supercomputers will run on IBM’s Power9 chips and will take advantage of new GPU acceleration technology from Nvidia. HPC systems are increasingly adopting accelerators—from GPUs from Nvidia and AMD to x86 Xeon Phi co-processors from Intel—to help increase performance while keeping a rein on power consumption.
The Summit and Sierra supercomputers will leverage Nvidia’s Volta GPU technology, which is two generations beyond the current Kepler GPU architecture: Kepler will be followed by Pascal in 2016, which in turn will be followed by Volta, according to Nvidia’s Gupta.
A key technology that will first appear with Pascal and carry over to Volta is NVLink, a high-speed interconnect developed with IBM and first introduced in March that will enable CPUs and GPUs to exchange data five to 12 times faster than they can today. The current link runs at about 16GB per second; with NVLink and its fatter pipe, that will jump to 80GB to 200GB per second, according to Gupta. The Volta GPUs will also include 3D stacked memory, which will increase throughput via four times the bandwidth and three times the capacity of current graphics chips, he said.
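The bandwidth numbers line up with the quoted speedup range; a quick sketch using only the figures cited in this article:

```python
# Quick consistency check of the interconnect figures Gupta cites.
# Both values are the article's numbers, in the same units for each link.

CURRENT_LINK_BW = 16          # today's CPU-GPU link bandwidth
NVLINK_BW = (80, 200)         # NVLink's projected bandwidth range

speedup = [bw / CURRENT_LINK_BW for bw in NVLINK_BW]
print(f"NVLink speedup: {speedup[0]:.0f}x to {speedup[1]:.1f}x over today's link")
# 5x to 12.5x, matching the "five to 12 times faster" range quoted above.
```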
The technologies will enable the companies to create supercomputers that are smaller than their predecessors, yet offer significantly more performance. Summit will be about a fifth of the size of Titan, and consume about 10 percent more power, Gupta said.
The $100 million for FastForward 2 will be used to help create supercomputers that increase performance while remaining energy- and cost-efficient. According to AMD CTO Mark Papermaster, the chip maker’s focus will be on memory interfaces and on its work around accelerated processing units (APUs) and Heterogeneous System Architecture (HSA), which makes it easier to move workloads between CPUs and GPUs. Innovations from the research will eventually make their way into future AMD products, Papermaster said in a post on the company blog.
The research by AMD, IBM, Cray, Nvidia and Intel will be important to the exascale effort.
“This research aims to deliver those huge increases in performance—without significant increases in energy consumption—to enable advances in diverse fields ranging from medical science to astrophysics and climate modeling,” he wrote. “These could arrive as prototypes over the next several years, with full production units early in the next decade.”