Cray officials are unveiling the company’s first exascale-capable supercomputer, a system named Shasta that is designed to give organizations choices in compute and networking technologies and a single system to run such increasingly complex workloads as artificial intelligence, analytics, modeling and simulations.
The supercomputer is aimed at simplifying computing for modern workloads that typically run on heterogeneous cluster systems, which Cray officials argue are becoming too complex. There is a growing demand for single systems to run disparate workloads and workflows at the same time, improving manageability, removing performance bottlenecks and enabling organizations to run these workloads at scale.
With Shasta, users can choose the infrastructure that best fits their needs. They can mix and match chip architectures in the same system, including x86 processors from Intel and Advanced Micro Devices, Arm-based systems-on-a-chip (SoCs), GPUs from Nvidia and field-programmable gate arrays (FPGAs) from Intel and Xilinx, and pair them with interconnects such as Intel’s Omni-Path and Mellanox’s InfiniBand. Another interconnect option will be Cray’s new Slingshot high-speed technology, which officials said will have up to five times the bandwidth per node of traditional interconnects and is designed for data-centric computing.
Slingshot, targeting high-performance computing (HPC) and AI workloads, will be compatible with Ethernet and will include adaptive routing, congestion control and high-level quality-of-service capabilities. The new interconnect enables Cray to build networks of more than 250,000 endpoints with a diameter of only three network hops.
“Slingshot’s very low network diameter allows extremely responsive adaptive routing; each switch has a good view of the overall state of the network, so it can make fast, well-informed decisions about optimal paths to take to avoid temporary congestion,” Steve Scott, CTO and senior vice president at Cray, wrote in a blog post. “This allows us to sustain well north of 90% utilization, even at large scale, for well-behaved workloads.”
Shasta, which will be commercially available in the fourth quarter of 2019, will be on display during the SC18 supercomputing show starting Nov. 11 in Dallas. Cray officials expect it to be 10 to 100 times faster than supercomputers today.
The Shasta systems will come in two models. One is a 19-inch air- or liquid-cooled system that fits in a standard data center rack. The other is a high-density, liquid-cooled rack that can hold 64 compute blades with multiple processors per blade. Both will be able to scale to more than 100 cabinets.
The Shasta systems are part of a larger push worldwide to develop systems that can run workloads at exascale levels. The combination of increasingly large and complex workloads like analytics, AI, simulation and machine learning and the slowing of processor improvements under Moore’s Law has driven demand for new computing architectures that can deliver higher performance than current supercomputers. Exascale-class computers will be capable of performing at least one exaFLOPS, equal to a billion billion calculations per second. The first petascale computer was introduced in 2008, and exascale computing would provide a thousandfold increase over the performance of that 10-year-old class of systems.
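The arithmetic behind those figures can be checked with a short Python sketch. The constant and function names here are illustrative only and are not drawn from Cray or from any benchmark suite.

```python
# Unit sizes in floating-point operations per second (FLOPS).
PETAFLOPS = 10**15  # one petaFLOPS
EXAFLOPS = 10**18   # one exaFLOPS
BILLION = 10**9


def scale_factor(target_flops: int, baseline_flops: int) -> int:
    """Return how many times larger target_flops is than baseline_flops."""
    return target_flops // baseline_flops


# "A billion billion calculations per second" is one exaFLOPS.
print(BILLION * BILLION == EXAFLOPS)      # True

# Exascale is a thousandfold step up from petascale.
print(scale_factor(EXAFLOPS, PETAFLOPS))  # 1000
```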
The push for exascale computing is fueling a global competition among nations—primarily between the United States and China—to become the dominant player in the space, which would give the leader advantages in everything from military and scientific research and development to business innovation. The European Union also has its own exascale initiatives underway.
China this month announced the third of its prototype exascale systems, known as Shuguang, that will be built using homegrown x86 processors. Chinese chip maker Hygon is making x86 chips based closely on AMD’s Epyc server processors and its “Zen” core microarchitecture.
Exascale plans in the United States include the Aurora system to be launched at the Argonne National Lab in 2021 and Frontier, which will come online at the Oak Ridge National Lab in 2022.
Along with announcing Shasta, Cray officials said the National Energy Research Scientific Computing Center (NERSC) at the Lawrence Berkeley National Lab will use a Shasta system as the basis for its “Perlmutter” supercomputer. The Perlmutter contract is valued at $146 million, and the system, which will also include Cray’s ClusterStor storage technology, will go online in 2020.
It’s the latest in a series of announcements around fast U.S.-based systems. In June, the Summit system based on IBM’s Power architecture at the Oak Ridge lab was named the fastest supercomputer in the world in the Top500 list, and this week Lawrence Livermore Lab unveiled Sierra, a system featuring IBM’s Power9 chips and Nvidia’s V100 GPUs that is now the world’s third-fastest supercomputer. The next edition of the Top500 list will be released next week at the SC18 show.