Fujitsu Technology May Reduce Switches in Supercomputers

Company researchers have created a new mesh topology and communications algorithm that could cut the number of switches in clusters by as much as 40 percent.


Fujitsu Laboratories researchers say they have found a way to reduce the number of network switches in cluster supercomputers by as much as 40 percent without impacting performance.

The company has created a new communications algorithm that more efficiently controls the sequence of data transmissions. Combined with a multilayer full-mesh topology, it reduces the number of "data collisions" and network bottlenecks, officials said.

As a result, supercomputer clusters built on the new topology use switch ports more efficiently than current designs and need up to 40 percent fewer switches to reach the same performance level. That would lower capital and operational expenses and improve energy efficiency.

Today, such supercomputer clusters, which comprise multiple servers connected by high-performance networks, use what researchers call a three-layer "fat tree" network topology, according to Fujitsu. The tiers are built according to the number and types of servers being connected, and redundant paths between the switches keep network performance high. For example, researchers said a supercomputer cluster with 6,000 servers would need 800 switches, each with 36 ports, to connect them.
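To get a feel for where an 800-switch figure comes from, the Python sketch below counts switches for an idealized, non-blocking three-tier fat tree built from fixed-radix switches. It is a back-of-the-envelope model under stated assumptions, not Fujitsu's method: the function name `fat_tree_switch_count` is hypothetical, every edge and aggregation switch is assumed to devote half its ports to the layer below and half to the layer above, and core switches use all of their ports downward. For 6,000 servers and 36-port switches it gives 835 switches, in the same range as the figure researchers cited; the exact configuration behind Fujitsu's number isn't described.

```python
import math


def fat_tree_switch_count(servers: int, radix: int) -> dict[str, int]:
    """Rough switch count for an idealized, non-blocking three-tier fat tree.

    Assumption (illustrative, not Fujitsu's model): edge and aggregation
    switches split their ports evenly between downlinks and uplinks, and
    core switches use every port as a downlink.
    """
    if radix < 2 or radix % 2:
        raise ValueError("radix must be an even number of ports >= 2")
    half = radix // 2
    # Each edge switch connects radix/2 servers and has radix/2 uplinks.
    edge = math.ceil(servers / half)
    # Matching every edge uplink with an aggregation downlink keeps the
    # network non-blocking, so the aggregation layer mirrors the edge layer.
    aggregation = edge
    # Core switches absorb all aggregation uplinks, radix ports apiece.
    core = math.ceil(aggregation * half / radix)
    return {
        "edge": edge,
        "aggregation": aggregation,
        "core": core,
        "total": edge + aggregation + core,
    }


if __name__ == "__main__":
    print(fat_tree_switch_count(6000, 36))
```

As a sanity check, plugging in the full capacity of a 36-port fat tree (11,664 servers, or k³/4 for radix k) returns 1,620 switches, the textbook 5k²/4 total for that construction.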

Fat-tree topologies are designed to help reduce network congestion in large-scale compute environments.

The demand for higher performance in supercomputer clusters is growing as the workloads for these systems expand, according to the researchers. Clusters that have been used to design cars, airplanes and mobile phones and to run scientific computing jobs are now also being applied to drug discovery, medicine and weather analysis. Their servers tend to be powered by multicore processors and hold many CPUs and general-purpose GPU accelerators.

GPU accelerators from Advanced Micro Devices and Nvidia, as well as x86-based coprocessors from Intel, are increasingly used in high-performance computing (HPC) environments to boost the compute performance of supercomputers and clusters while keeping power consumption in check.

To keep pace with compute performance, network performance also must increase. That tends to mean more switches, which in turn drives up costs for hardware, power and space.

The mesh network topology and the new communications algorithm let users determine an optimized data-exchange process and then wire the cluster to fit that process, so that a large number of servers can be managed with fewer switches.

Fujitsu researchers described the multilayer full-mesh network topology as one where "switches for indirect connections are arrayed around the periphery of a full-mesh framework that connects all switches directly, and multiple full-mesh structures are connected to each other." Because switch ports are used more efficiently, an entire layer of switches can be eliminated compared with the traditional three-layer fat-tree topology, they said.
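The defining property of a full mesh is that every switch has a direct link to every other switch, so traffic between servers on different switches crosses exactly one inter-switch link and no separate core tier is needed. The hypothetical `full_mesh` helper below models only that single building block, not Fujitsu's multilayer arrangement or its peripheral switches: given a switch count and a port count per switch, it lists the inter-switch links and reports how many ports per switch remain free for servers.

```python
from itertools import combinations


def full_mesh(switches: int, radix: int) -> dict:
    """Model one full-mesh block: every pair of switches shares one direct link.

    Illustrative only; Fujitsu's multilayer layout (peripheral switches and
    links between meshes) is not modeled here.
    """
    mesh_ports = switches - 1  # one port toward each of the other switches
    if mesh_ports > radix:
        raise ValueError("not enough ports to connect every switch pair")
    links = list(combinations(range(switches), 2))
    server_ports = radix - mesh_ports  # ports left over for servers
    return {
        "links": links,
        "server_ports_per_switch": server_ports,
        "servers": switches * server_ports,
    }


if __name__ == "__main__":
    mesh = full_mesh(18, 36)
    print(len(mesh["links"]), mesh["server_ports_per_switch"], mesh["servers"])
```

The trade-off is visible in the arithmetic: each switch added to a single mesh takes one server port away from every switch in it, and a single mesh can never exceed the port count plus one switches.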

The new communications algorithm means that, even though the full-mesh topology has fewer switches and fewer data paths than the fat-tree design, the number of bottleneck-causing data collisions on those paths does not increase. This is important for all-to-all communications, in which every server exchanges data with every other server. The algorithm uses more efficient scheduling and mapping to ensure that data does not collide with other data on the same path, Fujitsu Laboratories officials said.
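Fujitsu's scheduling and mapping algorithm is not detailed here, so as a stand-in, the Python sketch below shows a well-known way to organize an all-to-all exchange into collision-free rounds: a shifted (rotating-offset) schedule. With n servers it runs n - 1 rounds, and in round s server i sends to server (i + s) mod n. Every server then sends exactly one message and receives exactly one message per round, so no two transfers in the same round compete for a sender or receiver. "Collision" is deliberately simplified to that endpoint contention; real link-level contention depends on the topology and on how servers are mapped to switches, which this sketch does not model, and the function names are hypothetical.

```python
Round = list[tuple[int, int]]  # (sender, receiver) pairs active in one round


def shifted_all_to_all(n: int) -> list[Round]:
    """Schedule an all-to-all exchange among n servers in n - 1 rounds.

    In round s (1 <= s < n), server i sends to server (i + s) % n.
    """
    return [[(i, (i + s) % n) for i in range(n)] for s in range(1, n)]


def is_endpoint_collision_free(rounds: list[Round]) -> bool:
    """True if no server sends or receives more than once in any round."""
    for rnd in rounds:
        senders = [src for src, _ in rnd]
        receivers = [dst for _, dst in rnd]
        if len(set(senders)) != len(senders) or len(set(receivers)) != len(receivers):
            return False
    return True


if __name__ == "__main__":
    schedule = shifted_all_to_all(4)
    for s, rnd in enumerate(schedule, start=1):
        print(f"round {s}: {rnd}")
    print("collision-free:", is_endpoint_collision_free(schedule))
```

Across all rounds each ordered pair of servers appears exactly once, so the exchange completes without any round overloading a single endpoint.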

Fujitsu Laboratories officials said they plan to have an implementation of the technology ready by March 2016. Researchers also will continue to investigate topologies for large-scale computing systems that do not require more switches.

Researchers will give details about the new technology at the Summer United Workshops on Parallel, Distributed and Cooperative Processing 2014, which starts July 28 in Japan.