Microsoft Enlists FPGAs to Catapult Cloud, Data Center Performance

The company will employ programmable hardware to push the boundaries of cloud-based workloads, save power and avert the twilight of Moore's Law.

Microsoft is giving Bing a big upgrade in 2015, based on a fundamentally different computing foundation compared with commodity servers.

Project Catapult, a collaboration between Microsoft Research and Bing, seeks to improve data center performance by up to 95 percent using an alternative to server processors called field-programmable gate arrays (FPGAs). The group's work is detailed in a paper, "A Reconfigurable Fabric for Accelerating Large-Scale Datacenter Services," which the group presented June 16 at the International Symposium on Computer Architecture (ISCA) in Minneapolis.

The project's aim is to head off the potential sunset of Moore's Law with a computing architecture that doesn't rely solely on shrinking transistors, a process that is expected to eventually run into the practical limits imposed by physics. (Moore's Law, named after Intel co-founder Gordon E. Moore, is the observation that the number of transistors chip-makers can pack into a processor doubles roughly every two years; it is often paraphrased as computing power doubling about every 18 months.)

Catapult's backers seek to accelerate cloud services while reducing costs as the performance curve of traditional server processors begins to plateau. Early tests have shown encouraging results, according to Derek Chiou, a Bing hardware architect. The test bed included 1,632 standard servers outfitted with Sandy Bridge Intel Xeon processors and FPGAs that resided on the PCIe bus. The high-end Stratix FPGAs, provided by Altera, were in turn connected to one another with 10G-bit SAS cables.

The performance and efficiency gains were staggering, according to a blog post by Rob Knies, a Microsoft Research senior writer.

"The results were impressive: a 95 percent improvement in throughput at a latency comparable to a software-only solution," wrote Knies. "With an increase in power consumption and total per-server cost increase of less than 30 percent, the net results deliver substantial savings and efficiencies," he concluded.

"The factor of two throughput improvement demonstrated in the pilot means we can do the same amount of work with half the number of servers or double the amount of work with the same number of servers—or some mix of the two," said Chiou in a statement. He further revealed that based on the success of the pilot program, the tech will be deployed in one, customer-facing data center in early 2015 to augment Bing.

Not only did Catapult "run stably for long periods," it exhibited a level of fault tolerance that bodes well for large-scale cloud deployments. Knies added that the project contains a service that "quickly reconfigures the fabric after errors or machine failures."

Why not GPUs?

Interestingly, FPGAs won out over graphics processing units (GPUs), another method of boosting server processing power, particularly for parallelized workloads. Two major roadblocks ruled GPUs out. "We decided not to incorporate GPUs because the current power requirements of high-end GPUs are too high for conventional data center servers, but also because it was unclear that some latency-sensitive ranking stages (such as feature extraction) would map well to GPUs," noted the study.

Microsoft's researchers concluded that "distributed reconfigurable fabrics are a viable path forward as increases in server performance level off, and will be crucial at the end of Moore's Law for continued cost and capability improvements."

Pedro Hernandez

Pedro Hernandez is a contributor to eWEEK and the IT Business Edge Network, the network for technology professionals. Previously, he served as a managing editor for the network of...