Google Shares Details of Its Software-Defined Load Balancer

The Maglev software-defined load balancer, which runs on commodity Linux servers, has been critical to Google Cloud Platform for eight years, company says.

As it has already done with other areas of its massive data center infrastructure, Google this week gave enterprises a peek at Maglev, the software-defined network load balancer the company has been using since 2008 to handle traffic to Google services.

Maglev, like most of Google's networking systems, was built internally. But unlike Jupiter, the custom network fabric connecting Google's data centers, Maglev runs on commodity Linux servers and does not require any specialized rack deployment, Google said in a blog post describing the technology.

According to Google, Maglev uses an approach known as Equal Cost Multipath (ECMP) to distribute network packets evenly to all Maglev machines in a cluster. Each Maglev system then uses hashing and connection tracking techniques to ensure that all packets it receives are forwarded to the right destination. If a Maglev system becomes unavailable, other Maglev units in the cluster are designed to carry the extra load.
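To make that division of labor concrete, here is a minimal Python sketch of the two stages. It is an illustration only, not Google's code: the names `flow_hash`, `ecmp_pick` and `MaglevSketch`, the sample addresses, and the plain modulo hash are all assumptions made for the example, and the article does not say which hashing scheme Maglev actually uses.

```python
import hashlib
from typing import Dict, List, Tuple

# A flow is identified by its 5-tuple:
# (source IP, source port, destination IP, destination port, protocol).
FiveTuple = Tuple[str, int, str, int, str]


def flow_hash(flow: FiveTuple) -> int:
    """Deterministically hash a flow's 5-tuple to an integer."""
    key = "|".join(str(field) for field in flow).encode()
    return int.from_bytes(hashlib.sha256(key).digest()[:8], "big")


def ecmp_pick(flow: FiveTuple, maglevs: List[str]) -> str:
    """Stage 1 (hypothetical router logic): spread flows across Maglev machines."""
    return maglevs[flow_hash(flow) % len(maglevs)]


class MaglevSketch:
    """Stage 2 (hypothetical): pick a backend by hash, then remember the choice."""

    def __init__(self, backends: List[str]) -> None:
        self.backends = list(backends)
        self.connections: Dict[FiveTuple, str] = {}  # connection-tracking table

    def forward(self, flow: FiveTuple) -> str:
        backend = self.connections.get(flow)
        if backend is None:
            # New flow: choose a backend by hashing. Modulo is a simplification.
            backend = self.backends[flow_hash(flow) % len(self.backends)]
            self.connections[flow] = backend
        return backend


if __name__ == "__main__":
    flow = ("198.51.100.7", 40000, "203.0.113.10", 443, "tcp")
    print(ecmp_pick(flow, ["maglev-a", "maglev-b", "maglev-c"]))
    print(MaglevSketch(["be-1", "be-2", "be-3"]).forward(flow))
```

Because the hash is deterministic, any machine in this sketch would choose the same backend for a given flow, so it does not matter which one the router selects. The connection table keeps an established flow pinned to its backend even if the backend list later changes.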

Maglev systems are optimized for packet processing, and a single machine can saturate a 10Gbps link with small packets, Google said. The ECMP design optimizes the load-balancing capacity of Google's networks, Google technical lead Daniel Eisenbud and developer advocate Paul Newson said in the blog post.
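For a sense of what saturating a 10Gbps link means in packets, the arithmetic can be sketched as follows. The article does not give a packet size, so the 64-byte minimum Ethernet frame and the 20 bytes of per-frame wire overhead (preamble, start delimiter and inter-frame gap) used here are assumptions, and `packets_per_second` is a hypothetical helper.

```python
def packets_per_second(link_bps: float, frame_bytes: int, overhead_bytes: int = 20) -> float:
    """Maximum frames per second a link can carry at a given frame size.

    overhead_bytes covers the Ethernet preamble, start-of-frame delimiter
    and inter-frame gap, which occupy the wire but are not part of the frame.
    """
    if frame_bytes <= 0:
        raise ValueError("frame_bytes must be positive")
    bits_on_wire = (frame_bytes + overhead_bytes) * 8
    return link_bps / bits_on_wire


if __name__ == "__main__":
    # Assumed: 10 Gbps link, 64-byte (minimum-size) Ethernet frames.
    rate = packets_per_second(10e9, 64)
    print(f"{rate / 1e6:.2f} million packets per second")
```

Under those assumptions a single machine would have to handle roughly 14.9 million packets per second. Larger packets lower that figure.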

The active-passive configuration that hardware load balancers typically use is wasteful because at least half of the available capacity is always held in reserve for failover.

"All Maglevs in a cluster are active, performing useful work," the two Google engineers said. "This N+1 redundancy is more cost effective than the active-passive configuration of traditional hardware load balancers, because fewer resources are intentionally sitting idle at all times."
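The cost argument comes down to simple arithmetic, sketched below in Python. The sketch assumes load is spread evenly, that an active-passive pair keeps one unit idle for every active one, and that an N+1 cluster reserves headroom equal to a single machine. The function names and cluster sizes are hypothetical and do not come from Google.

```python
def reserved_fraction_active_passive() -> float:
    """One standby unit per active unit: half the capacity is held back."""
    return 1 / 2


def reserved_fraction_n_plus_1(n: int) -> float:
    """N machines carry peak load, plus one machine's worth of spare capacity."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return 1 / (n + 1)


if __name__ == "__main__":
    print(f"active-passive: {reserved_fraction_active_passive():.0%} reserved")
    for n in (1, 4, 9):  # hypothetical cluster sizes
        print(f"N+1 with N={n}: {reserved_fraction_n_plus_1(n):.0%} reserved")
```

With N=1 the two schemes reserve the same 50 percent. As the cluster grows, the share held back under N+1 shrinks (20 percent at N=4, 10 percent at N=9), and that shrinking share is the saving the engineers describe.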

Maglev leverages Borg, a Google cluster management technology that allows engineers to move service workloads between clusters efficiently and as needed. It enables better utilization of unused capacity in the same way that Google's cloud platform customers have the flexibility to move workloads between regions and zones, Eisenbud and Newson said.

"Recently, the industry has been moving toward Network Function Virtualization (NFV), providing network functionality using ordinary servers," the engineers said. Maglev demonstrates how NFV can make it easier to add and remove networking capacity. It also shows how NFV approaches can enable additional networking services without the need for new, custom hardware.

Network clusters based on Maglev already handle traffic at Google scale and have enough headroom to absorb at least another million requests per second without additional resources, the engineers claimed.

Google periodically releases information on its technology infrastructure as part of a bid to show other enterprise IT organizations how to optimize hardware, software and network infrastructure for massive workloads.

Last year, for instance, the company offered a similar glimpse of its Jupiter network fabric, which it has described as being able to deliver more than 1 petabit per second of bandwidth. Like Maglev, Jupiter was built entirely in house. It represents more than 10 years' worth of engineering investment by the company.

Jaikumar Vijayan

Vijayan is an award-winning independent journalist and tech content creation specialist covering data security and privacy, business intelligence, big data and data analytics.