LinkedIn, well known in the development community for its innovation, is probably most famous for developing Kafka, an open-source stream processing platform that provides unified, high-throughput, low-latency handling of real-time data feeds.
Now the social network for professionals has done it again. On Aug. 28 it released to the open source community something called Cruise Control, a general-purpose system that continually monitors server clusters and automatically adjusts the resources allocated to them to meet pre-defined performance goals.
Cruise Control and Kafka work hand in hand. In essence, Kafka users specify performance goals, and Cruise Control monitors server clusters for violations of those goals, analyzes the existing workload on the cluster and automatically executes administrative operations to bring the cluster back in line. Like most new-gen IT, it’s all about speed and automation.
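To make that monitor-analyze-act cycle concrete, here is a minimal sketch in Python. It is not Cruise Control's actual code or API: the `Broker` class, the `violates_goal` and `plan_moves` functions, and the single "maximum load spread" goal are all hypothetical, chosen only to illustrate the idea of detecting a goal violation and planning partition moves to fix it.

```python
from dataclasses import dataclass, field


@dataclass
class Broker:
    """Hypothetical model of one Kafka server and the partitions it hosts."""
    broker_id: int
    # partition name -> load it places on the broker (for example, MB/s)
    partitions: dict = field(default_factory=dict)

    @property
    def load(self) -> float:
        return sum(self.partitions.values())


def violates_goal(brokers, max_spread):
    """Monitor step: the (assumed) goal is that the busiest and idlest
    brokers differ by no more than max_spread."""
    loads = [b.load for b in brokers]
    return max(loads) - min(loads) > max_spread


def plan_moves(brokers, max_spread, max_moves=100):
    """Analyze and act steps: greedily move one partition at a time from the
    busiest broker to the idlest one until the goal holds or no move helps."""
    moves = []
    while violates_goal(brokers, max_spread) and len(moves) < max_moves:
        src = max(brokers, key=lambda b: b.load)
        dst = min(brokers, key=lambda b: b.load)
        gap = src.load - dst.load
        # Only partitions lighter than the gap actually narrow it.
        candidates = [(n, l) for n, l in src.partitions.items() if 0 < l < gap]
        if not candidates:
            break  # nothing left to move that would improve balance
        # Moving load x leaves a gap of |gap - 2x| between the two brokers.
        name, load = min(candidates, key=lambda nl: abs(gap - 2 * nl[1]))
        del src.partitions[name]
        dst.partitions[name] = load
        moves.append((name, src.broker_id, dst.broker_id))
    return moves


# Example: broker 1 has picked up extra partitions and is overloaded.
cluster = [
    Broker(1, {"a": 30, "b": 20, "c": 10}),
    Broker(2, {"d": 10}),
    Broker(3, {"e": 20}),
]
for partition, source, target in plan_moves(cluster, max_spread=15):
    print(f"move partition {partition}: broker {source} -> broker {target}")
```

A real system would weigh several goals at once and account for things this sketch ignores, such as replica placement and the cost of moving data, but the overall shape (check the goal, pick a move, repeat) is the same.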
You can view a video here about how Cruise Control works.
Apache Kafka’s popularity has grown substantially during the past few years. LinkedIn’s own deployment, consisting of more than 1,800 servers, recently surpassed a staggering 2 trillion messages per day.
While Kafka has proven to be stable, there are still operational challenges when running it at such a scale, LinkedIn developer Jiangjie Qin wrote in a blog post. Servers fail on a daily basis, which leaves workloads unbalanced across clusters. As a result, site reliability engineers (SREs) expend significant time and effort reassigning partitions to restore balance to Kafka clusters, Qin said.
“Intelligent automation is critical under these circumstances, which is why we developed Cruise Control,” Qin wrote.
For more details on Cruise Control, see Qin’s blog post and this reference.