Apache Kafka Lands on Microsoft's Cloud-Based Big Data Platform

Microsoft's big data customers can now use Apache Kafka to help power their IoT applications and other workloads that involve massive data streams.


After a yearlong beta period, Apache Kafka on Azure HDInsight is now generally available and ready to take on production workloads.

Kafka is an open-source stream processing platform that, for many enterprises, is becoming increasingly pivotal to monetizing the wealth of information generated by the internet of things (IoT). Azure HDInsight is Microsoft's cloud-based big data platform.

By bringing them together and offering Kafka as a managed service on Azure, Microsoft claims it can help remove some of the barriers that are preventing organizations from seizing business opportunities in a hyperconnected world.

"HDInsight is a managed platform with a 99.9 percent SLA [service-level agreement] on open source workloads," wrote Raghav Mohan, a program manager at Microsoft Azure Big Data, in a Dec. 18 announcement. "With this addition, our enterprise customers no longer worry about managing Kafka clusters, as HDInsight manages and fixes the issues involved with running Kafka at an enterprise scale."

Naturally, Microsoft was among the first to use its own offering in a production setting.

At Microsoft, Apache Kafka on Azure HDInsight powers Siphon, a distributed system that the company uses to ingest, distribute and consume streaming data for subsequent processing and analytics. According to Microsoft, Siphon ingests more than a trillion messages each day and helps deliver some of the company's largest cloud services, including Office 365, Skype and Bing Ads.

Outside of Redmond, Wash., GE Healthcare is using Apache Kafka on Azure HDInsight to power a new generation of intelligent health care services that use machine learning and big data to solve problems faced by medical facilities and their patients. Adobe, one of Microsoft's most high-profile cloud customers, is using the service to process a billion Adobe Experience Cloud transactions a day.

Microsoft also announced the general availability of an HDInsight integration with Azure Log Analytics, allowing customers to monitor their various HDInsight clusters from a single interface. Users can view metrics produced by various open-source frameworks and inspect cluster performance at the virtual machine level, including memory and CPU utilization.

Finally, Microsoft is reducing the cost of running big data workloads on Azure HDInsight.

Beginning Jan. 5, 2018, customers may see up to a 52 percent reduction in the price of their HDInsight subscriptions. "Customers will get even more value from their batch processing, interactive querying, machine learning, streaming analytics, and real-time analytics workloads on Azure HDInsight at a much lower price," said Rimma Nehme, group program manager of Azure Cosmos DB and Open Source Software Analytics at Microsoft, in a separate announcement.

The savings are steeper for users running the R Server analytics platform on Azure HDInsight, added Nehme. Microsoft cut prices on R Server workloads by 80 percent, to $0.016 per core hour, she noted.

Pedro Hernandez


Pedro Hernandez is a contributor to eWEEK and the IT Business Edge Network, the network for technology professionals. Previously, he served as a managing editor for the Internet.com network of...