Microsoft upped its cloud analytics game at the Strata + Hadoop World conference in New York by announcing Apache Storm support in HDInsight, the company’s cloud-based distribution of Hadoop, a popular open-source big data processing platform.
“Available in preview today, we are supporting Apache Storm in HDInsight, allowing our customers to process millions of items of Hadoop data from their Internet of things devices in near-real time using a fully managed Hadoop service,” announced T.K. Rengarajan, corporate vice president of Microsoft’s Data Platform, in an Oct. 15 statement. “By bringing real-time analytics capabilities to HDInsight, we are opening up new customer scenarios, such as the ability to analyze operational data in real time for predictive maintenance.”
Apache Storm is an open-source project that makes it possible to process large data streams in real time. In its support documentation, Microsoft describes the technology as “a distributed, fault-tolerant, open-source computation system that allows you to process data in real time. Storm solutions can also provide guaranteed processing of data, with the ability to replay data that was not successfully processed the first time.”
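The replay guarantee described above can be illustrated with a small Python sketch. It does not use Storm's API; `process_with_replay`, `make_flaky_handler` and the `max_attempts` limit are hypothetical names made up for this illustration, and a real Storm topology tracks and replays data in a distributed way rather than in a single in-memory queue.

```python
from collections import deque


def process_with_replay(items, handler, max_attempts=3):
    """Toy replay loop: an item whose handler raises is queued again and retried."""
    # Each queue entry pairs an item with the number of the attempt about to run.
    pending = deque((item, 1) for item in items)
    done, failed = [], []
    while pending:
        item, attempt = pending.popleft()
        try:
            done.append(handler(item))
        except Exception:
            if attempt < max_attempts:
                # Not processed successfully: replay it after the items already queued.
                pending.append((item, attempt + 1))
            else:
                # Give up after max_attempts tries (an assumption of this sketch).
                failed.append(item)
    return done, failed


def make_flaky_handler(fail_once):
    """Build a handler that fails the first time it sees any item in fail_once."""
    seen = set()

    def handler(item):
        if item in fail_once and item not in seen:
            seen.add(item)
            raise RuntimeError("transient failure")
        return item * 2

    return handler


if __name__ == "__main__":
    done, failed = process_with_replay([1, 2, 3], make_flaky_handler({2}))
    print(done, failed)
```

In this sketch the item that fails on its first pass is handled again after the rest of the queue, so every item is eventually processed, though not necessarily in its original order.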
According to the Apache Software Foundation, Storm does “for real-time processing what Hadoop did for batch processing.” Storm is currently being used by Twitter, Spotify and Alibaba, among others, to help handle their large-scale data processing requirements.
With today’s announcement, Microsoft has jumped on the bandwagon. “The preview availability of Storm in HDInsight continues Microsoft’s investment in the Hadoop ecosystem and HDInsight,” said Rengarajan.
HDInsight Storm is available as a managed cluster within Azure, where it can be integrated into other Azure services. “For example, Storm might consume data from services such as ServiceBus Queues or Event Hub, and use Websites or Cloud Services to provide data visualization,” explained Microsoft.
In addition to enabling real-time big data analytics, Microsoft envisions that customers will leverage HDInsight Storm to power their online machine learning efforts.
“Storm can be used with a machine learning solution that has previously been trained by batch processing, such as a solution based on Mahout,” states the HDInsight Storm FAQ, referring to Apache’s machine learning and data mining project. “However its generic, distributed computation model also opens the door for stream-based machine learning solutions.”
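The pattern the FAQ describes, a model trained in batch and then applied to a live stream, can be sketched in a few lines of Python. This is an illustration only, under stated assumptions: it uses neither Storm nor Mahout, the "model" is just an alert threshold derived from historical readings, and `train_batch`, `score_stream` and the parameter `k` are hypothetical names.

```python
import statistics


def train_batch(history, k=3.0):
    """Batch step: derive an alert threshold from historical readings."""
    mean = statistics.mean(history)
    spread = statistics.pstdev(history)
    # Readings more than k standard deviations above the mean count as anomalous.
    return mean + k * spread


def score_stream(readings, threshold):
    """Streaming step: flag each reading as it arrives, one at a time."""
    for reading in readings:
        yield reading, reading > threshold


if __name__ == "__main__":
    threshold = train_batch([10, 10, 10, 10])
    for reading, is_anomaly in score_stream([9, 11], threshold):
        print(reading, is_anomaly)
```

Because `score_stream` is a generator, it scores readings lazily as they are consumed, which loosely mirrors how a streaming system would apply an already-trained model to each incoming event.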
Azure HDInsight Storm supports .NET, Java and Python. Microsoft acknowledges that while Storm itself supports other languages (any programming language, according to Apache), enabling support for them will require HDInsight cluster configuration changes.
In related news, Microsoft revealed that Hadoop vendor Hortonworks’s big data software platform will feature Microsoft Azure integrations. Also new are additions to the company’s Azure Machine Learning ecosystem, including a recommendation engine, an anomaly-detection service and a batch of packages for R, a programming language favored by data scientists.
“These announcements and our participation in the [Strata + Hadoop World] event are part of our commitment to bring big data to everyone by leveraging the power, flexibility and scale of the cloud,” stated Rengarajan.