Microsoft Azure HDInsight has gained support for Hadoop 2.4, the latest edition of the popular open source big data framework. Microsoft announced the new update during the Hadoop Summit, currently taking place in San Jose, Calif.
The Redmond, Wash.-based software maker first released HDInsight in March 2013 as a public preview, allowing Azure customers to spin up Hadoop clusters on the cloud. “A cluster will take a few minutes to create (as part of creating it will configure the necessary virtual machines that together make up your Hadoop cluster),” Scott Guthrie, who went on to become Microsoft’s new cloud chief, blogged back in March.
As Microsoft pursues a “cloud-first” strategy, it is evolving into a pillar of the company’s enterprise computing solutions slate. HDInsight is a “part of our comprehensive data platform,” which includes the company’s SQL Server technology and Intelligent Systems Service for machine-generated data coursing through the Internet of things, Microsoft said in a statement.
In addition to Hadoop 2.4 compatibility, the update promises to speed up big data workloads, according to Microsoft’s Oliver Chiu, product marketing manager for Hadoop/big data and data warehousing. “This release of HDInsight provides orders of magnitude (up to 100x) performance improvements to query response times and continues to leverage the benefits of YARN,” he said in a statement. YARN is the cluster resource management sub-project introduced in Hadoop 2.0.
Now, customers can glean business insights from unstructured data in the Azure storage service’s “blobs” faster than before, added Chiu. “HDInsight can mine your blob data to drive new business decisions,” he said. Use cases include analyzing “clickstream data to send targeted marketing, log analysis to measure usage, or social sentiment from Twitter to correlate with sales.”
The newly refreshed HDInsight service also sports “an easy-to-use Web interface that gives users of HDInsight a friendly experience.” Based on SQL Server technology, the new feature allows users to create interactive queries over Hive, the Hadoop component that provides large data set querying and management capabilities.
Apache Hadoop has emerged as the leading big data platform, garnering widespread industry support as businesses seek ways to store, manage and capitalize on ever-increasing amounts of data. From Google, SAS and Intel to fledgling startups, tech companies big and small have flocked to the open-source platform.
Microsoft enlisted the help of Hortonworks, a Hadoop-based data analytics provider, to not only ensure that Azure HDInsight is compatible with Hadoop, but that the software giant gives back to the open-source project. Microsoft signed a Hadoop co-development deal with Hortonworks in 2011, just three months after the startup began its operations.
In a blog post, Microsoft said it “prioritized contributing back to the community and Apache Hadoop-related projects, e.g., Tez, Stinger and Hive.” In total, the company claims to have contributed 30,000 lines of code and more than 10,000 engineering hours to the effort.