Businesses can now take a public preview of Azure Data Lake Analytics, Microsoft’s big data analytics service in the cloud, for a spin, the Redmond, Wash., software giant announced this week.
Microsoft debuted the Apache YARN-based offering on Sept. 29. Now, a month later, the company’s customers can spin up their big data analytics projects without massive upfront investments, according to Oliver Chiu, product marketing manager for Hadoop/Big Data and Data Warehousing at Microsoft.
“Azure Data Lake Analytics is a distributed big data service that dynamically scales your code so that you only need to focus on business logic and not on distributed infrastructure,” stated Chiu in an Oct. 28 announcement. “The analytics service for Azure Data Lake is cost-efficient because you only pay for your job when it is running, and support for Azure Active Directory lets you manage access and roles simply and integrates with your on-premises identity system.”
On the developer front, Azure Data Lake Analytics includes U-SQL, Chiu noted. The new query language, which was inspired by the company’s own distributed runtime for big data systems, enables large-scale data analytics and is intended to help SQL and .NET developers hit the ground running with the skills they already possess.
“The U-SQL support in Azure Data Lake Tools for Visual Studio includes state of the art support for authoring, debugging and advanced performance analysis features for increased productivity when optimizing jobs running across thousands of nodes,” stated T. K. “Ranga” Rengarajan, corporate vice president of Microsoft’s Data Platform, in an earlier company blog post detailing the technology.
Within Visual Studio, U-SQL support enables developers to more readily spot potential problems. “Visualizations of your U-SQL code allows you to see how your code runs at scale and identify performance bottlenecks and cost optimizations, making it easier to tune your queries,” stated Chiu.
Also available as a public preview is Azure Data Lake Store, formerly the Azure Data Lake elastic database.
“The Azure Data Lake Store provides a single repository where you can easily capture data of any size, type and speed without forcing changes to your application as data scales,” explained Chiu. “In the store, data can be securely shared for collaboration and is accessible for processing and analytics from HDFS [Hadoop Distributed File System] applications and tools.”
HDFS is the fault-tolerant file system component used in the popular open-source Hadoop big data processing platform. Azure Data Lake Store allows organizations that employ Hadoop distributions like Cloudera, Hortonworks and MapR to access the service.
Cloudera has worked with Microsoft to integrate its enterprise data hub with the offering, according to the company’s founder and chief strategy officer, Mike Olson.
“Cloudera on Azure benefits from the Data Lake Store which acts as a cloud-based landing zone for data in your enterprise data hub,” said Olson in a statement last month. “Because the store is compatible with WebHDFS, Cloudera can leverage Data Lake and provide customers with a secure and flexible big data solution.”
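For readers wondering what WebHDFS compatibility looks like in practice, WebHDFS is Hadoop's REST interface, which addresses files through URLs of the form `/webhdfs/v1/<path>?op=<OPERATION>`. The Python sketch below only builds such URLs and makes no network calls. The host name, the paths and the `webhdfs_url` helper are hypothetical, chosen for illustration; a real Data Lake Store request would also need the account's actual endpoint and Azure Active Directory credentials, which are not shown here.

```python
from urllib.parse import quote, urlencode


def webhdfs_url(host, path, op, scheme="https", **params):
    """Build a WebHDFS-style REST URL: scheme://host/webhdfs/v1/<path>?op=<OP>.

    Illustrative helper only; it is not part of any Microsoft or Cloudera SDK.
    """
    # The operation name goes in the query string, followed by any extra parameters.
    query = urlencode({"op": op, **params})
    # Percent-encode the file path but keep "/" as the directory separator.
    encoded_path = quote(path.lstrip("/"))
    return f"{scheme}://{host}/webhdfs/v1/{encoded_path}?{query}"


# Hypothetical host and path, used only to show the URL shape.
print(webhdfs_url("example-store.invalid", "/landing/events", "LISTSTATUS"))
```

Because tools address the store through this same URL shape, an application that already issues WebHDFS requests to a Hadoop cluster could, in principle, be pointed at a compatible store by changing the host and the authentication step.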