Earlier this week, Microsoft announced new Azure Data Lake services along with a new query language called U-SQL.
The services, aimed at analytics in the cloud, include a hyper-scale repository; a new analytics service built on YARN that enables data developers and data scientists to analyze all of their data; and HDInsight, a managed Hadoop, Spark, Storm and HBase service.
But perhaps most important to developers is that Azure Data Lake Analytics includes U-SQL, a language that unifies the benefits of SQL with the expressive power of a developer’s own code. In addition, Microsoft announced Azure Data Lake Tools for Visual Studio, which provide an integrated development environment that spans the Azure Data Lake and simplifies authoring, debugging and optimization for processing and analytics.
“We know that many developers and data scientists struggle to be successful with big data using existing technologies and tools,” said T. K. “Ranga” Rengarajan, corporate vice president of Microsoft’s Data Platform, in a blog post on the expanded Azure Data Lake. “Code-based solutions offer great power, but require significant investments to master, while SQL-based tools make it easy to get started but are difficult to extend. We’ve faced the same problems inside Microsoft and that’s why we introduced U-SQL, a new query language that unifies the ease of use of SQL with the expressive power of C#.”
Microsoft built U-SQL on the same distributed runtime that powers the company’s big data systems, Rengarajan said. “Millions of SQL and .NET developers can now process and analyze all of their data with the skills they already have. The U-SQL support in Azure Data Lake Tools for Visual Studio includes state of the art support for authoring, debugging and advanced performance analysis features for increased productivity when optimizing jobs running across thousands of nodes.”
U-SQL’s scalable distributed query capability enables developers and data scientists to efficiently analyze data in the store and across relational stores such as Azure SQL Database.
“U-SQL was especially helpful because we were able to get up and running using our existing skills with .NET and SQL,” said Sam Vanhoutte, CTO at Codit, in a statement. “This made big data easy because we didn’t have to learn a whole new paradigm. With Azure Data Lake, we were able to process data coming in from smart meters and combine it with the energy spot market prices to give our customers the ability to optimize their energy consumption and potentially save hundreds of thousands of dollars.”
Microsoft’s U-SQL enables developers and data scientists to process any type of data, to use custom code easily and to scale to any size of data, said Michael Rys, principal program manager for Microsoft Big Data, in a post on the company’s Visual Studio Blog.
“U-SQL allows you to write declarative big data jobs, as well as easily include your own user code as part of those jobs,” said Scott Guthrie, executive vice president of the Microsoft Cloud and Enterprise Group. “Inside Microsoft, developers have been using this combination in order to be productive operating on massive data sets of many exabytes of scale, processing mission critical data pipelines. In addition to providing an easy to use experience in the Azure management portal, we are delivering a rich set of tools in Visual Studio for debugging and optimizing your U-SQL jobs. This lets you play back and analyze your big data jobs, understanding bottlenecks and opportunities to improve both performance and efficiency, so that you can pay only for the resources you need and continually tune your operations.”
Rys said Microsoft designed U-SQL as an evolution of the declarative SQL language with native extensibility through user code written in C#.
“This unifies both paradigms, unifies structured, unstructured, and remote data processing, unifies the declarative and custom imperative coding experience, and unifies the experience around extending your language capabilities,” he said.
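The idea of a declarative query that calls into a developer's own code can be sketched by analogy. The example below is not U-SQL and does not use any Azure service; it is a minimal illustration in Python, using the standard library's sqlite3 module, of registering a user-written function and calling it from SQL. The table, the sample rows and the normalize_region helper are all hypothetical.

```python
import sqlite3


def normalize_region(raw):
    """User code (hypothetical helper): tidy up a free-form region string."""
    return raw.strip().lower()


def run_query(rows):
    """Load rows into an in-memory table and aggregate them with SQL
    that calls the user-written function above."""
    conn = sqlite3.connect(":memory:")
    # Make the Python function callable from SQL under the same name.
    conn.create_function("normalize_region", 1, normalize_region)
    conn.execute("CREATE TABLE searchlog (region TEXT, duration INTEGER)")
    conn.executemany("INSERT INTO searchlog VALUES (?, ?)", rows)
    # Declarative part: grouping and summing are expressed in SQL,
    # while the cleanup logic lives in ordinary code.
    result = conn.execute(
        "SELECT normalize_region(region) AS region_key, "
        "SUM(duration) AS total "
        "FROM searchlog "
        "GROUP BY normalize_region(region) "
        "ORDER BY region_key"
    ).fetchall()
    conn.close()
    return result


# Made-up sample data, for illustration only.
SAMPLE_ROWS = [(" EN-GB", 10), ("en-gb ", 5), ("en-US", 7)]

if __name__ == "__main__":
    print(run_query(SAMPLE_ROWS))
```

Here the SQL statement stays declarative while the string cleanup is plain Python, which is roughly the division of labor Rys describes between SQL and C# in U-SQL. How U-SQL itself exposes user code is defined by Microsoft's documentation, not by this sketch.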
Microsoft built the language this way to avoid the limitations of other SQL-based languages that are not optimized to handle unstructured data or are more difficult to code for scale.
Moreover, Rys said Microsoft built U-SQL based on the company’s internal experience with SCOPE and existing languages such as T-SQL, ANSI SQL and Hive.
“For example, we base our SQL and programming language integration and the execution and optimization framework for U-SQL on SCOPE, which currently runs hundreds of thousands of jobs each day internally,” he said.
Microsoft also aligned the U-SQL metadata system, SQL syntax and language semantics with T-SQL and ANSI SQL, and the language uses C# data types and the C# expression language. In addition, Microsoft looked to Hive and other big data languages to identify patterns and data processing requirements, and integrated them into its framework.