Microsoft continues to bolster its Azure data services offerings for enterprise users with its announcement that Azure Data Lake Storage Gen2 and Azure Data Explorer are now generally available for use.
In addition, the company revealed that its Azure Data Factory Mapping Data Flow capabilities are now available in preview for users to try with their Azure workloads.
The latest expanded services were announced recently by Julia White, Microsoft’s corporate vice president for Azure, in a post on the Azure Blog.
“We continue to focus on making Azure the best place for your data and analytics,” wrote White. “Azure provides the most comprehensive platform for analytics. With these updates, Azure solidifies its leadership in analytics.”
Azure Data Lake Storage is cloud storage that combines the best of hierarchical file systems and blob storage, while Azure Data Explorer is a fast, fully managed service that simplifies ad hoc and interactive analysis over telemetry, time-series and log data, wrote White. “This service, powering other Azure services like Log Analytics, App Insights, Time Series Insights is useful to query streaming data to identify trends, detect anomalies and diagnose problems.”
Visual, No-Code Experience
The preview of new Mapping Data Flow capabilities in Azure Data Factory provides a visual, no-code experience to help data engineers transform their data for new uses, she wrote. “This complements the Azure Data Factory’s code-first experience to enable data engineers of all skill levels to collaborate and build powerful hybrid data transformation pipelines.”
Jurgen Willis, director of product management for Azure Engineering, wrote in a related post that the two generally available services and the preview feature will open up new possibilities for enterprise Azure users and their computing infrastructures.
“Azure Data Lake Storage (ADLS) combines the scalability, cost effectiveness, security model, and rich capabilities of Azure Blob Storage with a high-performance file system that is built for analytics and is compatible with the Hadoop Distributed File System,” wrote Willis. “Customers no longer have to trade off between cost effectiveness and performance when choosing a cloud data lake.”
A major priority for the new service was ensuring that ADLS is compatible with the Apache ecosystem, which was achieved by developing the Azure Blob File System (ABFS) driver, wrote Willis.
“The ABFS driver is officially part of Apache Hadoop and Spark and is incorporated in many commercial distributions,” he explained. The file system semantics are implemented server-side, which eliminates the need for a complex client-side driver and ensures high-fidelity file system transactions, he added.
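For readers unfamiliar with the driver, Hadoop and Spark jobs address ADLS Gen2 data through URIs of the form abfs[s]://<file_system>@<account_name>.dfs.core.windows.net/<path>. The following is a minimal sketch of how such a URI is assembled; the helper function and the account, file system and path names are hypothetical and chosen only for illustration.

```python
def abfs_uri(file_system: str, account: str, path: str = "", secure: bool = True) -> str:
    """Build an ABFS URI: abfs[s]://<file_system>@<account>.dfs.core.windows.net/<path>.

    This helper is illustrative only; it is not part of any Azure SDK.
    """
    scheme = "abfss" if secure else "abfs"  # "abfss" selects the TLS-secured variant
    return f"{scheme}://{file_system}@{account}.dfs.core.windows.net/{path.lstrip('/')}"


# Hypothetical account and file system names, for illustration only.
print(abfs_uri("telemetry", "examplelake", "raw/2019/02/events.parquet"))
```

A job configured with the ABFS driver would pass a URI like this wherever it would otherwise pass an hdfs:// path.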
To boost analytics performance, the company implemented a hierarchical namespace (HNS) that supports atomic file and folder operations, added Willis. “This is important because it reduces the overhead associated with processing big data on blob storage. This speeds up job execution and lowers cost because fewer compute operations are required.”
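The effect of a hierarchical namespace on operation counts can be illustrated with a toy model. The sketch below is not how Azure implements either storage layout; it simply assumes that a flat namespace stores each object under its full path, so renaming a "folder" touches every object beneath it, while a hierarchical namespace stores a directory as a single node that can be moved in one step. All function and path names are hypothetical.

```python
def rename_flat(blobs: dict, src: str, dst: str) -> int:
    """Rename a 'folder' in a flat namespace keyed by full path.

    Every blob under the prefix is moved individually.
    Returns the number of per-object operations performed.
    """
    ops = 0
    # Snapshot the matching keys first so the dict is not mutated while iterating.
    for key in [k for k in blobs if k.startswith(src + "/")]:
        blobs[dst + key[len(src):]] = blobs.pop(key)
        ops += 1
    return ops


def rename_hierarchical(root: dict, src: str, dst: str) -> int:
    """Rename a top-level directory in a tree: one move of the directory node."""
    root[dst] = root.pop(src)
    return 1


# Toy data: a job that wrote three output parts into a temporary folder.
flat = {f"tmp_job/part-{i}": b"" for i in range(3)}
tree = {"tmp_job": {f"part-{i}": b"" for i in range(3)}}

print(rename_flat(flat, "tmp_job", "final"))          # one operation per object
print(rename_hierarchical(tree, "tmp_job", "final"))  # a single operation
```

In this model the flat rename grows with the number of files, which is the kind of overhead big data jobs incur when they commit output by renaming a temporary directory.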
Both the ABFS driver and HNS significantly improve ADLS performance, removing scale and performance bottlenecks. At the same time, ADLS offers the same data security capabilities built into Azure Blob Storage, including encryption of data in transit (via TLS 1.2) and at rest, storage account firewalls, virtual network integration and role-based access control, he wrote.
The ADLS file system also supports POSIX-compliant access control lists (ACLs), which give granular security protection by restricting access to authorized users, groups or service principals and provide file and object data protection, wrote Willis. ADLS is tightly integrated with Azure Databricks, Azure HDInsight, Azure Data Factory, Azure SQL Data Warehouse and Power BI, which enables broad analytics workflows and business insights across organizations.
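To give a sense of what POSIX-style ACLs look like, the sketch below parses a short-form ACL string (comma-separated entries of the form type:name:permissions) and checks whether a principal has a given permission. It is a deliberately simplified illustration, not the full POSIX evaluation algorithm: it ignores the owning user and owning group entries and the mask entry. The principal and group names are hypothetical.

```python
def parse_acl(acl: str) -> dict:
    """Parse 'type:name:perms' entries into a {(type, name): perms} mapping."""
    entries = {}
    for entry in acl.split(","):
        kind, name, perms = entry.strip().split(":")
        entries[(kind, name)] = perms
    return entries


def is_allowed(acl: str, principal: str, groups: set, wanted: str) -> bool:
    """Simplified check: named user entry, then named groups, then 'other'.

    Assumption: owner entries and the mask are ignored for brevity.
    `wanted` is a single permission character: 'r', 'w' or 'x'.
    """
    entries = parse_acl(acl)
    if ("user", principal) in entries:
        return wanted in entries[("user", principal)]
    group_perms = [
        perms for (kind, name), perms in entries.items()
        if kind == "group" and name in groups
    ]
    if group_perms:
        return any(wanted in perms for perms in group_perms)
    return wanted in entries.get(("other", ""), "---")


# Hypothetical ACL: alice may read and traverse, analysts may read, others get nothing.
acl = "user:alice:r-x,group:analysts:r--,other::---"
print(is_allowed(acl, "alice", set(), "r"))
print(is_allowed(acl, "bob", {"analysts"}, "w"))
```

A production system would rely on the storage service to enforce these rules; the point here is only that access is decided per entry, per principal and per permission bit.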
Azure Data Lake Storage Gen2 is built for data analytics and is the most comprehensive data lake available, wrote Willis.
Azure Data Explorer (ADX), meanwhile, is a fast, fully managed data analytics service for real-time analysis on large volumes of streaming data. ADX can query 1 billion records in less than a second with no modification of the data or metadata required, wrote Willis. ADX also includes native connectors to Azure Data Lake Storage, Azure SQL Data Warehouse and Power BI and comes with an intuitive query language so that customers can get insights in minutes. ADX is available in 41 Azure regions and is supported by a growing ecosystem of partners, including ISVs and system integrators.
The preview Mapping Data Flow capabilities in Azure Data Factory (ADF) allow customers to visually design, build and manage data transformation processes without learning Spark or having a deep understanding of their distributed infrastructure, wrote Willis. ADF is a hybrid cloud-based data integration service for orchestrating and automating data movement and transformation. ADF includes more than 80 built-in connectors to structured, semi-structured and unstructured data sources. Mapping Data Flow combines a rich expression language with an interactive debugger to easily execute, trigger, and monitor ETL jobs and data integration processes.