Microsoft Introduces Spark Connector for Azure DocumentDB

Microsoft Spark Connector for Azure DocumentDB Supports Data Science

Mar 16, 2017

Microsoft is serious about becoming the go-to provider of big data cloud services for its enterprise Azure cloud-computing customers.

The company introduced a new Apache Spark connector for Azure DocumentDB, among several other additions to its big data solutions portfolio, at the Strata + Hadoop World big data conference in San Jose, Calif., today.

Taken together, the new additions are intended to help businesses piece together flexible, high-performance big data processing and analytics systems in the cloud.

Apache Spark is a developer-friendly, open-source data processing engine with a knack for making short work of big data workloads and enabling sophisticated analytics. Combined with Azure DocumentDB, Microsoft’s NoSQL document service, the technology now enables Azure customers to perform data science and glean insights in real time, according to Dharma Shukla, distinguished engineer and general manager of open-source software analytics and NoSQL at Microsoft.

“Connecting Apache Spark to Azure DocumentDB accelerates our customer’s ability to solve fast-moving data sciences problems where data can be quickly persisted and retrieved using DocumentDB,” Shukla said in a March 15 statement.

“The Spark to DocumentDB connector efficiently exploits the native DocumentDB managed indexes and enables updateable columns when performing analytics, push-down predicate filtering, and advanced analytics to data sciences against fast-changing globally-distributed data, ranging from IoT, data science, and analytics scenarios,” Shukla’s statement said.

The connector, which uses the Azure DocumentDB Java SDK, is available now on GitHub.
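The push-down predicate filtering that Shukla mentions can be illustrated with a minimal Python sketch. It does not use the connector or the DocumentDB SDK. The `ToyDocumentStore` class, its `scan` and `query` methods, and the sample sensor readings are all hypothetical, invented only to show the general idea: when the filter runs inside the data store, fewer documents have to travel to the analytics engine.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterator, List

Document = Dict[str, object]


@dataclass
class ToyDocumentStore:
    """Hypothetical stand-in for a remote document collection.

    This is an illustration only, not the DocumentDB API.
    """
    documents: List[Document]
    documents_sent: int = 0  # documents that cross the simulated network

    def scan(self) -> Iterator[Document]:
        """Send every document; the caller must filter afterwards."""
        for doc in self.documents:
            self.documents_sent += 1
            yield doc

    def query(self, predicate: Callable[[Document], bool]) -> Iterator[Document]:
        """Evaluate the predicate inside the store and send only matches.

        A real store would typically consult an index here instead of
        checking each document in turn.
        """
        for doc in self.documents:
            if predicate(doc):
                self.documents_sent += 1
                yield doc


def is_hot(doc: Document) -> bool:
    """Example predicate: keep readings above 30 degrees."""
    return doc["temperature"] > 30


def filter_on_client(store: ToyDocumentStore) -> List[Document]:
    """No pushdown: fetch everything, then filter locally."""
    return [doc for doc in store.scan() if is_hot(doc)]


def filter_with_pushdown(store: ToyDocumentStore) -> List[Document]:
    """Pushdown: hand the predicate to the store."""
    return list(store.query(is_hot))


# Twenty made-up readings with temperatures 20 through 39.
readings = [{"device": f"sensor-{i}", "temperature": 20 + i} for i in range(20)]

client_store = ToyDocumentStore(list(readings))
pushdown_store = ToyDocumentStore(list(readings))

client_result = filter_on_client(client_store)
pushdown_result = filter_with_pushdown(pushdown_store)

# Both approaches return the same rows; only the transfer volume differs.
print(client_store.documents_sent, pushdown_store.documents_sent)  # prints "20 9"
```

Both calls return the same nine readings, but the pushdown version moves nine documents instead of twenty. That saving is the kind of benefit the statement attributes to using DocumentDB's managed indexes.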

Microsoft also announced the general availability of new MongoDB APIs (application programming interfaces) for DocumentDB. Backed by enterprise-grade service-level agreements, the APIs enable applications built on MongoDB NoSQL databases to “seamlessly target” DocumentDB data using their existing client drivers and toolsets, Shukla said.

Several new enhancements to HDInsight, the company’s cloud-based distribution of Hadoop, were also unveiled.

In a security-enhancing move, Microsoft has extended HDInsight’s native security capabilities for Hadoop workloads, which include authentication and encryption, to other workloads, including Apache Spark and Interactive Hive, also known as Live Long and Process (LLAP). Interactive Hive is a new HDInsight cluster type that “allows in memory caching that makes Hive queries much more interactive and faster,” according to an online support document from Microsoft.

This new feature makes HDInsight one of the world’s “flexible, and open Big Data solution on the cloud with in-memory caches (using Hive and Spark) and advanced analytics through deep integration with R Services,” claims the software giant.

Now that HDInsight supports Apache Hive 2.1.1, customers can use the solution for data warehouse scenarios that deliver sub-second query performance and don’t require time- and resource-consuming data movement tasks, Shukla said.

Finally, SQL Server Community Technology Preview (CTP) 1.4 for both Windows and Linux will be available to download in the coming days, Microsoft announced today. In addition to some Linux-specific tweaks, it will contain index-rebuilding features that give busy database administrators (DBAs) more flexibility in scheduling index maintenance and recovery tasks.


Property of TechnologyAdvice. © 2026 TechnologyAdvice. All Rights Reserved
