Microsoft Upgrades Windows-Based Data Science Virtual Machine

Microsoft's cloud-based virtual machine for big data analytics is now available in a version running on Windows Server 2016.

Data Science Virtual Machine (DSVM), Microsoft's cloud-based offering for big data analytics, is now available in a new preview version based on Windows Server 2016 Datacenter Edition. Previously, the Windows version of DSVM ran only on a Windows Server 2012 image. Microsoft also makes DSVM available in Ubuntu and CentOS Linux flavors.

In upgrading to Windows Server 2016, DSVM users now have access to additional tools and functionality, including Docker container support, noted Microsoft software engineer Udayan Kumar in a June 6 announcement.

The new virtual machine also comes bundled with Office ProPlus and includes an upgrade to Microsoft R Server 9.1, which now features sentiment analysis and other cognitive models. Microsoft acquired Revolution Analytics, the R statistical programming language specialist and maker of Revolution R Enterprise, on which R Server is based.

DSVM on Windows Server 2016 also offers a streamlined deep learning setup experience for those using the offering for artificial intelligence (AI) workloads.

"Earlier, Windows DSVM users had to install the GPU based deep learning capabilities via an extension script on the Windows Server 2012 version of the DSVM," Kumar explained. "With this release, we are pre-installing the GPU Nvidia drivers, CUDA toolkit 8.0, and cuDNN [Nvidia CUDA Deep Neural Network] library in the image. Along with it, we have also installed the latest GPU versions (these will also work with CPU-only machines) of the following popular deep learning frameworks: Microsoft Cognitive Toolkit (CNTK), TensorFlow [and] MXnet."

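For readers who want to confirm which of those frameworks a given machine actually exposes to Python, a minimal sketch follows. It is not taken from Microsoft's announcement. The import names (`cntk`, `tensorflow`, `mxnet`) are the ones these projects commonly use and are assumptions here, and the helper `installed_frameworks` is a hypothetical name chosen for illustration.

```python
import importlib.util

# Assumed Python import names for the frameworks named in the announcement.
FRAMEWORKS = {
    "Microsoft Cognitive Toolkit (CNTK)": "cntk",
    "TensorFlow": "tensorflow",
    "MXNet": "mxnet",
}


def installed_frameworks(modules=FRAMEWORKS):
    """Map each framework label to True if its module can be located."""
    report = {}
    for label, module_name in modules.items():
        # find_spec locates a top-level module without importing it,
        # so no GPU initialization is triggered by this check.
        report[label] = importlib.util.find_spec(module_name) is not None
    return report


if __name__ == "__main__":
    for label, present in installed_frameworks().items():
        print(f"{label}: {'found' if present else 'not found'}")
```

The check only reports whether a module can be located in the current Python environment. It says nothing about driver, CUDA or cuDNN versions.
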
Furthering Microsoft's efforts to provide data scientists with AI tools, the company today announced the release of its new Machine Learning Library for Apache Spark, the open-source big data processing engine.

Microsoft Machine Learning Library for Apache Spark (MMLSpark) is intended to help users run more experiments and apply machine learning techniques to large datasets, according to a June 7 announcement. Because MMLSpark provides a consistent, simplified set of APIs (application programming interfaces) for different types of data, users no longer need to wrestle with low-level APIs to get the results they seek, according to the company.

The solution also helps open the door to deep neural network (DNN) models used in computer vision systems.

"With MMLSpark, we provide easy-to-use Python APIs that operate on Spark DataFrames and are integrated into the Spark ML [machine learning] pipeline model," wrote Roope Astala, a senior program manager, and Sudarshan Raghunathan, a principal software engineering manager at Microsoft, in a separate June 7 blog post.

DataFrames are Spark Datasets, or distributed collections of data, that have been organized into named columns. Meanwhile, the Spark ML pipeline describes a set of uniform, high-level APIs based on DataFrames that allow users to create and tune machine learning workflows.
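
The pipeline idea can be illustrated without Spark at all. The sketch below is a toy in plain Python, not Spark's or MMLSpark's API: rows are dictionaries keyed by column name, and each stage returns new rows with one more named column. The names `add_column` and `run_pipeline` are hypothetical helpers invented for this illustration.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]                    # one record, keyed by column name
Stage = Callable[[List[Row]], List[Row]]   # a stage maps rows to new rows


def add_column(name: str, fn: Callable[[Row], object]) -> Stage:
    """Build a stage that appends a named column computed from each row."""
    def stage(rows: List[Row]) -> List[Row]:
        # Copy each row so the input data is left unchanged.
        return [{**row, name: fn(row)} for row in rows]
    return stage


def run_pipeline(stages: List[Stage], rows: List[Row]) -> List[Row]:
    """Apply the stages in order, feeding each stage's output to the next."""
    for stage in stages:
        rows = stage(rows)
    return rows


rows = [
    {"text": "great release"},
    {"text": "slow and buggy setup"},
]
stages = [
    add_column("tokens", lambda r: r["text"].split()),
    add_column("n_tokens", lambda r: len(r["tokens"])),
]
result = run_pipeline(stages, rows)
```

Each stage depends only on column names, which is why stages can be swapped or reordered as long as the columns they read already exist. Spark's real pipelines add distribution, model fitting and tuning on top of this basic shape.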

Taken together, these pieces allow MMLSpark users to kick off image-based machine learning projects at scale. "By using these APIs, you can rapidly build image analysis and computer vision pipelines that use the cutting-edge DNN algorithms," added Astala and Raghunathan.

MMLSpark is available now as an open-source project on the GitHub code repository.

Pedro Hernandez

Pedro Hernandez is a contributor to eWEEK and the IT Business Edge Network, the network for technology professionals. Previously, he served as a managing editor for the network of...