Microsoft Beefs Up Linux Data Science Virtual Machine

Microsoft adds its R Server Developer Edition to the Linux version of the company's cloud-based data science offering.


Microsoft R Server Developer Edition is now available on the Linux version of the company's Data Science Virtual Machine (DSVM), enabling users to build models using Microsoft's ScaleR libraries.

In January, Microsoft launched R Server Developer Edition, a free version of the analytics platform for developers, students and nonproduction deployments. The offering arrived nearly a year after the software maker announced it was acquiring Revolution Analytics, the leading commercial supporter of R, the popular open-source statistical computing language.

Making Microsoft R Server Developer Edition available on the Linux flavor of the DSVM offers a major bump in big data processing capabilities. Prior to the release, the Linux DSVM supported only Microsoft R Open, which could only process as much data as would fit in memory, according to the company.
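To illustrate why the in-memory limit matters, here is a minimal Python sketch of the general idea of processing data in chunks so that only a small slice is held in memory at a time. It is not Microsoft's ScaleR implementation or API; the function names `read_chunks` and `chunked_mean` and the `chunk_size` parameter are hypothetical, chosen only for this example.

```python
from typing import Iterable, Iterator, List


def read_chunks(rows: Iterable[List[str]], chunk_size: int) -> Iterator[List[List[str]]]:
    """Yield successive lists of at most chunk_size rows (hypothetical helper)."""
    chunk: List[List[str]] = []
    for row in rows:
        chunk.append(row)
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:
        # Emit the final, possibly shorter, chunk.
        yield chunk


def chunked_mean(rows: Iterable[List[str]], column: int, chunk_size: int = 1000) -> float:
    """Compute the mean of one column while holding only one chunk in memory."""
    total = 0.0
    count = 0
    for chunk in read_chunks(rows, chunk_size):
        # Only running totals survive between chunks, not the rows themselves.
        total += sum(float(row[column]) for row in chunk)
        count += len(chunk)
    if count == 0:
        raise ValueError("no rows to average")
    return total / count
```

In practice `rows` could be a `csv.reader` over a file too large to load at once; because the function keeps only a running sum and count, memory use depends on `chunk_size` rather than on the size of the dataset.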

Addressing the educational and training markets, Microsoft announced that the solution now supports an interactive data science and scientific computing platform used by schools and enterprises that are ramping up their analytics capabilities.

"Another major update on the Linux VM [virtual machine] is our support for JupyterHub, a multiuser solution for Jupyter Notebook server," wrote Gopi Kumar, principal program manager in Microsoft's Data Group, in a blog post. "Based on our experience, Jupyterhub has been particularly useful in education and training scenarios, where a single VM instance is able to support multiple users independently working on their own single-user notebook server instances with OS authentication."

Lastly, the DSVM now supports the Julia language, both from the command line and as a Jupyter notebook kernel, said Kumar.

On the Windows Edition of DSVM, SQL Server 2014 Express Edition has been replaced by SQL Server 2016 Developer Edition, which includes R Services components that enable in-database analytics powered by Microsoft R. Alternatively, users can run R Server outside the database in a stand-alone manner.

For beginners, Windows DSVM includes a data science tutorial on SQL Server R Services. The tutorial is available as a Jupyter notebook, complete with a preloaded dataset, Kumar explained.

Complementing the existing Azure Machine Learning-compatible libraries, Microsoft added a selection of popular open-source artificial intelligence and deep neural network toolkits, including xgboost, Vowpal Wabbit, Rattle, CNTK and mxnet. The toolkits, with the exception of mxnet, are also available for the Linux edition of DSVM.

"Other notable updates to the VM include the Azure CLI [command-line interface], Visual Studio Community 2015 Update 3, which comes with several language tools including R, Python and node.js, as well as pre-installed plugins that make it easier to work with data and analytics technology, including with SQL Server, Azure HDInsight(Hadoop) [and] Azure Data Lake," added Kumar.

Microsoft isn't alone in attempting to shrink the data science skills gap.

In June, IBM announced a new Apache Spark-based platform dubbed the Data Science Experience. The cloud-based development environment, running on IBM Bluemix, offers budding data scientists educational content and takes a collaborative approach to help advance and popularize the discipline.

Pedro Hernandez

Pedro Hernandez is a contributor to eWEEK and the IT Business Edge Network, the network for technology professionals. Previously, he served as a managing editor for the network of...