Microsoft has launched a beta program for its Dryad Project, a set of technologies aimed at helping enterprises process large data sets.
The company recently unveiled Community Technology Previews (CTPs) of Dryad, DSC and DryadLINQ, technologies meant to support data-intensive applications running on a Windows HPC Server 2008 R2 Service Pack 1 cluster.
“These technologies allow you to process large volumes of data in many types of applications, including data-mining applications, image and stream processing, and some scientific computations,” according to the Windows HPC Team blog. “Dryad and DSC run on the cluster to support data-intensive computing and manage data that is partitioned across the cluster. DryadLINQ allows developers to define data intensive applications using the .Net LINQ model.”
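To give a rough sense of what a query over partitioned data looks like, the sketch below applies a filter, a projection and a count to each partition and then merges the partial results. It is written in Python as a stand-in, not in C# with DryadLINQ, and the sample log lines and the names `query_partition` and `run_query` are assumptions made for illustration; none of it reflects the actual DryadLINQ API.

```python
from collections import Counter

# Hypothetical data set, already split into partitions (one list per node).
partitions = [
    ["error disk", "info boot", "error net"],
    ["info login", "error disk"],
]

def query_partition(lines):
    """Per-partition part of the query: filter, project, count locally."""
    # Keep only "error" lines and project out the second word (the source).
    sources = (line.split()[1] for line in lines if line.startswith("error"))
    return Counter(sources)

def run_query(parts):
    """Apply the query to every partition, then merge the partial counts."""
    total = Counter()
    for partial in map(query_partition, parts):
        total.update(partial)
    return dict(total)

result = run_query(partitions)
print(result)
```

In a real cluster the per-partition step would run on the nodes that hold the data and only the small partial counts would be moved, which is the general idea behind running a declarative query over partitioned storage.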
In a paper, Microsoft researchers described Dryad as a general-purpose distributed execution engine for “coarse-grain data-parallel applications.”
“Dryad is designed to scale from powerful multi-core single computers, through small clusters of computers, to data centers with thousands of computers,” according to the paper. “The Dryad execution engine handles all the difficult problems of creating a large distributed, concurrent application: scheduling the use of computers and their CPUs, recovering from communication or computer failures, and transporting data between vertices.”
In the paper, the researchers say they sacrificed some architectural simplicity relative to the MapReduce system design but, in exchange, released “developers from the burden of expressing their code as a strict sequence of map, sort and reduce steps.”
“We also allow the programmer the freedom to specify the communication transport which, for suitable tasks, delivers substantial performance gains,” the paper states.
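The contrast with a fixed map, sort and reduce sequence can be illustrated with a job expressed as a graph of vertices, where a vertex may take more than one input. The following is a minimal single-process sketch in Python (3.9 or later, for `graphlib`), not Dryad's engine or API: the function `run_job`, the vertex names and the sample data are all hypothetical, and in-memory values stand in for the channels that would carry data between machines.

```python
from graphlib import TopologicalSorter

def run_job(vertices, edges, inputs):
    """Run each vertex once all of its upstream vertices have produced output.

    vertices: name -> function taking a list of upstream outputs
    edges:    name -> list of upstream vertex names (order is preserved)
    inputs:   name -> initial data for source vertices
    """
    outputs = dict(inputs)
    graph = {name: edges.get(name, []) for name in vertices}
    for name in TopologicalSorter(graph).static_order():
        if name in outputs:  # source vertex: its data is already present
            continue
        upstream = [outputs[u] for u in edges.get(name, [])]
        outputs[name] = vertices[name](upstream)
    return outputs

def big_orders(ins):
    # Keep orders of at least 100 (illustrative threshold).
    return [(cid, amount) for cid, amount in ins[0] if amount >= 100]

def join(ins):
    # Two inputs: filtered orders and the customer lookup table.
    orders, customers = ins
    return [(customers[cid], amount) for cid, amount in orders]

def total(ins):
    sums = {}
    for name, amount in ins[0]:
        sums[name] = sums.get(name, 0) + amount
    return sums

outputs = run_job(
    vertices={"big_orders": big_orders, "join": join, "total": total},
    edges={"big_orders": ["orders"], "join": ["big_orders", "customers"],
           "total": ["join"]},
    inputs={"orders": [(1, 250), (2, 40), (1, 120)],
            "customers": {1: "Ada", 2: "Lin"}},
)
print(outputs["total"])
```

The `join` vertex reads from two upstream vertices, a shape that does not fit a single map, sort and reduce pass without extra work. A real engine would also handle scheduling, failure recovery and data transport, which this sketch leaves out entirely.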
According to Microsoft, the current beta does not support more than 2,048 individual partitions, and has only been tested on up to 128 individual nodes. In addition, the DryadLINQ LINQ provider does not yet support all LINQ queries.
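For teams planning a trial, those limits could be checked before data is laid out. The sketch below is only an illustration under stated assumptions: the constants come from the figures Microsoft cites, but the helper names `check_plan` and `partition_index`, and the use of a CRC32 hash to assign records to partitions, are hypothetical and say nothing about how DSC actually places data.

```python
import zlib

MAX_PARTITIONS = 2048    # partition limit reported for the beta
MAX_TESTED_NODES = 128   # largest cluster size Microsoft says it has tested

def check_plan(partition_count, node_count):
    """Return a list of warnings for a planned data layout."""
    warnings = []
    if partition_count > MAX_PARTITIONS:
        warnings.append(
            f"{partition_count} partitions exceeds the beta limit of {MAX_PARTITIONS}")
    if node_count > MAX_TESTED_NODES:
        warnings.append(
            f"{node_count} nodes is beyond the tested size of {MAX_TESTED_NODES}")
    return warnings

def partition_index(key, partition_count):
    """Assign a record key to a partition with a stable hash (illustrative)."""
    if not 1 <= partition_count <= MAX_PARTITIONS:
        raise ValueError("partition_count must be between 1 and 2048")
    return zlib.crc32(key.encode("utf-8")) % partition_count

for warning in check_plan(partition_count=4096, node_count=64):
    print(warning)
```

A stable hash such as CRC32 is used here rather than Python's built-in `hash`, which is randomized per process for strings and so would not assign the same key to the same partition across runs.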
Users will need an HPC Pack 2008 R2 Enterprise-based cluster with Service Pack 1 installed. A trial version of HPC Pack 2008 R2 Enterprise is available through the Windows HPC Server 2008 R2 Evaluation Program, and the Service Pack 1 updater is available as a separate download.