Microsoft has started a beta program for its Dryad technology, its answer to Google MapReduce.
Microsoft has launched a beta program for its Dryad
Project, a set of technologies aimed at answering enterprise needs
around large data sets.
The company recently unveiled Community Technology Previews (CTPs)
of Dryad, DSC and DryadLINQ, technologies meant to support
data-intensive applications running on a Windows HPC Server 2008 R2
Service Pack 1 cluster.
"These technologies allow you to process large volumes of data in
many types of applications, including data-mining applications, image
and stream processing, and some scientific computations,"
according to the Windows HPC Team blog.
"Dryad and DSC run on the cluster to support data-intensive computing
and manage data that is partitioned across the cluster. DryadLINQ
allows developers to define data intensive applications using the .Net
LINQ model."
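To make the idea of a query over partitioned data concrete, here is a minimal Python sketch. It is not DryadLINQ or any Microsoft API: the helper names (split_into_partitions, query_partition, run_query) and the sample lines are hypothetical, and everything runs in a single process. It only illustrates the shape the blog describes: the same filter-and-count query is applied to each partition, and the partial results are then merged.

```python
from collections import Counter

def split_into_partitions(records, count):
    """Deal records round-robin into `count` partitions (hypothetical helper)."""
    return [records[i::count] for i in range(count)]

def query_partition(partition):
    """Per-partition part of the query: split lines into words, filter, count."""
    words = (w.lower() for line in partition for w in line.split())
    return Counter(w for w in words if len(w) > 3)

def run_query(records, partition_count=4):
    """Apply the query to every partition, then merge the partial counts."""
    partitions = split_into_partitions(records, partition_count)
    total = Counter()
    for partial in (query_partition(p) for p in partitions):
        total.update(partial)
    return total

# Illustrative input only; not taken from any Dryad sample.
LINES = [
    "Dryad runs on the cluster",
    "DryadLINQ targets the cluster",
    "data is partitioned across the cluster",
]
RESULT = run_query(LINES)
```

Because each partition is counted independently and the counts are simply added, the result does not depend on how many partitions the data is split into. That property is what lets a real engine spread the per-partition work across nodes.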
In a research paper,
Microsoft researchers described Dryad as a general-purpose distributed
execution engine for "coarse-grain data-parallel applications."
"Dryad is designed to scale from powerful multi-core single
computers, through small clusters of computers, to data centers with
thousands of computers," according to the paper. "The Dryad execution
engine handles all the difficult problems of creating a large
distributed, concurrent application: scheduling the use of computers
and their CPUs, recovering from communication or computer failures, and
transporting data between vertices."
In the paper, Microsoft researchers say they gave up some of the
architectural simplicity of the MapReduce system design, and that in
exchange they released "developers from the burden of expressing
their code as a strict sequence of map, sort and reduce steps."
"We also allow the programmer the freedom to specify the
communication transport which, for suitable tasks, delivers substantial
performance gains," the report states.
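The contrast with a fixed map, sort and reduce pipeline can be sketched as a small dataflow graph in Python. This is an illustration under stated assumptions, not Dryad's engine or API: run_graph, the vertex names and the sample data are all hypothetical, outputs are passed in memory within one process, and the choice of communication transport is not modeled. It shows only that vertices can be wired into an arbitrary acyclic graph, here two inputs feeding one vertex whose output fans out to two more.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

def run_graph(vertices, edges):
    """Run each vertex after all of its upstream vertices have produced output.

    vertices: name -> function taking a list of upstream outputs
    edges:    name -> list of upstream vertex names (order is preserved)
    """
    graph = {name: edges.get(name, []) for name in vertices}
    outputs = {}
    for name in TopologicalSorter(graph).static_order():
        inputs = [outputs[upstream] for upstream in edges.get(name, [])]
        outputs[name] = vertices[name](inputs)
    return outputs

# Hypothetical vertices: two readers, one vertex combining both, two consumers.
VERTICES = {
    "read_a": lambda _: [1, 2, 3, 4],
    "read_b": lambda _: [3, 4, 5],
    "common": lambda ins: sorted(set(ins[0]) & set(ins[1])),
    "total": lambda ins: sum(ins[0]),
    "count": lambda ins: len(ins[0]),
}
EDGES = {
    "common": ["read_a", "read_b"],
    "total": ["common"],
    "count": ["common"],
}
OUTPUTS = run_graph(VERTICES, EDGES)
```

A two-input vertex followed by a fan-out does not fit a single map, sort and reduce pass without extra stages, which is the kind of restriction the researchers say they wanted to lift.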
According to Microsoft, the current beta supports no more than
2,048 individual partitions and has been tested only on clusters of up
to 128 individual nodes. In addition, the DryadLINQ provider does not
yet support all LINQ queries.
Users will need to have an HPC Pack 2008 R2 Enterprise-based cluster
with Service Pack 1 installed. A trial version of HPC Pack 2008 R2
Enterprise is available through the
Windows HPC Server 2008 R2 Evaluation Program, and the
Service Pack 1 updater is available as a separate download.