Amazon Web Services is rolling out a public beta of its Amazon Elastic MapReduce, a service that makes it easier for businesses and researchers to provision capacity for data-intensive tasks being run on Amazon's EC2 cloud computing platform. Hadoop, the open-source implementation of Google's MapReduce technology, is being used by such companies as Facebook and Yahoo. Some vendors, such as startup Cloudera, are looking for ways to push Hadoop into enterprise data centers.
Amazon Web Services is using the open-source Apache Hadoop
distributed computing technology to make it easier for users to access
large amounts of computing power to run data-intensive tasks.
AWS (Amazon Web Services) April 2 announced the public beta of its
Amazon Elastic MapReduce initiative, a service designed for businesses,
researchers and analysts who have large number-crunching projects like
Web indexing, data mining, financial analysis and scientific
simulations, according to AWS officials.
Using a hosted Hadoop framework, users can instantly provision as
much compute capacity as they need from Amazon's EC2 (Elastic Compute
Cloud) platform to perform the tasks, and pay only for what they use.
Hadoop, the open-source version of Google's MapReduce, is already
being used by such companies as Yahoo and Facebook. Google only uses [...]
There are efforts underway to increase the use of Hadoop in
enterprise data centers. Most recently, a startup, Cloudera, which calls
itself the commercial Hadoop company, announced March 16 the
availability of its first product, the Cloudera Distribution for
Hadoop. The product lets users store and process petabytes of data that
is often distributed among thousands of servers.
Cloudera also created a portal
to help users install and use the company's free product.
"Cloudera is advancing Hadoop technology to make it easier for
everyone to store and process the same types of big data that large Web
companies are successfully using in their businesses," Christophe
Bisciglia, the founder of Cloudera and former manager of Google's
Hadoop cluster, said in a statement at the time of Cloudera's announcement.
According to AWS officials, using Hadoop and other MapReduce-based
clusters on the Amazon EC2 cloud computing platform was a difficult task
that forced users to do their own setup,
management and cluster tuning. With Amazon Elastic MapReduce, those
tasks are less time-consuming and more affordable, enabling users to
build up and take down Hadoop-based clusters on EC2 in moments.
AWS also is offering sample applications and tutorials to help users
get more comfortable with the new service. Amazon Elastic MapReduce
automatically deploys and configures the number of EC2 instances users
ask for, then launches a Hadoop implementation of the MapReduce tool.
MapReduce then loads the data from Amazon S3 (Simple Storage Service)
and divides it so it can be processed in a parallel fashion. The data
is then recombined after processing, with the end results put back into
Amazon S3.
"Some researchers and developers already run Hadoop on Amazon EC2,
and many of them have asked for even simpler tools for large-scale data
analysis," Adam Selipsky, vice president of product management and
developer relations at AWS, said in a statement.
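
For readers unfamiliar with the MapReduce model, the divide, process-in-parallel and recombine flow described above can be illustrated with a small word-count sketch. This is not Hadoop code and does not use the Amazon Elastic MapReduce API. It is a toy, in-memory stand-in in which threads play the role of EC2 instances and a Python list plays the role of data loaded from Amazon S3. All function names (split_input, map_chunk, shuffle, reduce_groups, word_count) are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def split_input(lines, num_splits):
    """Divide the input into roughly equal chunks, one per worker."""
    return [lines[i::num_splits] for i in range(num_splits)]


def map_chunk(chunk):
    """Map step: emit a (word, 1) pair for every word in the chunk."""
    return [(word.lower(), 1) for line in chunk for word in line.split()]


def shuffle(mapped_chunks):
    """Group the emitted values by key across all chunks."""
    grouped = defaultdict(list)
    for pairs in mapped_chunks:
        for key, value in pairs:
            grouped[key].append(value)
    return grouped


def reduce_groups(grouped):
    """Reduce step: recombine each key's values into a single total."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(lines, num_workers=3):
    """Run the toy pipeline: split, map in parallel, shuffle, reduce."""
    chunks = split_input(lines, num_workers)
    # Assumption: threads stand in for the cluster's worker nodes.
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        mapped = list(pool.map(map_chunk, chunks))
    return reduce_groups(shuffle(mapped))


if __name__ == "__main__":
    sample = [
        "the cloud runs hadoop",
        "hadoop runs in the cloud",
        "the data is big",
    ]
    print(word_count(sample))
```

In a real Hadoop cluster the chunks would be distributed across machines and the grouping step would move data over the network, but the shape of the computation is the same: independent map tasks over pieces of the input, followed by a reduce that merges the partial results.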