Amazon Web Services is using the open-source Apache Hadoop
distributed computing technology to make it easier for users to access
large amounts of computing power to run data-intensive tasks.
AWS (Amazon Web Services) on April 2 announced the public beta of its
Amazon Elastic MapReduce initiative, a service designed for businesses,
researchers and analysts who have large number-crunching projects like
Web indexing, data mining, financial analysis and scientific
simulations, according to AWS officials.
Using a hosted Hadoop framework, users can instantly provision as
much compute capacity as they need from Amazon’s EC2 (Elastic Compute
Cloud) platform to perform the tasks, and pay only for what they use.
Hadoop, an open-source implementation of the MapReduce model that
Google developed, is already being used by such companies as Yahoo and
Facebook. Google uses its own MapReduce only internally.
There are efforts underway to increase the use of Hadoop in
enterprise data centers. Most recently, a startup, Cloudera—which calls
itself the commercial Hadoop company—announced March 16 the
availability of its first product, the Cloudera Distribution for
Hadoop. The product lets users store and process petabytes of data that
is often distributed across thousands of servers.
Cloudera also created a portal to help users install and use the company’s free product.
“Cloudera is advancing Hadoop technology to make it easier for
everyone to store and process the same types of big data that large Web
companies are successfully using in their businesses,” Christophe
Bisciglia, the founder of Cloudera and former manager of Google’s
Hadoop cluster, said in a statement at the time of Cloudera’s
announcement.
According to AWS officials, running Hadoop and other MapReduce-based clusters on the Amazon EC2 cloud computing platform
had been a difficult task that forced users to handle their own setup,
management and cluster tuning. With Amazon Elastic MapReduce, those
tasks are less time-consuming and more affordable, enabling users to
build up and take down Hadoop-based clusters on EC2 in moments.
AWS also is offering sample applications and tutorials to help users
get more comfortable with the new service. Amazon Elastic MapReduce
automatically deploys and configures the number of EC2 instances users
ask for, then launches a Hadoop implementation of the MapReduce tool.
MapReduce then loads the data from Amazon S3 (Simple Storage Service)
and divides it so it can be processed in a parallel fashion. The data
is then recombined after processing, with the end results put back into
S3.
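The divide, process and recombine flow described above can be
illustrated with a small word-count sketch. This is only a toy model of
the MapReduce pattern, not Hadoop's or Amazon's actual API: the function
names are hypothetical, an in-memory list of lines stands in for data
loaded from S3, and a local thread pool stands in for a cluster of EC2
instances.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def split_input(lines, num_chunks):
    """Divide the input into chunks, one per worker (assumed scheme: round-robin)."""
    return [lines[i::num_chunks] for i in range(num_chunks)]


def map_chunk(lines):
    """Map step: emit a (word, 1) pair for every word in the chunk."""
    return [(word.lower(), 1) for line in lines for word in line.split()]


def shuffle(mapped_chunks):
    """Group the emitted values by key across all chunks."""
    grouped = defaultdict(list)
    for pairs in mapped_chunks:
        for key, value in pairs:
            grouped[key].append(value)
    return grouped


def reduce_counts(grouped):
    """Reduce step: recombine the per-key values into one total per word."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(lines, num_workers=3):
    """Run the toy pipeline: split, map in parallel, shuffle, reduce."""
    chunks = split_input(lines, num_workers)
    # Threads here are a local stand-in for separate machines in a cluster.
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        mapped = list(pool.map(map_chunk, chunks))
    return reduce_counts(shuffle(mapped))


if __name__ == "__main__":
    sample = ["hadoop on ec2", "Hadoop and S3", "data in s3"]
    print(word_count(sample, num_workers=2))
```

In a real deployment the chunks would be processed on separate
machines and the final dictionary would be written back to storage
rather than printed; the sketch only shows the shape of the
computation.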
“Some researchers and developers already run Hadoop on Amazon EC2,
and many of them have asked for even simpler tools for large-scale data
analysis,” Adam Selipsky, vice president of product management and
developer relations at AWS, said in a statement.