MapR's Hadoop Distribution Now Available on Amazon Web Services

MapR's software runs MapReduce, a process for analyzing vast amounts of data. Version 2.0 of its product is being unveiled at the 2012 Hadoop Summit in San Jose, Calif.

MapR, which delivers its own version of the Apache Hadoop distribution for crunching big data, is unveiling version 2.0 of its product at the 2012 Hadoop Summit June 13 and 14 in San Jose, Calif., where it will also announce that a cloud version of its applications will be available through Amazon Web Services (AWS).

Typical of software companies based on open source, MapR has two versions of its products: MapR M3, which is free; and MapR M5, which is a commercial version of the product with more advanced features than the free one, including high availability, the ability to make data snapshots and do mirroring of datasets, and 24/7 support with an annual subscription.

The deal with Amazon Web Services means that MapR will be one more option within Amazon's Elastic MapReduce service. MapR will also integrate with other Amazon Web Services such as S3 (for Simple Storage Service); DynamoDB, a NoSQL database service; and Amazon CloudWatch, a cloud monitoring service.

Apache Hadoop is an open-source framework for analyzing and organizing vast amounts of data using what's called the MapReduce process, a programming model originally developed by Google. Hadoop itself is an open-source implementation of that model, maintained by the Apache Software Foundation. MapReduce is a framework for processing parallelizable problems across huge data sets using a large number of computer nodes, collectively referred to as a compute cluster.

In the Map part of the process, the master node takes the huge data input, divides it into smaller sub-problems and distributes them to worker nodes in the cluster. In the Reduce part, the master node collects the answers to all the sub-problems and combines them to answer the problem it was originally trying to solve. MapReduce is central to the management of so-called big data in organizations that want to perform data analytics on petabyte-scale databases to obtain the supporting information they need to make business decisions.
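The divide, distribute and combine flow described above can be sketched in a few lines of Python. This is a minimal single-machine illustration, not Hadoop's or MapR's actual API: the word-count task and the function names (`split_input`, `map_chunk`, `shuffle`, `reduce_counts`, `word_count`) are assumptions chosen for the example, and the "worker nodes" are simulated by an ordinary loop rather than a real cluster.

```python
from collections import defaultdict


def split_input(text, num_workers):
    """Master step: divide the input lines into one chunk per worker."""
    lines = text.splitlines()
    return [lines[i::num_workers] for i in range(num_workers)]


def map_chunk(lines):
    """Map step (one worker): emit a (word, 1) pair for every word seen."""
    return [(word.lower(), 1) for line in lines for word in line.split()]


def shuffle(mapped_outputs):
    """Group the values emitted by all workers under their shared key."""
    grouped = defaultdict(list)
    for pairs in mapped_outputs:
        for key, value in pairs:
            grouped[key].append(value)
    return grouped


def reduce_counts(grouped):
    """Reduce step: combine each key's values into a single answer."""
    return {word: sum(values) for word, values in grouped.items()}


def word_count(text, num_workers=3):
    chunks = split_input(text, num_workers)
    # On a real cluster each chunk would be processed on a separate node.
    mapped = [map_chunk(chunk) for chunk in chunks]
    return reduce_counts(shuffle(mapped))
```

Calling `word_count("big data big\ncluster data big")` counts each word across both lines. In a production framework the map calls run in parallel on many machines and the grouping step moves data over the network, which is where most of the engineering effort goes.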

The MapR solution is the first non-Amazon Hadoop distribution to be offered within AWS, said Jack Norris, vice president of marketing for MapR. While an enterprise can already run MapR on its own in Amazon's cloud, the new integration makes provisioning simpler, he said.

"With this integration, it now is very simple, very straightforward to provision a cluster in the cloud and integrate it into Amazon. Amazon does all of the billing, the support and it's also a reflection of the differentiation of MapR [from competitors]," said Norris.

Those differentiators, he said, include better data protection, the ability to do data snapshots and mirroring, and better performance achieved by reducing input/output (I/O) through compression and improved replication management.

Other providers of Apache Hadoop-based software include Cloudera, EMC Greenplum (from an acquisition storage vendor EMC made in 2010), and Hortonworks, a co-host of the Hadoop Summit in San Jose. EMC Greenplum has two Hadoop-based products, an EMC-based HD version and a MapR-based version, which is actually MapR's product resold by EMC, Norris said.