MapR software does map-reduce, a process for analyzing vast amounts of data. Version 2.0 of its product is being unveiled at the 2012 Hadoop Summit in San Jose, Calif.
MapR, which delivers its own version of the Apache Hadoop distribution
for crunching big data, is unveiling version 2.0 of its
product at the 2012 Hadoop Summit June 13 and 14 in San Jose, Calif., where it
will also announce that a cloud version of its applications will be available
through Amazon Web Services (AWS).
Typical of software companies based on open
source, MapR has two versions of its product: MapR M3, which is free, and MapR
M5, a commercial version with more advanced features than the free one,
including high availability, data snapshots, mirroring of datasets and 24/7
support with an annual subscription.
The deal with Amazon Web Services means that
MapR will be one more option for cloud services in Amazon's Elastic MapReduce
service. MapR will also integrate with other Amazon Web Services offerings such
as S3 (Simple Storage Service); DynamoDB, a NoSQL database service; and Amazon
CloudWatch, a cloud monitoring service.
Apache Hadoop is an open-source framework for
analyzing and organizing vast amounts of data using what's called the MapReduce
process, originally developed by Google. Hadoop itself is an open-source
implementation of the ideas Google published, and it is now maintained by the
Apache Software Foundation. MapReduce is a framework for processing
parallel problems across huge data sets
using a large number of computer nodes, collectively referred to as a compute
cluster.
In the Map part of the process, the master
node takes the huge data input, divides it into smaller sub-problems and
distributes them to worker nodes in the cluster. In the Reduce part, the master
node collects the answers to all the sub-problems and combines them to answer
the problem it was originally trying to solve. MapReduce is central to
management of so-called big data in organizations that want to perform data
analytics on petabyte-scale databases to obtain the supporting information they
need to make business decisions.
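
The Map and Reduce steps described above can be sketched in a few lines of Python. This is a minimal, single-machine illustration, not Hadoop's or MapR's actual API: the function names (split_input, map_chunk, shuffle, reduce_groups) and the word-count task are assumptions chosen for the example, and the "worker nodes" are simulated by processing each chunk in turn.

```python
from collections import defaultdict


def split_input(records, num_workers):
    """Master step: divide the input into one chunk per (simulated) worker."""
    return [records[i::num_workers] for i in range(num_workers)]


def map_chunk(chunk):
    """Worker step: emit a (word, 1) pair for every word in the chunk."""
    return [(word.lower(), 1) for line in chunk for word in line.split()]


def shuffle(mapped_outputs):
    """Group the intermediate values from all workers by key."""
    grouped = defaultdict(list)
    for pairs in mapped_outputs:
        for key, value in pairs:
            grouped[key].append(value)
    return grouped


def reduce_groups(grouped):
    """Reduce step: combine the values for each key into a single answer."""
    return {key: sum(values) for key, values in grouped.items()}


def map_reduce(records, num_workers=3):
    chunks = split_input(records, num_workers)
    # Sequential here; on a real cluster each chunk would run on a separate node.
    mapped = [map_chunk(chunk) for chunk in chunks]
    return reduce_groups(shuffle(mapped))


lines = ["big data big clusters", "data nodes", "big nodes"]
counts = map_reduce(lines)
print(counts)
```

Because each chunk is mapped independently, the work can be spread across as many nodes as are available; only the final grouping and combining step needs to see results from every worker.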
The MapR solution is the first non-Amazon
Hadoop distribution to be offered within AWS, said Jack Norris, vice president
of marketing for MapR. While an enterprise can already run MapR as a
cloud service in an Amazon cloud, the new integration makes that simpler, he said.
"With this integration, it now is very
simple, very straightforward to provision a cluster in the cloud and integrate
it into Amazon. Amazon does all of the billing, the support and it's also a
reflection of the differentiation of MapR [from competitors]," said Norris.
Those differentiators, he said, include
better data protection and the ability to do data snapshots and mirroring, as
well as better performance from reducing input/output (I/O) through
compression and improved replication management.
Other providers of Apache Hadoop-based
software include Cloudera, EMC Greenplum (the result of an acquisition that
storage vendor EMC made in 2010) and Hortonworks,
a co-host of the Hadoop Summit in San Jose. EMC Greenplum has two
Hadoop-based products, an EMC-based HD version and a MapR-based version, which
is actually MapR's product resold by EMC, Norris said.