Little MapR Technologies on May 25 revealed a new software licensing
agreement with data storage and security giant EMC to add its
intellectual property to EMC's new Apache Hadoop analytics distribution.
San Jose, Calif.-based MapR will become a key part of EMC's Greenplum
HD Enterprise Edition, a 100 percent interface-compatible
implementation of the Apache Hadoop software stack. The new appliance
will use MapR Technologies' clustering IP for the pre-integrated and
tested distribution.
Apache Hadoop, created by former Apple, Xerox PARC and Yahoo developer
Doug Cutting, is an open-source software framework built in Java that
works with distributed data-intensive applications. It enables
applications to scale securely in order to handle thousands of nodes
and petabytes of data.
Although a number of Hadoop distributions are available, they don't all
deal with issues such as single points of failure, lack of snapshots
and mirroring, and poor performance -- which is what MapR brings to the
table.
MapR's Feature Set
CEO John Schroeder gave eWEEK an overview of MapR's feature set. It includes:
- NFS direct access, which allows users to use the NFS protocol to simply
load and access data directly in a Hadoop cluster and enables standard
tools and utilities to work directly on data contained in Hadoop.
- Heatmap user interface to provide full cluster visibility and control.
- All single points of failure are eliminated in the Hadoop stack.
- JobTracker High Availability ensures continuous job execution.
- Distributed NameNode with High Availability addresses a major reliability issue while also improving performance and scale.
- Snapshots allow point-in-time data protection and recovery.
- Mirroring for business continuity includes wide area replication support.
"This is a major advancement for Hadoop users everywhere. MapR's
innovations coupled with EMC's big data analytics capabilities and
service will allow more people to use the power of big data analytics
and enable substantial market growth," said John Webster, Senior
Analyst at Evaluator Group.
"MapR has managed to innovate on performance, cost reduction,
dependability and ease-of-use all at once. This marks a major shift for
the Hadoop market."
Hadoop Inspired by Google's MapReduce
Cutting, now at Cloudera and serving as the chairman of the Apache
Software Foundation, has said that Hadoop was inspired by Google's
MapReduce (a programming model for processing large data sets in
parallel across a cluster's nodes) and Google File System. MapR offers
a commercial implementation of the open source Hadoop MapReduce stack.
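For readers unfamiliar with the MapReduce model mentioned here, the following is a minimal single-process sketch of the idea in Python. It is an illustration only, not Hadoop's or MapR's API: the function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical, and a real cluster would run the map and reduce steps in parallel on many nodes rather than in one loop.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(record: str) -> Iterator[Tuple[str, int]]:
    """Emit a (word, 1) pair for every word in one input record."""
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group emitted values by key, the step that sits between map and reduce."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Combine all values collected for one key into a single count."""
    return key, sum(values)


def word_count(records: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle and reduce in sequence over the input records."""
    mapped = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


if __name__ == "__main__":
    # Two toy "records"; on a cluster each could live on a different node.
    print(word_count(["big data analytics", "big data"]))
```

Because each map call looks at one record and each reduce call looks at one key, the work can in principle be spread across many machines, which is the property that lets the model scale to large clusters.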
Hadoop, which is named after Cutting's son's toy elephant, is being
maintained and improved by a large global community of contributors.
Yahoo, one of the first movers in Hadoop and now the sponsor of a
Hadoop developers' conference, has been the largest contributor to the
project and uses Hadoop extensively across its own businesses.
"Hadoop has played a leading role in the transformation from
traditional data warehousing to big data analytics," Webster said.
"EMC's Hadoop commercialization strategy is aimed at streamlining and
bulletproofing Hadoop for enterprise users, making Hadoop more of a
must-have real-time analytics tool for the enterprise."