Apache Hadoop, like most open-source projects, is a work in progress. It has always had the horsepower to analyze batches of data in huge gulps; what it has needed is help with additional features and the interface, because most people don't write code or know how to write queries correctly.
MapR Technologies, an early Hadoop distributor, is doing its part, especially on the feature side. On Dec. 6 it released MapR v1.2, which includes improvements in performance, in dependability and in its ability to interact with data-gathering applications. MapR now has expanded support for C/C++ API (application programming interface) access; it also now supports Windows and Mac clients.
A complete MapR Version 1.2 cluster is also available as a free virtual machine, CEO and founder John Schroeder told eWEEK. Version 1.2 also includes a framework of support for MapReduce 2.0, which further expands the types of applications that can take advantage of a Hadoop cluster, Schroeder said.
With all these improvements, MapR is setting itself apart from competing distributions, such as those made by Cloudera and Hortonworks -- two of the most popular Hadoop incarnations.
Some of the new features in Version 1.2 include:
- Support for the next-generation resource management framework: MapR users will be able to take advantage of MapReduce 2.0 once it is ready for production use. The community is expected to take several months to stabilize Hadoop 0.23; once it does, users will be able to combine the benefits of MapReduce 2.0, such as backward compatibility and scalability, with MapR's unique capabilities, such as HA (no lost tasks or jobs during a JobTracker or ApplicationMaster failure) and the high-performance shuffle.
- High-performance native-access library: With Version 1.2, MapR provides a libhdfs implementation that bypasses Java altogether and provides high-performance access to the distributed file system from C/C++ applications and from compatible scripting languages. There is no need to recompile applications that use libhdfs, since the API (header file) is identical.
- Upgrade of various packages, including HBase, Hive and Pig: The HBase package in the MapR Distribution has been upgraded to release 0.90.4 and includes significant performance improvements. In addition, MapR has identified several critical stability and data corruption issues in 0.90.4, which it has addressed by backporting 15 fixes from later HBase releases. Hive and Pig have also been upgraded in the MapR Distribution, so users can take advantage of the latest bug fixes and features available from these Apache projects.
- Mac and Windows clients: MapR provides easy-to-install packages for Mac and Windows (in addition to Linux), allowing users of all major platforms to run Hadoop applications without having to install third-party libraries, such as Cygwin.
- MapR Virtual Machine capabilities: MapR now provides a VMware virtual machine that allows users to experiment with the MapR Distribution. The VM makes it easy to try out some of MapR's unique capabilities, such as NFS and snapshots. It is also an asset to new Hadoop users, making it possible for them to be up and running in any environment (for example, on a laptop) within minutes.
The latest MapR distribution is available for download at this site as of the week of Dec. 5.