Apache Hadoop analytics provider MapR Technologies has expanded its enterprise-grade offerings by adding a NoSQL database to its M7 distribution, creating a faster platform for big data analytics.
The company made the announcement at last week's O'Reilly Strata Conference + Hadoop World 2012 in New York.
The biggest advantage of adding NoSQL is that Hadoop, which processes data offline in batches, can now be used in real time for some use cases, greatly speeding up analytics projects for line-of-business employees.
"This now brings unprecedented Hadoop and NoSQL capabilities together for a broader set of use cases," said MapR CEO and cofounder John Schroeder. "With MapR M7, big data operations ranging from batch analytics to real-time database functions can be performed with enterprise-grade reliability and protection."
Natively Compatible with Apache Database
MapR M7 is binary-compatible with Apache HBase, Hadoop's distributed, scalable big data store. Customers don't have to recompile or change code to take advantage of the enterprise-grade features, Schroeder said. M7 also supports Apache HBase within the same cluster.
Automation is one of the key plays here. The new M7 platform makes HBase enterprise grade with instant recovery from hardware and software failures, disaster recovery and full data protection with snapshots and mirroring, Schroeder said. Even with multiple hardware or software outages and errors, applications will continue running with no administrator actions required.
M7 also improves on the performance of HBase, Schroeder said. By eliminating the need for compactions, M7 provides uniform and consistent performance. It also uses new data structures to minimize the read- and write-amplification factor for inserts and updates. In addition, M7 supports in-memory columns, providing more options to increase database performance.
MapR offers two other editions of its distribution: MapR M3, which is free, and MapR M5, a commercial edition with more advanced features than the free distribution. These include high availability, data snapshots, mirroring of datasets, and 24/7 support with an annual subscription.
MapR has already made its distribution available on the new Google Compute Engine, introduced at Google I/O in San Francisco on June 28, and on the Amazon Web Services cloud. MapR added Windows and Mac OS operating system support about a year ago.
Apache Hadoop is an open-source framework for analyzing and organizing vast amounts of data using what is called the MapReduce process, a programming model originally developed by Google. Hadoop itself is an open-source implementation of that model, maintained by the Apache Software Foundation. MapReduce is a framework for processing parallelizable problems across huge data sets using a large number of computer nodes, collectively referred to as a compute cluster.
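To make the MapReduce idea concrete, here is a minimal single-machine sketch in Python that counts words across a few text records. It is illustrative only: the function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical and are not part of Hadoop or any MapR product, and a real cluster would spread the map and reduce steps across many nodes rather than run them in one process.

```python
from collections import defaultdict


def map_phase(record):
    """Map step: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle(pairs):
    """Group all emitted values by key, as a framework would between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: combine all values seen for one key into a single result."""
    return key, sum(values)


def word_count(records):
    """Run map, shuffle and reduce in sequence on a list of text records."""
    pairs = []
    for record in records:  # on a cluster, each record could be mapped on a different node
        pairs.extend(map_phase(record))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    sample = ["big data big cluster", "data cluster data"]
    print(word_count(sample))
```

Because each record is mapped independently and each key is reduced independently, both steps can in principle be distributed across the nodes of a compute cluster, which is what lets the model scale to very large data sets.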