MapR Integrates Hadoop Distro With Google Compute Engine

By Chris Preimesberger  |  Posted 2012-07-06

Data analytics software provider MapR Technologies has made its enterprise-grade Apache Hadoop distribution available to run on the new Google Compute Engine, introduced at Google I/O in San Francisco on June 28.

MapR on the Google Compute Engine will be available as a free private beta for a select number of users, MapR said. Those interested in big data analytics should review and fill out the nomination form.

The combination of the new Google service and MapR's Hadoop distribution enables users to provision large MapR clusters on demand and to deploy them as a cloud-based analytics system.

Google originally developed MapReduce as its internal search framework, and it later inspired the community development of Hadoop under Doug Cutting at Yahoo. Now, through MapR's distribution for Hadoop, IT managers can use Google's infrastructure for big data analytics.

MapR demonstrated what it claimed to be a price/performance breakthrough on stage at the Google I/O conference by completing a 1TB TeraSort job in 1 minute, 20 seconds. This result was achieved on a Google Compute Engine cluster in the cloud with 1,256 nodes, 1,256 disks and 5,024 cores, at a cost of about $16 for the entire subscription-based transaction.

This result compares with the existing world record of 1 minute, 2 seconds, which was set on a physical cluster with more than four times as many disks, twice as many cores and 200 more servers, at an estimated cost of more than $5 million.
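
To put the two results side by side, the quoted figures can be turned into rough throughput and ratio numbers. The short Python sketch below is only back-of-the-envelope arithmetic on the numbers reported above; it assumes that "1TB" means 10^12 bytes and treats "about $16" and "more than $5 million" as exact values, so the cost ratio is a loose lower bound. The two cost figures are also not like-for-like as quoted (one is described as the charge for a single run, the other as the estimated cost of a physical cluster), so the ratio is illustrative only.

```python
# Back-of-the-envelope arithmetic on the figures quoted in the article.
DATA_BYTES = 10**12          # assumption: "1TB" means 10^12 bytes
GCE_SECONDS = 80             # 1 minute, 20 seconds on Google Compute Engine
RECORD_SECONDS = 62          # 1 minute, 2 seconds, the existing record
GCE_NODES = 1256
GCE_CORES = 5024
GCE_COST_USD = 16            # "about $16", treated as exact here
RECORD_COST_USD = 5_000_000  # "more than $5 million", so a lower bound


def throughput_gb_per_s(data_bytes, seconds):
    """Aggregate sort throughput in gigabytes (10^9 bytes) per second."""
    return data_bytes / seconds / 1e9


gce_throughput = throughput_gb_per_s(DATA_BYTES, GCE_SECONDS)
record_throughput = throughput_gb_per_s(DATA_BYTES, RECORD_SECONDS)

# Per-node view of the cloud cluster.
cores_per_node = GCE_CORES / GCE_NODES
per_node_mb_per_s = DATA_BYTES / GCE_SECONDS / GCE_NODES / 1e6

# How much slower the cloud run was, and how the quoted costs compare.
slowdown = GCE_SECONDS / RECORD_SECONDS
cost_ratio = RECORD_COST_USD / GCE_COST_USD

print(f"GCE run:    {gce_throughput:.2f} GB/s aggregate")
print(f"Record run: {record_throughput:.2f} GB/s aggregate")
print(f"GCE cluster: {cores_per_node:.0f} cores per node, "
      f"{per_node_mb_per_s:.2f} MB/s per node")
print(f"GCE run took {slowdown:.2f}x as long as the record")
print(f"Quoted cost ratio (record / GCE): {cost_ratio:,.0f}x")
```

On these assumptions, the cloud run took roughly 1.3 times as long as the record run while the quoted costs differ by several orders of magnitude, which is the basis of the price/performance claim.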

The integration of MapR with Google Compute Engine includes a menu of standard MapR compute configurations. Users have the flexibility within Google Compute Engine to pay on demand and to spin up clusters of more than 1,000 nodes if necessary.

Rocket Fuel