IBM's M2 Project Taps Hadoop for Massive Mashups

IBM is using Hadoop to enable ad hoc analytics at Web scale, in a project called Massive Mashups.

NEW YORK - IBM is using Hadoop to enable ad hoc analytics at Web scale in an effort called Massive Mashups, or M2.

In other words, IBM is making Hadoop accessible to business professionals so they can run analytics on the fly and see the results presented in an easily accessible way, said Rod Smith, IBM Software Group's vice president of emerging Internet technology, during a keynote presentation at the Hadoop World: NYC conference here on Oct. 2.

"It's the petabyte age," Smith said, one in which vast amounts of data need to be processed and analyzed as the world becomes more instrumented. He noted that this transition is part of what IBM is trying to address in its Smarter Planet strategy.

"The Apache Hadoop project develops open-source software for reliable, scalable distributed computing," according to its Website. Hadoop enables applications to work with thousands of nodes and petabytes of data. It was inspired by Google's MapReduce and GFS (Google File System) papers.

"Hadoop produces a new class of application where there is broader use of Web content, unstructured content and longer-running applications," Smith said.

To make the tool useful to business users, he said, "We had to leverage easy-to-use data manipulation metaphors like spreadsheets." And the tool had to make use of rich visualization to enable users to quickly identify insights.

Enter Project M2, which Smith refers to as an "insight engine for enabling ad hoc business insight for business users at Web scale." M2 runs on top of Hadoop and analyzes data. It features a spreadsheet-like interface for users.

According to a description of M2 on IBM's jStart Website:

"Massive Mashups (M2) is an extension of the mashup paradigm that:
• Integrates gigabytes, terabytes, or petabytes of unstructured data from web-based repositories
• Collects a wide range of unstructured web data stemming from user-defined seed URLs
• Extracts and Enriches that data using the unstructured information management architecture you choose ...
• Lets you Explore and Visualize this data in specific, user defined contexts."

It continued, "A tool like M2 provides business users with a new approach that allows them to break down data into consumable, situation-specific frames of reference. This enables organizations to translate untapped, unstructured and often unknown Web data into actionable intelligence."

IBM has been using M2 in various proof-of-concept projects with customers. "We've found that we can reduce the time required to solve some projects from 10 days to 10 minutes," Smith said.

Meanwhile, according to the jStart page, M2 offers enterprises the following:

"• Better understand your customers, research competitors, diversify supply chains, or be the first to discover relevant industry trends. Extend and take control of your web analytics with this customizable rich web tool.
• Go beyond structured database management into unstructured data management with M2. Seeing the whole picture will help all levels of your business make better decisions.
• Provides business users with a new approach to keep pace with data escalation. By taking the structure to the data, you can mine petabytes of data without additional storage requirements."

This whole move toward "do-it-yourself" analytics is enabled by Hadoop, Smith said.