IBM is using Hadoop to enable ad hoc analytics at Web scale, in a project called Massive Mashups.

NEW YORK: IBM is using Hadoop to enable ad hoc analytics at Web scale in an effort called Massive Mashups, or M2.
In other words, IBM is making Hadoop accessible to business professionals so they can run analytics on the fly and see the results presented in an easy-to-understand way, said Rod Smith, vice president of emerging Internet technology at IBM Software Group, during a keynote presentation at the Hadoop World: NYC conference here on Oct. 2.
"It's the petabyte age," Smith said, describing an era in which vast amounts of data need to be processed and analyzed as the world becomes more instrumented. He noted that this transition is part of what IBM is trying to address with its Smarter Planet strategy.
"The Apache Hadoop project develops open-source software for reliable,
scalable distributed computing," according to its Website. Hadoop enables
applications to work with thousands of nodes and petabytes of data. It was
inspired by Google's MapReduce and GFS (Google File System) papers.
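For readers unfamiliar with the MapReduce model that Hadoop borrows from Google's paper, the following is a minimal, single-process Python sketch of its three stages (map, shuffle, reduce) applied to a word count. It is only an illustration of the programming model under simplifying assumptions: it does not use Hadoop's actual Java API, the function names are hypothetical, and all data stays in memory on one machine rather than being spread across nodes and a distributed file system.

```python
from collections import defaultdict
from itertools import chain


def map_phase(split):
    # Hypothetical mapper: emit a (word, 1) pair for every word in one input split.
    return [(word.lower(), 1) for word in split.split()]


def shuffle(pairs):
    # Group intermediate values by key, the step the framework performs
    # between the map and reduce phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # Hypothetical reducer: combine all values for one key into a single total.
    return key, sum(values)


def word_count(splits):
    # In Hadoop, each split would be mapped on a different node; here we loop.
    intermediate = chain.from_iterable(map_phase(s) for s in splits)
    grouped = shuffle(intermediate)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    splits = ["Hadoop runs MapReduce jobs", "MapReduce jobs scale out"]
    print(word_count(splits))
```

The appeal of the model is that the mapper and reducer see only one record or one key at a time, which is what lets a framework like Hadoop run them in parallel across thousands of nodes.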
"Hadoop produces a new class of application where there is broader use
of Web content, unstructured content and longer-running applications,"
Smith said.
To make the tool useful to business users, he said, "We had to leverage easy-to-use data manipulation metaphors like spreadsheets." The tool also had to use rich visualization so that users could quickly spot insights.
Enter Project M2, which Smith describes as an "insight engine for enabling ad hoc business insight for business users at Web scale." M2 runs on top of Hadoop to analyze data and presents it to users through a spreadsheet-like interface.
According to a description of M2 on IBM's jStart Website:
"Massive Mashups (M2) is an extension of the mashup paradigm that:
- Integrates gigabytes, terabytes, or petabytes of unstructured data from web-based repositories
- Collects a wide range of unstructured web data stemming from user-defined seed URLs
- Extracts and Enriches that data using the unstructured information management architecture you choose ...
- Lets you Explore and Visualize this data in specific, user defined contexts."
It continued, "A tool like M2 provides business users with a new
approach that allows them to break down data into consumable,
situation-specific frames of reference. This enables organizations to translate
untapped, unstructured and often unknown Web data into actionable
intelligence."
IBM has been using M2 in various
proof-of-concept projects with customers. "We've found that we can reduce
the time required to solve some projects from 10 days to 10 minutes,"
Smith said.
Meanwhile, according to the jStart page, M2 offers enterprises the following:
"Better understand your customers, research competitors, diversify supply chains, or be the first to discover relevant industry trends. Extend and take control of your web analytics with this customizable rich web tool.
Go beyond structured database management into unstructured data management with M2. Seeing the whole picture will help all levels of your business make better decisions.
Provides business users with a new approach to keep pace with data escalation. By taking the structure to the data, you can mine petabytes of data without additional storage requirements."
This whole move toward "do-it-yourself" analytics is enabled by
Hadoop, Smith said.