Greenplum and Aster Data Systems will support MapReduce, a technique pioneered by Google to analyze large data sets. Both Aster Data Systems and Greenplum, which competes with companies such as Oracle and Netezza in the data warehousing space, believe MapReduce will improve analytics for large data sets.
Aster Data Systems and
data warehousing vendor Greenplum have added support for Google's MapReduce framework
in the name of data analytics for enterprises.
The move by Greenplum
comes as Aster Data Systems unveiled In-Database MapReduce, which is available
now for evaluation.
"[M]assively parallel [processing] databases were able to parallelize ordinary
SQL, but had limitations when parallelizing more general programs, whether
written as user-defined functions or a database programming language such as
PL/SQL," said Mayank Bawa of Aster Data Systems.
"In many cases, these
capabilities simply ran on a single node of an MPP database. Now, analysts and
developers can take advantage of the power of MapReduce from within ordinary
SQL, by creating SQL/MR functions in Java, Python, R and more."
With MapReduce, companies can write MapReduce programs in a few lines of Perl or
Python that can process and analyze huge volumes of unstructured data for a
variety of applications, such as keyword analysis and content indexing, according
to Greenplum President and co-founder Scott Yara. In addition, while SQL is
expressive enough to allow some analysis and data mining, there is a range of
powerful mining and machine learning tools that are not easily expressed via
SQL, he argued.
"Good examples are
Bayesian machine learning approaches, clustering algorithms and natural
language processing," Yara said.
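
To give a sense of what "a few lines of Python" for keyword analysis can look like, here is a minimal single-process sketch of the map, shuffle and reduce steps. It is an illustration only, not Greenplum's or Aster Data Systems' API; the function names (map_phase, shuffle, reduce_phase, keyword_counts) are hypothetical, and a real MapReduce system would run the map and reduce steps in parallel across many nodes.

```python
import re
from collections import defaultdict


def map_phase(document):
    """Map step: emit a (keyword, 1) pair for every word in one document."""
    for word in re.findall(r"[a-z']+", document.lower()):
        yield word, 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: collapse the values for one key into a single count."""
    return key, sum(values)


def keyword_counts(documents):
    """Run map, shuffle and reduce in sequence (hypothetical driver, one process)."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    docs = ["Big data needs big analytics", "Analytics on unstructured data"]
    for keyword, count in sorted(keyword_counts(docs).items()):
        print(keyword, count)
```

Because each document is mapped independently and each keyword is reduced independently, both steps can be spread across machines without changing the program's logic, which is the property that makes the approach attractive for very large data sets.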
The effectiveness of MapReduce
as an answer to the analytical
needs of enterprises is becoming an increasingly hot subject of discussion as
cloud computing has gained steam. Gartner analyst Donald Feinberg described
MapReduce as complex but added it can give enterprises the ability to
process extremely large sets of data very fast.
"It's very, very programming-intensive," he
said. "It's not something that your average application programmer
that writes programs in SQL using SQL in C or SQL in Java or something like
that is going to do."
Curt Monash, president of Monash Research, described MapReduce as a powerful tool for data
manipulation and analysis.
"[Vendors] that are integrating MapReduce and SQL are increasing its applicability and
giving developers and DBAs [database administrators] the ability to work
together on a common parallel data processing infrastructure," Monash said in a