Database Vendors Take a Page from Google for Data Analytics

Greenplum and Aster Data Systems will support MapReduce, a technique pioneered by Google for analyzing large data sets. Both vendors believe MapReduce will improve analytics on large data sets; Greenplum competes with companies such as Oracle and Netezza in the data warehousing space.

Aster Data Systems and data warehousing vendor Greenplum have added support for Google's MapReduce framework to improve data analytics for enterprises.

Greenplum's move comes as Aster Data Systems unveils In-Database MapReduce, which is available now for evaluation.

"Traditionally, massively parallel [processing] databases were able to parallelize ordinary SQL, but had limitations when parallelizing more general programs, whether written as user-defined functions or a database programming language such as PL/SQL," said Mayank Bawa, CEO of Aster.

"In many cases, these capabilities simply ran on a single node of an MPP database. Now, analysts and developers can take advantage of the power of MapReduce from within ordinary SQL, by creating SQL/MR functions in Java, Python, R and more."

With Greenplum MapReduce, companies can write MapReduce programs in a few lines of Perl or Python to process and analyze huge volumes of unstructured data for a variety of applications, such as keyword analysis and content indexing, according to Greenplum President and co-founder Scott Yara. In addition, while SQL is expressive enough to allow some analysis and data mining, there is a range of powerful mining and machine learning tools that are not easily expressed via SQL, he argued.

"Good examples are Bayesian machine learning approaches, clustering algorithms and natural language processing," Yara said.
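To give a sense of what "a few lines of Python" might look like for the keyword analysis Yara mentions, here is a minimal, single-process sketch of the MapReduce pattern. It is not Greenplum's or Aster's API: the function names (map_keywords, shuffle, reduce_counts, keyword_counts) are hypothetical, and a real MapReduce framework would run the map and reduce steps in parallel across many nodes rather than in one loop.

```python
import re
from collections import defaultdict


def map_keywords(document):
    """Map step: emit a (keyword, 1) pair for every word in one document."""
    for word in re.findall(r"[a-z0-9']+", document.lower()):
        yield word, 1


def shuffle(pairs):
    """Group emitted values by key, as a framework would between the phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(keyword, counts):
    """Reduce step: collapse all counts for one keyword into a single total."""
    return keyword, sum(counts)


def keyword_counts(documents):
    """Run map, shuffle and reduce in sequence (hypothetical local driver)."""
    pairs = (pair for doc in documents for pair in map_keywords(doc))
    return dict(reduce_counts(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    docs = ["MapReduce meets SQL", "SQL meets Python"]
    print(keyword_counts(docs))
```

Because each document is mapped independently and each keyword is reduced independently, both steps can be spread across machines, which is the property that makes the pattern attractive for very large data sets.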

The effectiveness of MapReduce as an answer to the analytical needs of enterprises has become an increasingly hot subject of discussion as cloud computing has gained steam. Gartner analyst Donald Feinberg described MapReduce as complex but added that it can give enterprises the ability to process extremely large sets of data very quickly.

"It's very, very programming-intensive," he said. "It's not something that your average application programmer that writes programs in SQL using SQL in C or SQL in Java or something like that is going to do."

Curt Monash, president of Monash Research, described MapReduce as a powerful tool for data manipulation and analysis.

"Companies that are integrating MapReduce and SQL are increasing its applicability and giving developers and DBAs [database administrators] the ability to work together on a common parallel data processing infrastructure," Monash said in a statement.