Cloudera Launches New Hadoop Distribution

Cloudera, a leading provider of Hadoop-based data management software and services, has announced the third version of its Cloudera Distribution for Hadoop (CDH). The company integrates eight open source projects with Hadoop.

Cloudera announced the new version at the Hadoop Summit in Santa Clara, Calif., on June 29. The company bills Cloudera's Distribution for Hadoop version 3 as the most comprehensive Hadoop-based data management platform on the market.

In an interview with eWEEK, Mike Olson, CEO of Cloudera, said Cloudera's Distribution for Hadoop v3 consists of core Apache Hadoop and eight additional open source projects, all tested and integrated into a platform that is easy to install and use. Cloudera's Distribution lowers the bar for Hadoop adoption and usage in the enterprise, he said.

"Cloudera has gained deep experience in the market working with customers to deploy Hadoop in their organizations and has learned how to use Hadoop effectively," said Doug Cutting, creator of Apache Hadoop and architect at Cloudera, in a statement. "CDH v3 is our response. It includes the most appropriate enterprise-grade add-on projects that enhance the core Apache Hadoop framework and make it easier for any organization to use."

"The Cloudera Distribution for Hadoop is quickly gaining momentum because it provides a stable foundation for enterprises to collect, store and analyze large amounts of data," said Tom Leonard, executive vice president of business development at Pentaho, in a statement. "The Pentaho BI Suite is a perfect complement and we are excited to partner with Cloudera to make it easier for organizations of all sizes to integrate additional data sources and enable a wider population of users to realize value via analysis, reporting and dashboards - either on premise or via the cloud."

Cloudera is also announcing the creation of two new open source projects as part of Cloudera's Distribution for Hadoop. The company is releasing Flume, its data loading infrastructure, and its Hadoop User Environment (HUE) code under the Apache License, Version 2.0. These additions simplify data acquisition and make it much easier to build attractive user interfaces for Hadoop applications.

"As organizations increasingly struggle to extract value from an ever expanding sea of data, more and more of them are turning to Hadoop," said Stephen O'Grady, an analyst with RedMonk, in a statement. "Cloudera's new offerings lower the barrier to entry for enterprises looking to deploy Hadoop in production environments."

"We've been working with customers to help them use Hadoop to solve various problems," Olson said. "Hadoop on its own is not enough to tackle the big data analysis problems and other problems they face."

To that end, the eight additional projects address important organizational requirements and so ease the adoption of Hadoop, Olson said. The additional projects in CDH v3 are Hive, HBase, Sqoop, Oozie, Flume, ZooKeeper, Pig and HUE.

These projects address deployment requirements in the areas of data integration, workflow, scheduling, high-level languages, serialization, UI, fast read/write and remote procedure call (RPC). All of these components were selected because they dramatically simplify Hadoop deployment, and all are integrated and tested together at scale, Olson said.

"Cloudera's Distribution for Hadoop provides Apollo Group with the key functionality we need to take full advantage of Hadoop for analyzing our academic data," said Satish Menon, senior vice president and head of Apollo's Silicon Valley R&D Center. "With Cloudera's distribution we can get to critical insights faster."

"We teamed up with Cloudera because we're both committed to making Hadoop accessible and easy to use," said Martin Hall, CEO at Karmasphere. "Cloudera Enterprise takes the pain out of operating large, complex Hadoop clusters and simplifies the entire process with its new data management tools."

"Greenplum Chorus is a next generation data collaboration platform which is well complemented by the Cloudera Distribution for Hadoop," said Scott Yara, president of Greenplum, in a statement. "We are increasingly seeing customers deploy the Cloudera Distribution for Hadoop alongside Greenplum products for data staging, processing and MapReduce analytics. Cloudera's expansion in the scope of what defines a Hadoop platform is exciting and better enables Hadoop users everywhere."