LAS VEGAS - It would be laughable, really, to have a conference headlined “Cloud Meets Big Data” and not talk about the Apache Software Foundation’s Hadoop analytics engine at some point.
So EMC wasted no time. On May 9, Day One of EMC World 2011 here at the Venetian Hotel, the company brought Hadoop, the once-tiny open-source project that has quickly developed into the world’s hottest analytics engine for large unstructured data sets, into its fold.
Because EMC has primarily been a hardware company since its founding in 1979, it is no surprise that it is introducing its Hadoop deployment as a physical appliance. The device, a standard x86-based server built on commodity hardware, is called the Greenplum HD Data Computing Appliance and will ship in Q3 2011. For those who don’t need the hardware, the EMC Hadoop offering also will be available as a software-only distribution.
The server combines Hadoop’s analytics engine for unstructured data with the EMC Greenplum Database. The pairing is a natural fit: The same device will enable co-processing of large data sets that mix unstructured and structured data.
The EMC Greenplum HD appliance enables an IT shop to deploy so-called “big data” analytics without needing to use specialized point products. It will be made available in two editions: Community and Enterprise, Greenplum co-founder Scott Yara told reporters.
Hadoop-based batch processing of unstructured and structured data at massive scale using commodity hardware has led to increasing interest in analytics for business intelligence, and not only among enterprises with large IT systems. By extracting the knowledge wrapped within unstructured machine-generated data, organizations can make better decisions that sharpen sales projections, improve service and reduce costs.
EMC bought startup Greenplum in July 2010 and rolled out the first EMC-Greenplum appliance three months later.
Yara told reporters that EMC is the first “billion-dollar company that has come to the [open-source] community and said, ‘Let’s work together on solving these “big data” problems, and let’s do it the right way.’”
With its new Hadoop/Greenplum distribution, EMC is providing a hardware and software foundation for the processing of new-generation business analytics in cloud systems.
To go with its Hadoop distribution, EMC has assembled an ecosystem of 12 companies that also offer business intelligence and data transfer capabilities. These are Concurrent, CSC, Datameer, Informatica, Jaspersoft, Karmasphere, MicroStrategy, Pentaho, SAS, SnapLogic, Talend and VMware. VMware is majority-owned by EMC.
NetApp Brings Out Its Own Hadoop Deployment
Also on May 9, EMC’s biggest storage competitor, NetApp, had Hadoop news of its own. The Sunnyvale, Calif.-based network storage maker unveiled a set of new storage arrays running its E-Series Platform that are bundled with, and optimized for, Hadoop.
NetApp’s E-Series uses software gained through the company’s completed $480 million purchase of the Engenio external storage systems business from LSI, an acquisition that enables NetApp to enter some important emerging markets.
Apache Hadoop, created by former Apple, Xerox PARC and Yahoo developer Doug Cutting, is an open-source software framework built in Java that supports distributed, data-intensive applications. It enables applications to scale securely to handle thousands of nodes and petabytes of data.
Cutting, now at Cloudera and serving as the chairman of the Apache Software Foundation, has said that Hadoop was inspired by Google’s MapReduce (a programming model for processing large data sets in parallel across a cluster’s nodes) and Google File System.
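For readers unfamiliar with the MapReduce model, the idea can be sketched in a few lines of Python. This is only an illustration running in a single process, not Hadoop's actual Java API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `run_job`) and the sample log lines are invented for the example. A real cluster would spread the map and reduce steps across many nodes.

```python
from collections import defaultdict


def map_phase(record):
    """Emit a (word, 1) pair for every word in one input record."""
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group emitted values by key, the step a framework performs between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values collected for one key into a single result."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle and reduce in sequence (hypothetical single-process driver)."""
    pairs = (pair for record in records for pair in map_phase(record))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Made-up sample data standing in for unstructured machine-generated logs.
    logs = ["error disk full", "warning disk slow", "error network down"]
    print(run_job(logs))
```

Because each record is mapped independently and each key is reduced independently, both steps can in principle be farmed out to separate machines, which is what lets this style of job scale on commodity hardware.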
Hadoop, which is named after Cutting’s son’s toy elephant, is being maintained and improved by a large global community of contributors. Yahoo, an early mover in Hadoop that now sponsors a Hadoop developers’ conference, has been the largest contributor to the project and uses Hadoop extensively across its own businesses.
“Hadoop has played a leading role in the transformation from traditional data warehousing to big data analytics,” said analyst John Webster, senior partner at Evaluator Group. “EMC’s Hadoop commercialization strategy is aimed at streamlining and bulletproofing Hadoop for enterprise users, making Hadoop more of a must-have real-time analytics tool for the enterprise.”