LAS VEGAS - It would be laughable, really, to have a conference headlined "Cloud Meets Big Data" and not talk about the Apache Software Foundation's Hadoop analytics engine at some point.
So EMC wasted no time. On May 9, Day One of EMC World 2011 here at the Venetian Hotel, the company brought Hadoop, the once-tiny open-source project that has quickly developed into the world's hottest analytics engine for large unstructured data sets, into its fold.
Because EMC has primarily been a hardware company since its inception in 1979, it was no surprise that it chose to introduce its Hadoop deployment as a physical appliance. The device, a standard x86-based server built on commodity hardware, is called the Greenplum HD Data Computing Appliance and will ship in Q3 2011. For those who don't need the hardware, the EMC Hadoop offering also will be available as a software-only distribution.
The server combines Hadoop's analytics engine for unstructured data with the EMC Greenplum Database. The combination of Hadoop and Greenplum is a natural fit: A single device will enable co-processing of large data sets involving both unstructured and structured data.
The EMC Greenplum HD appliance enables an IT shop to deploy so-called "big data" analytics without needing to use specialized point products. It will be available in two editions, Community and Enterprise, Greenplum co-founder Scott Yara told reporters.
Hadoop-based batch processing of unstructured and structured data at massive scale using commodity hardware has led to increasing interest in analytics for business intelligence, and not only among enterprises with large IT systems. By extracting the knowledge wrapped within unstructured machine-generated data, organizations can make better decisions that yield better sales projections, improved service and reduced costs.
EMC bought startup Greenplum in July 2010 and rolled out the first EMC-Greenplum appliance three months later.
Yara told reporters that EMC is the first "billion-dollar company that has come to the [open-source] community and said, 'Let's work together on solving these "big data" problems, and let's do it the right way.'"
With its new Hadoop/Greenplum distribution, EMC is providing a hardware and software foundation for the processing of new-generation business analytics in cloud systems.
To go with its Hadoop distribution, EMC has assembled an ecosystem of 12 companies that also offer business intelligence and data transfer capabilities. These are Concurrent, CSC, Datameer, Informatica, Jaspersoft, Karmasphere, MicroStrategy, Pentaho, SAS, SnapLogic, Talend and VMware. VMware is owned by EMC.