LAS VEGAS - It would be laughable, really, to hold a conference headlined "Cloud Meets Big Data" and not talk about the Apache Software Foundation's Hadoop analytics engine at some point.
So on May 9, Day One of EMC World 2011 here at the Venetian Hotel, EMC wasted no time bringing Hadoop, the once-tiny open-source project that has quickly become the world's hottest analytics engine for large unstructured data sets, into its fold.
Because EMC has been primarily a hardware company since its founding in 1979, it chose to introduce its Hadoop deployment as a physical appliance. The device, a standard x86-based server built from commodity hardware and scheduled to ship in the third quarter of 2011, is called the Greenplum HD Data Computing Appliance. For customers who don't need the hardware, EMC's Hadoop offering will also be available as a software-only distribution.
The server combines Hadoop's analytics engine for unstructured data with the EMC Greenplum Database. Pairing Hadoop with Greenplum is a natural fit: the same device will enable co-processing of large data sets that mix unstructured and structured data.
The EMC Greenplum HD appliance enables an IT shop to deploy so-called "big data" analytics without relying on specialized point products. It will be available in two editions, Community and Enterprise, Greenplum co-founder Scott Yara told reporters.
Hadoop-based batch processing of unstructured and structured data at massive scale on commodity hardware has driven growing interest in analytics for business intelligence, and not only among enterprises with large IT systems. By extracting the knowledge buried in unstructured, machine-generated data, organizations can make better decisions that sharpen sales projections, improve service and reduce costs.
EMC bought startup Greenplum in July 2010 and rolled out the first EMC-Greenplum appliance three months later.
Yara told reporters that EMC is the first "billion-dollar company that has come to the [open-source] community and said, 'Let's work together on solving these "big data" problems, and let's do it the right way.'"
With its new Hadoop/Greenplum distribution, EMC is providing a hardware and software foundation for processing new-generation business analytics in cloud systems.
To go with its Hadoop distribution, EMC has assembled an ecosystem of 12 partner companies that offer business intelligence and data transfer capabilities: Concurrent, CSC, Datameer, Informatica, Jaspersoft, Karmasphere, MicroStrategy, Pentaho, SAS, SnapLogic, Talend and VMware. EMC holds a majority stake in VMware.