SAN JOSE, Calif.—The enterprise and Hadoop should be the perfect marriage for the next wave of corporate data analysis technology. What could be more enticing than a technology platform that promises to mix the old with the new and come up with "voila!"—a computing engine to outpace your competition?
At this week's 2013 Hadoop Summit, 2,500 enthusiastic attendees were treated to sessions that included "Hadoop Turns a Corner and Sees the Future," "Hadoop Powers Next-Generation Enterprise Data Architectures" and "Putting Hadoop to Work in the Enterprise." While Hadoop has indeed matured over the six summits since its foundations were laid in 2005, current enthusiasm is not yet translating into broad corporate acceptance.
Gartner Research Vice President Merv Adrian gave attendees a preview of Gartner's 2013 big data survey. The survey examined investments in big data and found an "intractable third of the marketplace": the share of companies with no plans to invest in big data projects remained essentially unchanged from 2012's results. The only significant shift was a sharp decline in respondents who said they did not know whether their company had a big data plan, from 11 percent in 2012 to 5 percent in 2013.
While big data (which is tough to define in the first place) is not an exact match for Hadoop uptake, the term and the vendors building their businesses around the open-source Apache Hadoop project are intertwined. Those vendors, led by conference sponsor Hortonworks, and the attendees are ready for an outpouring of Hadoop-based projects. But what will it take to move that "intractable third" into the corporate investment stage?
Hadoop supporters are hoping that the features added into Hadoop 2.0—notably the YARN resource manager—will convince even those intractable ones to move into the planning and then into the implementation category. The YARN feature in Hadoop 2.0 has some compelling enterprise characteristics.
YARN (Yet Another Resource Negotiator) is a big step in moving Hadoop from its massively scalable but batch-oriented, single-application roots to a multi-application engine. In his blog, Arun Murthy (one of the first Hadoop developers) said that "with YARN we now have the ability to run SQL IN Hadoop. For by being in Hadoop (built on YARN), it becomes part of the platform itself and can be managed by YARN to ensure that multiple use cases can be addressed. Why stop at SQL? What about machine learning or modeling? What about processing events (data) as they arrive? Would it be not nice to manage all of these through a common system?"