SAN JOSE, Calif.—The enterprise and Hadoop should be the perfect marriage for the next wave of corporate data analysis technology. What could be more enticing than a technology platform that promises to mix the old with the new and come up with “voila!”—a computing engine to outpace your competition?
At this week’s 2013 Hadoop Summit, 2,500 enthusiastic attendees were treated to sessions that included “Hadoop Turns a Corner and Sees the Future,” “Hadoop Powers Next-Generation Enterprise Data Architectures” and “Putting Hadoop to Work in the Enterprise.” While Hadoop has indeed matured over the six summits since its roots were established in 2005, current enthusiasm is not yet being translated into broad corporate acceptance.
Gartner Research Vice President Merv Adrian gave attendees a preview of Gartner’s 2013 big data survey. The survey delved into investments in big data and found an “intractable third of the marketplace,” meaning the share of companies with no plans to invest in big data projects remained essentially unchanged from 2012’s results. The only significant shift was a sharp decline in respondents who answered that they did not know whether their company had a big data plan: from 11 percent in 2012 to 5 percent in 2013.
While big data (which is tough to define in the first place) is not an exact match for Hadoop uptake, the term and the vendors building their businesses around the Hadoop open-source Apache project are intertwined. Those vendors, led by conference sponsor Hortonworks, and the attendees are ready for an outpouring of Hadoop-based projects. But what will it take to move that “intractable third” into the corporate investment stage?
Hadoop supporters are hoping that the features added into Hadoop 2.0—notably the YARN resource manager—will convince even those intractable ones to move into the planning and then into the implementation category. The YARN feature in Hadoop 2.0 has some compelling enterprise characteristics.
YARN (Yet Another Resource Negotiator) is a big step in moving Hadoop from its massively scalable but batch-oriented, single-application roots to a multi-application engine. In his blog, Arun Murthy, one of the first Hadoop developers, said that “with YARN we now have the ability to run SQL IN Hadoop. For by being in Hadoop (built on YARN), it becomes part of the platform itself and can be managed by YARN to ensure that multiple use cases can be addressed. Why stop at SQL? What about machine learning or modeling? What about processing events (data) as they arrive? Would it be not nice to manage all of these through a common system?”
Hadoop Striving for Maturity, Credibility in the Enterprise
The discussion surrounding YARN was a big piece of the Hadoop Summit, as it should have been. Adding an enterprise-class resource manager allows numerous applications to be built on the data (the “data lake,” as one keynoter described it) that a corporation holds.
The ultimate goal for companies is to merge their traditional row- and column-based data (an older model, but one that already has many of the security and privacy features still under development in Hadoop) with the massively scalable social and machine data within Hadoop. As one presenter explained, financial systems can track sales, but Hadoop systems can uncover the customer sentiment surrounding those sales. The two systems in tandem would represent a major step forward in corporate computing.
Hadoop is an open-source project overseen by the Apache Software Foundation, and in addition to YARN, there are other ambitious projects, including the Knox security project. Hadoop 2.0 is still in beta, but the nature of open-source development means the project will move forward transparently, even if not as fast as some would like.
Hortonworks, spun out of Yahoo in 2011 and the recent recipient of an additional $50 million in financing, is the most vocal advocate of Hadoop being implemented in its “purest” open-source form. The combination of new features in Hadoop and an increase in applications that can give customers a strategic advantage based on melding old and new systems could be just the thing to move the intractable third into the “let’s get going on the next big thing” category.
Eric Lundquist is a technology analyst at Ziff Brothers Investments, a private investment firm. Lundquist, who was editor in chief at eWEEK (previously PC WEEK) from 1996 to 2008, authored this article for eWEEK to share his thoughts on technology, products and services. No investment advice is offered in this article. All duties are disclaimed. Lundquist works separately for a private investment firm, which may at any time invest in companies whose products are discussed in this article, and no disclosure of securities transactions will be made.