Yahoo Orchestrates Move of Hadoop Analytics to Its Own Garage

By Chris Preimesberger  |  Posted 2011-07-05

Yahoo's CTO acknowledges that more improvements to Hadoop's core IT need to be made before it moves into the rarefied air of becoming a de facto industry standard.

SANTA CLARA, Calif. - Apache Hadoop, the data analytics science project that has been ensconced in Yahoo's nurturing cocoon for the last half-dozen years, has grown up and broken through its shell into the real world.

From a business standpoint, Hadoop officially made its break June 29 from Yahoo to be shepherded by a new venture capital-funded company called Hortonworks, named after the Dr. Seuss elephant character.

Apache Hadoop, open-source software built in Java that works with distributed data-intensive applications, enables applications to scale securely in order to handle thousands of nodes and petabytes of data. A number of companies now are using Hadoop daily to predict business patterns, find tendencies in scientific data and predict the weather, among many other functions. More and more businesses are finding out that they need to analyze their stored data and use those metrics to help them make better business decisions. Hadoop has certainly created the most buzz among new-generation big-data analytics packages.

Hadoop has gone from creator Doug Cutting's science project to mainstream business in a short time. Hortonworks was created during the last four months as an independent, privately held, VC-funded company to lead the Hadoop community and market the open-source product into the future. Officially, Mothership Yahoo is now one of its customers.

Hortonworks is an appropriate name for the new company because it is congruent with Hadoop itself, which is named after the stuffed toy elephant that belongs to Cutting's young son.

Hortonworks Not a 'Spinout'

Yahoo CTO Raymie Stata, a key figure in all of this, is responsible for all IT development at Yahoo. Even though Hadoop has ventured out of the Yahoo development garage to a new home, Stata told eWEEK that Yahoo doesn't consider the new company a "spinout."

"Yahoo will have more people within Yahoo working on Hadoop and related technologies than there will be at Hortonworks," Stata said. "We see this as increasing the investment that's being made in Hadoop.

"Certainly, we're taking some of our key talent and using it to seed the new company, so in that regard there are some employees who will be moving from Yahoo to the new company. But this is not downsizing, it's not a spinout; it's increasing the investment in Hadoop. Yahoo will continue to be a major contributor into all aspects of Hadoop going forward."

As far as the breakout is concerned, Stata said, Yahoo has always had a vision of Hadoop becoming the industry standard in big-data analytics software but also knew it would one day have to establish its own business entity. "Because of the nature of our [Yahoo's] business, we were kind of living in the future, so to speak. We could sort of see what everybody else would need at some point in the future," Stata said, "so we were sort of forced to build it. But it's what we do on Hadoop that ultimately creates value for our shareholders.

"So if Hadoop becomes the de facto industry standard for big-data processing, that's goodness for us, and that's been our mission here in being so open in the development of Hadoop. We're getting to the last mile on that; it's a stretch to say that it is a de facto industry standard at this point. If it fails, it's kind of bad on the community. It's all set up to get to that stature," he said.

Ongoing investment in the core of Hadoop is needed, Stata said. There are improvements that need to be made for the IT to become a standard, he said. "It is necessary to have a company that is going to take that last mile as its core business, one that is not Yahoo's core business," Stata said. "Having Hortonworks out there with a zeal and a mission to see that last mile can take it."

Is There a 'Tether' to Yahoo?

Because of its familial relationship, will Hortonworks remain tethered to Yahoo, so to speak?

"Hortonworks is independent, but I don't know [about being tethered]," Stata said. "The valley's a small valley. You can look at almost any company in Silicon Valley and find a core of Yahoo alums there. We're everywhere!

"Relationships are valuable. Yahoo is an investor, but we're a minority investor in an independent company. We are a development partner committed to continuing to develop that core tech. Because of the relationships, we have the ability to work very deeply in terms of driving that tech forward."

One of the reasons for creating the new company, Stata said, is that Yahoo already has seen what the future holds for enterprise analytics (thanks to its six-year-long Hadoop development stage) and knows what will work. It saw that the big-data analytics need would soon become so widespread that a dedicated company would be necessary to focus solely on it, and not the advertising and Web services businesses that are Yahoo's meal ticket.

"We have been running a truly enterprise deployment of Hadoop, and I don't think anybody does that. It's a departmental solution today," Stata said. "But it's not going to be six years before other people are doing enterprise [analytics as Yahoo does]. That gap between Yahoo and the rest of the user base is shrinking.

"On one hand, it's great to have an independent company that can have this relationship with Yahoo and see pain points that are on the road for a couple of years ahead. We now need to look at other customers and bring that input in and to synthesize it with Yahoo's more futuristic view. Obviously, an independent company with a commercial mandate is going to do it a lot better than an open-source team inside Yahoo."

Chris Preimesberger was named Editor-in-Chief of Features & Analysis at eWEEK in November 2011. Previously he served eWEEK as Senior Writer, covering a range of IT sectors that include data center systems, cloud computing, storage, virtualization, green IT, e-discovery and IT governance. His blog, Storage Station, is considered a go-to information source. Chris won a national Folio Award for magazine writing in November 2011 for a cover story on and CEO-founder Marc Benioff, and he has served as a judge for the SIIA Codie Awards since 2005. In previous IT journalism, Chris was a founding editor of both IT Manager's Journal and and was managing editor of Software Development magazine. His diverse resume also includes: sportswriter for the Los Angeles Daily News, covering NCAA and NBA basketball, television critic for the Palo Alto Times Tribune, and Sports Information Director at Stanford University. He has served as a correspondent for The Associated Press, covering Stanford and NCAA tournament basketball, since 1983. He has covered a number of major events, including the 1984 Democratic National Convention, a Presidential press conference at the White House in 1993, the Emmy Awards (three times), two Rose Bowls, the Fiesta Bowl, several NCAA men's and women's basketball tournaments, a Formula One Grand Prix auto race, a heavyweight boxing championship bout (Ali vs. Spinks, 1978), and the 1985 Super Bowl. A 1975 graduate of Pepperdine University in Malibu, Calif., Chris has won more than a dozen regional and national awards for his work. He and his wife, Rebecca, have four children and reside in Redwood City, Calif. Follow on Twitter: editingwhiz
