Yahoo Adds Workflows, Authentication to Hadoop

By Chris Preimesberger  |  Posted 2010-06-29

At the Hadoop Summit, Yahoo announces two key enhancements to the beta-level cloud computing platform: Hadoop with Security, and Oozie, a new workflow engine.

SANTA CLARA, Calif. - Yahoo, which is running 38,000 Apache Hadoop Web servers and counting among its hundreds of thousands of other servers, is quickly building quite an ecosystem for its heavy-lifting cloud computing software platform.

Hadoop, an open-source project created by Yahoo developers in 2005 that became an Apache project in late 2006, has engaged dozens of core developers, hundreds of contributors and thousands of interested IT folks since then.

The Hadoop software layer handles control and scaling of Yahoo's exponentially increasing volumes of data. In only five years, the company has taken Hadoop from a 20-server prototype in Yahoo Labs to the world's largest Web server deployment running in production across Yahoo's global network.

Attendance at the annual Hadoop Summit at the Santa Clara Convention Center has been growing in parallel with the amount of data Yahoo has to process each day, if not quite as fast. The first Hadoop event attracted about 300 people in 2008, and attendance rose to about 600 last year. This year, more than 1,000 people crammed into the convention center ballroom here on June 29.

So there's much interest in how this software works. Thousands of Websites are struggling to process and store a deluge of business and personal data, and there is a lot to learn from how Yahoo is approaching the problem.

At the summit, Yahoo announced two key enhancements to the beta-level platform: Hadoop with Security, and Oozie, a new workflow engine.

"Hadoop with Security is Hadoop integrated with Kerberos [authentication securityware], which amounts to a set of security updates that enable much stronger authentication," Blake Irving, Yahoo's new chief product officer, told summit attendees. "Hadoop with Security brings more secure collaboration and sharing of authenticated data."

Hadoop with Security also sets the stage for secure cloud computing multitenancy by providing authenticated secure access and processing of sensitive data, Irving said.

Oozie, which integrates with Hadoop with Security, is the platform's new open-source workflow management and coordination engine for developers managing jobs running on Hadoop servers. It includes Hadoop Distributed File System, Pig and MapReduce, Irving said. It is designed for Yahoo's internal compute-intensive use cases that require managing complex work processes and ETL (extraction, transformation and loading) on a global scale, Irving said.

Both Hadoop with Security and Oozie are available for free download here.

Yahoo currently uses Hadoop only for internal purposes, but because the software is open source, it is freely available for anybody to use.

Yahoo originally used Hadoop for specific science projects, but the software quickly morphed into the enterprise-class platform it is today, one the company relies on to improve its personalized user experiences. Hadoop plays a key role in Yahoo's home page, Yahoo Search, Yahoo Mail and other services by remembering user preferences, among other things.

"Businesses across all sectors are looking for ways to leverage the vast quantities of data they are accumulating, and Apache Hadoop is an efficient solution for processing data at scale," said IDC analyst Melanie Posey. "Hadoop has matured and is now becoming an enterprise-ready cloud computing technology with the addition of Kerberos authentication. Now organizations of various sizes can leverage Yahoo's Hadoop investment and deployments to run it on their own systems and build out their own Hadoop deployments without starting from scratch on internal science experiments."

How else can Yahoo monetize this considerable five-year investment? Yahoo is not, and never has been, in the software-selling business.

"We're already monetizing Hadoop every day," Shelton Shugar, senior vice president of cloud computing at Yahoo, told eWEEK. "We use it to optimize our advertising and ad placement businesses here at Yahoo, and it's a very important ingredient in our overall IT environment."

On the idea of commercializing Hadoop, or parts of it, with special Yahoo "secret sauce" of some kind, Shugar told eWEEK that the company has certainly talked about it.

"But at this time, we don't have any plans to use Hadoop in that way," he said.

In other news from the summit, Cloudera announced a new distribution of its own Hadoop implementation. 

