New Ground

By Darryl K. Taft  |  Posted 2008-03-25


In November, Yahoo announced that it would be the first vendor to launch an open-source program aimed at advancing the research and development of systems software for distributed computing. Yahoo said the program is intended to leverage its leadership in Hadoop to enable researchers to modify and evaluate the systems software running on a 4,000-processor supercomputer provided by Yahoo.

Yahoo officials said that, at the time, their company was the primary contributor to Hadoop, an open-source distributed file system and parallel execution environment that enables its users to process massive amounts of data.

As a key part of the November program, Yahoo said it wanted to make Hadoop available to the academic community in a supercomputing-class data center. Yahoo's supercomputing cluster, called the M45 after one of the best-known open star clusters, has about 4,000 processors, 3TB of memory, 1.5 petabytes of disk storage and a peak performance of more than 27 teraflops, placing it among the 50 fastest supercomputers in the world.

M45 was set to run the latest version of Hadoop and other state-of-the-art, Yahoo-supported open-source distributed computing software. Yahoo officials said the company built the M45 from commodity hardware, but would not disclose the specific hardware vendor.

Doug Cutting started the Hadoop project. He is currently an employee of Yahoo, where he leads the Hadoop project full-time. The Pig language was created by a group of scientists in Yahoo Research: Ravi Kumar, Christopher Olston, Ben Reed, Utkarsh Srivastava and Andrew Tomkins, Brachman said.

In December, Yahoo announced that it had become a platinum sponsor of The Apache Software Foundation (ASF). Yahoo's support of the ASF stems from its work with the Apache HTTP Server and Lucene projects. Several members of Yahoo's development teams are active, long-term code contributors to Apache Hadoop, the open-source platform that makes it possible to efficiently process vast amounts of data on a cluster of commodity hardware, the company said.

Darryl K. Taft covers the development tools and developer-related issues beat from his office in Baltimore. He has more than 10 years of experience in the business and is always looking for the next scoop. Taft is a member of the Association for Computing Machinery (ACM) and was named 'one of the most active middleware reporters in the world' by The Middleware Co. He also has his own card in the 'Who's Who in Enterprise Java' deck.

