Yahoo, Tata Deal Empowers Hadoop Developers - New Ground
In November, Yahoo announced that it would be the first vendor to launch an
open-source program aimed at advancing the research and development of systems
software for distributed computing. Yahoo said its program is intended to
leverage its leadership in Hadoop to enable researchers to modify and evaluate
the systems software running on a 4,000-processor supercomputer provided by
Yahoo.
Yahoo officials said that, at the time, their company was the primary
contributor to Hadoop, an open-source distributed file system and parallel
execution environment that enables its users to process massive amounts of
data.
As a key part of the November program, Yahoo said it wanted to make Hadoop
available in a supercomputing-class data center to the academic community.
Yahoo's supercomputing cluster, called the M45 after one of the best-known
open star clusters, has about 4,000 processors, 3TB of memory, 1.5 petabytes of
disk storage and a peak performance of more than 27 teraflops, placing it among
the 50 fastest supercomputers in the world.
M45 was set to run the latest version of Hadoop and other state-of-the-art,
Yahoo-supported open-source distributed computing software. Yahoo officials
said the company built the M45 from commodity hardware, but would not disclose
the specific hardware vendor.
Doug Cutting started the Hadoop project. He is currently an employee of
Yahoo, where he leads the Hadoop project full-time. The Pig language was
created by a group of scientists in Yahoo Research: Ravi Kumar, Christopher
Olston, Ben Reed, Utkarsh Srivastava and Andrew Tomkins, Brachman said.
In December, Yahoo announced that it had become a platinum sponsor of the
Apache Software Foundation (ASF). Yahoo's support of the ASF
stems from its work with the Apache HTTP Server and Lucene projects. Several
members of Yahoo’s development teams are active, long-term code contributors to
Apache Hadoop, the open-source platform that makes it possible to efficiently
process vast amounts of data on a cluster of commodity hardware, the company
said.