The Apache Software Foundation delivers Hadoop 1.0, the much-anticipated 1.0 version of the popular open-source platform for storing and processing large amounts of data.
The
Apache Software Foundation (ASF) has announced
Apache Hadoop 1.0, the open-source
software framework for reliable, scalable, distributed computing.
The
Jan. 4 release marks a major milestone six years in the making, and has
achieved the level of stability and enterprise-readiness to earn the 1.0
designation, Apache officials said.
"In
addition to the major security improvements and support for HBase, the really
big deal about version 1.0 is this is a release we feel that people can look at
as very stable," Apache Hadoop Vice President Arun Murthy told
eWEEK. "The developer community is
really up for supporting version 1.0, and we expect 1.0 adoption to be much
faster than for other versions."
Murthy
said Apache Hadoop 1.0 reflects six years of development, production
experience, extensive testing, and feedback from hundreds of knowledgeable
users, data scientists and systems engineers, culminating in a highly stable,
enterprise-ready release of the fastest-growing big data platform. It includes
support for:
- HBase
(sync and flush support for transaction logging)
- Security
(strong authentication via Kerberos)
- Webhdfs
(RESTful API to HDFS)
- Performance-enhanced
access to local files for HBase
- Other
performance enhancements, bug fixes and features
- All
version 0.20.205 and prior 0.20.2xx features
Apache
Hadoop serves as a foundation of cloud computing and is at the epicenter of
"big data" solutions, ASF officials said. Hadoop enables
data-intensive distributed applications to work with thousands of nodes and
exabytes of data. Hadoop also enables organizations to more efficiently and
cost-effectively store, process, manage and analyze the growing volumes of data
being created and collected every day. And it connects thousands of servers to
process and analyze data at supercomputing
speed.
"This
release is the culmination of a lot of hard work and cooperation from a vibrant
Apache community group of dedicated software developers and committers that has
brought new levels of stability and production expertise to the Hadoop
project," Murthy said in a statement. "Hadoop is becoming the de
facto data platform that enables organizations to store, process and query vast
torrents of data, and the new release represents an important step forward in
performance, stability and security."
Hadoop
has been referred to as a "Swiss army knife of the 21st century" and
is widely deployed at organizations around the globe, including industry
leaders from across the Internet and social networking landscape such as Amazon
Web Services, AOL, Apple, eBay, Facebook, Foursquare, HP, LinkedIn, Netflix,
The New York Times, Rackspace, Twitter and Yahoo. Other technology leaders such
as Microsoft and IBM have integrated Apache Hadoop into their offerings. Yahoo,
an early pioneer, hosts the world's largest known Hadoop production environment
to date, spanning more than 42,000 nodes.
"Achieving
the 1.0 release status is a momentous achievement from the Apache Hadoop
community and the result of hard development work and shared learnings over the
years," said Jay Rossiter, senior vice president of the Cloud Platform
Group at Yahoo, in a statement. "Apache Hadoop will continue to be an
important area of investment for Yahoo. Today Hadoop powers every click at
Yahoo, helping to deliver personalized content and experiences to more than 700
million consumers worldwide."
"Hadoop,
the first ubiquitous platform to emerge from the ongoing proliferation of Big
Data and NoSQL technologies, is set to make the transition from web to
enterprise technology in 2012, driven by adoption and integration by every
major vendor in the commercial data analytics market," added James
Governor, co-founder of RedMonk, in a statement. "The Apache Software
Foundation plays a crucial role in supporting the platform and its
ecosystem."
In
the independent Forrester Research report "Enterprise Hadoop: The Emerging
Core Of Big Data," published in October 2011, James Kobielus wrote, "Originating
with technologies developed by Yahoo, Google, and other Web 2.0 pioneers in the
mid-2000s, Hadoop is now central to the big data strategies of enterprises,
service providers, and other organizations."
Added
Eric Baldeschwieler, CEO of Hortonworks: "Apache Hadoop is in use
worldwide in many of the biggest and most innovative data applications. The
v1.0 release combines proven scalability and reliability with security and
other features that make Apache Hadoop truly enterprise-ready."
Murthy,
who used to run the nearly 50,000-node Hadoop configuration at Yahoo before
leaving to co-found Hortonworks, said Hadoop 1.0 is a major step for Hadoop but
there is still additional work to be done to make Hadoop even more enterprise-friendly.
Some of this work is being done under the
Hadoop
MapReduce next-generation effort, he said. Results from this effort are
expected to land in the next major release of Hadoop, which is due sometime in
the middle of 2012, Murthy said.
Murthy
added that with Hadoop 1.0 coming out of Apache, he expects that more
enterprises will adopt it. "Like Linux, I think it can quickly take off
and gain mindshare-Hadoop can be to storing and processing large amounts of
data what Linux was to open-source operating systems," he said.
As
with all Apache products, Apache Hadoop software is released under the Apache
License v2.0 and is overseen by a self-selected team of active contributors to
the project. A Project Management Committee (PMC) guides the project's
day-to-day operations, including community development and product releases.
Apache Hadoop release notes, source code, documentation and related resources
are available at
http://hadoop.apache.org/.