Intel Releases Hadoop Distribution for Big Data

The chip maker offers its own Hadoop software platform that is optimized to run on its Xeon server chips.

Intel is releasing its own distribution of Apache Hadoop, a move that not only advances its software ambitions but also will make it a significant player in the booming area of big data and help fuel sales of its Xeon server processors.

During a Webcast press conference Feb. 26, Boyd Davis, vice president and general manager of Intel's Data Center Software division, said the giant chip maker has been working with Hadoop since 2009—this is actually Intel's third release of its Hadoop software—and has been an open-source advocate for much longer than that. However, for Intel to be more than an outside influence in the big data space, it had to become a player with products to offer, Davis said. The Intel Distribution for Apache Hadoop was a way to do that.

Intel's legacy in high-end data center hardware, including its Xeon server chips and recent offerings around solid-state drive (SSD) memory, along with its newer efforts in software, gives the company a strong silicon-based foundation for launching a Hadoop distribution, he said. Intel is optimizing Hadoop to work with features on its chips, for example by incorporating Advanced Encryption Standard New Instructions (AES-NI) into the Hadoop Distributed File System to accelerate encryption.

Intel, which has been building up its software capabilities via in-house development and acquisitions, will keep parts of its Hadoop distribution open, making them interoperable with other Hadoop distributions, but will keep some features, including management and monitoring capabilities, to itself. Intel will not open source such software as Intel Manager for Apache Hadoop, which handles configuration and deployment, or Active Tuner for Apache Hadoop, a tool for improving the performance of compute clusters running the distribution.

What Intel's Hadoop distribution will do is give organizations the confidence that comes when a major tech player supports an open-source technology, providing a "consistent, stable foundation" for the open-source software, Davis said, adding that Intel wants "to make sure Hadoop stays on the leading edge." More vendors, from established players like EMC to smaller companies like Cloudera, are coming out with their own Hadoop offerings.

Big data is a growing trend in the business world, driven by the staggering amount of data created by the wide range of devices and machines people use. Davis pointed to numbers indicating that a petabyte of data is created around the world every 11 seconds.

"We're in an era of generating huge amounts of data," he said, noting that "the key is how to get value out of the data."

Hadoop, which includes about a dozen open-source projects, is designed to enable businesses to more easily do just that: store huge amounts of data, analyze it and leverage it in ways that benefit both the organizations and their end users. For example, businesses can use it to gain a better understanding of what their customers want, while medical researchers can more quickly discover life-saving drugs and communities can improve their environments by better managing traffic patterns.

"Big data has the potential to not only transform business models ... but has the ability to transform society," Davis said.