Yahoo Creates Hortonworks to Lead Hadoop

1 of 10

Yahoo Creates Hortonworks to Lead Hadoop

On June 29, Hortonworks (named after the Dr. Seuss elephant) was created as an independent, privately held, VC-funded company to lead the Hadoop community and market the open-source product into the future. Its parent, Yahoo, is now one of its customers.

2 of 10

Hadoop Is No Longer a Science Experiment

Yahoo has taken Hadoop from creator Doug Cutting's science project to a world-class platform in just five years, contributing more than 70 percent of the code and helping to establish it as the IT industry's pre-eminent Big Data platform.

3 of 10

Hadoop Was a Key Part of IBM's Watson

Hadoop's analytics and data discovery abilities were a big reason that IBM's Watson computer was able to win a widely publicized "Jeopardy" showdown earlier this year against two highly successful former human champions.

4 of 10

Largest Deployment: 200-Petabyte Data Farm

In the technology's largest deployment (at Yahoo, of course), Hadoop is used daily to analyze more than 200PB of data to make Yahoo more personal and relevant to its users and customers. It works with all aspects of Yahoo's IT system, including search, advertising, user experience and fraud detection.

5 of 10

Big Software for Big Data

Yahoo's Hadoop system includes more than 42,000 servers, made up of clusters of up to 4,000 machines, allowing it to process over 5 million jobs per month. Fourteen million new files are put into Hadoop every day, so scale is not exactly a problem.

6 of 10

Hadoop Will Sell Services Around Its Platform

The Hadoop software is freely available as an open-source project, but a set of premium services is now being built around the technology for enterprises that want more than a single level of service.

7 of 10

Now THAT'S a Lot of Email

Hadoop protects Yahoo's 289 million mailboxes from spam worldwide. Hadoop also plays a key role in customizing 13 million personal Web pages used each day by Web browsers.

8 of 10

Used for More Than Just Web Traffic

Hadoop use has evolved beyond Web traffic and scientific research (pictured: CERN Supercollider, Switzerland). It's now in production across search engines, advertising optimization, machine learning and content feeds. It loads 10 terabytes of data per day onto research clusters.

9 of 10

New Companies Quickly Growing Up Around Hadoop

MapR, Zettaset, Cloudera, HStreaming, Hadapt, DataStax, Datameer: a whole new subset of Hadoop-related companies has been funded and is now out of stealth to help bring the best of the new technology to various markets.

10 of 10

Hadoop Knows It Still Needs to Improve

Yahoo and Hortonworks leaders have acknowledged that Hadoop still needs time to mature and become more user-friendly. It is not a simple technology to deploy and use, and the user interface needs some work. But the teams at both Yahoo and Hortonworks are convinced they will have these issues solved in the months to come.
