1. Yahoo Creates Hortonworks to Lead Hadoop
On June 29, Hortonworks (named after the Dr. Seuss elephant) was created as an independent, privately held, VC-funded company to lead the Hadoop community and market the open-source product into the future. Its parent, Yahoo, is now one of its customers.
2. Hadoop Is No Longer a Science Experiment
Yahoo has taken Hadoop from creator Doug Cutting’s science project to a world-class platform in just five years, contributing more than 70 percent of the code and helping to establish it as the IT industry’s pre-eminent Big Data platform.
3. Hadoop Was a Key Part of IBM’s Watson
Hadoop analytics and data discovery abilities were a big reason that IBM’s Watson computer was able to win a widely publicized “Jeopardy” showdown last year against a couple of very successful human former champions.
4. Largest Deployment: 200-Petabyte Data Farm
In the technology’s largest deployment (at Yahoo, of course), Hadoop is used daily to analyze more than 200PB of data to make Yahoo more personal and relevant to its users and customers. It works with all aspects of Yahoo’s IT system, including search, advertising, user experience and fraud detection.
5. Big Software for Big Data
Yahoo’s Hadoop system includes more than 42,000 servers, made up of clusters of up to 4,000 machines, allowing it to process over 5 million jobs per month. Fourteen million new files are put into Hadoop every day, so scale is not exactly a problem.
6. Hadoop Will Sell Services Around Its Platform
The Hadoop software is freely available as an open-source project, but a set of premium services is now being built around the technology for enterprises that want more than a single level of service.
7. Now THAT’S a Lot of Email
Hadoop protects Yahoo’s 289 million mailboxes from spam worldwide. Hadoop also plays a key role in customizing 13 million personal Web pages used each day by Web browsers.
8. Used for More Than Just Web Traffic
Hadoop use has evolved beyond Web traffic and scientific research (pictured: CERN Supercollider, Switzerland). It’s now in production across search engines, advertising optimization, machine learning and content feeds. It loads 10 terabytes of data per day onto research clusters.
9. New Companies Quickly Growing Up Around Hadoop
MapR, Zettaset, Cloudera, HStreaming, Hadapt, DataStax, Datameer—a whole new subset of Hadoop-related companies have been funded and are now out of stealth to help bring the best of the new technology to various markets.
10. Hadoop Knows It Still Needs to Improve
Yahoo and Hortonworks leaders have acknowledged that Hadoop still needs time to mature and become more user-friendly. It is not a simple technology to deploy and use, and the user interface needs some work. But the teams at both Yahoo and Hortonworks are convinced they will have these issues solved in the months to come.