SAN JOSE, Calif. – Over the next few years, Hadoop will be adopted by virtually all enterprises and become just as much a part of the core IT infrastructure as database management systems, according to Mike Gualtieri, principal analyst at Forrester Research.
“It’s a data operating system and a fundamental data platform that in the next couple of years 100 percent of large companies will adopt,” Gualtieri predicted Tuesday at the Hadoop Summit here.
Gualtieri and other speakers said Hadoop adoption will continue to grow rapidly because it “democratizes big data problem solving” and is scaling to offer new capabilities and insights into increasingly large data stores.
Enterprises already are reaping benefits from big data analysis with its ability to extract insights from historical data, such as what customers buy, where they buy it and when. But new tools, powered by Hadoop, are giving enterprises the ability to also predict what customers will buy, greatly enhancing the value of the data to anticipate customer needs and business trends.
Several speakers here emphasized the power of predictive analytics in a variety of applications, including commercial and industrial uses.
Vince Campisi, CIO of GE Software, said Hadoop is going to help unlock a new ‘Industrial Internet’ that will help manufacturers anticipate when, for example, components are likely to break, so they can replace them ahead of time to save millions of dollars in downtime. GE already offers some of this capability with its own Predictivity software, he noted.
“Hadoop breaks down the data silos. We see the walls coming down,” said Campisi. He said that current systems aren’t designed to handle what he estimates will be 50 billion devices connected to the Internet of Things, with each one generating data. But Hadoop can scale to meet such immense data volumes and offer insights into relationships that “we didn’t even know mattered,” Campisi said.
One of the architects of Hadoop, Arun Murthy, said the rise of big data has led to the same concerns companies had back in the days when mainframes controlled most computer operations, such as management, reliability, security and quality of service. Now Hadoop-powered systems, such as the Hortonworks Enterprise Data Platform, are designed to address those same concerns in the cloud, noted Murthy, who is also a co-founder of Hortonworks.
Murthy said that computing has to speed up for predictive analytics to be effective. “We see speeds and feeds going faster with the need to take snapshots of what’s happening now in microseconds,” he said.
Proactive Data Analysis
Typically, companies have used a big data system to discern things like when and where someone bought an airplane ticket. In this new era of predictive analytics, Murthy says, companies are becoming more proactive. Today, credit card companies will call when a customer makes a purchase that the system deduces is outside their normal buying pattern and thus might be fraudulent.
“Hadoop is part of what makes that possible,” he said. Other predictive analysis applications will take hold in the near future, he said. For example, a system might proactively notify a truck driver of a traffic jam that might slow his route. Such a system could also show alternative routes to keep deliveries on schedule.
Hadoop to Shine in Big Data’s Next Phase, Predictive Analytics
Peter Crossley, director of architecture at Webtrends, said his company has been collecting big data for about 20 years.
“With our move to Hadoop we’re able to spin that data around and re-purpose [it] for different things,” said Crossley. The open source system also saves the company between 20 and 40 percent on its clustering costs, as it collects about half a petabyte of data every six months, in some cases taking snapshots of data at 20 millisecond increments, he noted.
Crossley says the flexibility and scalability of Hadoop will be important as Webtrends and others begin collecting huge amounts of data from various devices – all generating data on the Internet of Things. His company uses the open source Spark general processing engine to analyze such data.
“People have devices, multiple phones, laptops all sending us data we have to understand, and a lot of data cleansing is needed. What we’ve learned is that we don’t do it correctly because it’s often in the wrong form,” said Crossley. “With our investment in Spark, we’re able to take data out, clean it and use it as needed.”
In the busy Hadoop Summit exhibit hall, plenty of big and small companies, including tech giants such as Microsoft, SAP, EMC and HP, were showing Hadoop-powered applications.
Ashish Gupta, senior vice president of business development at exhibitor Actian, said enterprise customers interested in Hadoop need third-party products like his company’s Hadoop Analytics with enterprise-grade SQL. Such products enhance what he says is otherwise a still-maturing technology.
Gupta said one example is a major mobile service provider that uses the analytics platform to look at a number of data sources, including unstructured data in comments on social media, to get insight into a very specific problem. In this case it found that women between the ages of 30 and 45 were having problems getting WiFi connectivity with the iPhone 4S, and were more likely than others to change carriers rather than try to get the phone fixed.
“That kind of insight is gold,” said Gupta.
Big data is providing more personalized service in a world where it’s more difficult for businesses to get to know their customers, noted Forrester’s Gualtieri.
“Back in the day, the shopkeeper knew you and knew what you wanted when you stepped in the door. We got away from a lot of that and then came CRM.” Now, he says, Hadoop-powered predictive analysis provides new levels of “contextual awareness” that are bringing personalization back to, if not the shopkeeper level, something closer than it’s been before.
“Predictive analytics is about probabilities,” he said, comparing it more to educated guessing than truth. “But it can give businesses an edge. You can predict what’s best to put in your online catalogue; sensor data in the car ahead of you will alert you when there’s slippage ahead; and you’ll see all the presidential candidates heavily use predictive analytics like Obama did in his first two campaigns to identify swing voters.”
He said the systems can also then advise how best to address specific voters, for example, whether they might be more receptive to a direct appeal like a campaigner knocking on their door, or to a pitch on a candidate’s position on specific issues, like defense and education spending.