Hadoop to Shine in Big Data's Next Phase, Predictive Analytics
Peter Crossley, director of architecture at Webtrends, said his company has been collecting big data for about 20 years. “With our move to Hadoop we’re able to spin that data around and re-purpose [it] for different things,” said Crossley. The open source system also saves the company between 20 and 40 percent on its clustering costs, as it collects about half a petabyte of data every six months, in some cases taking snapshots of data at 20 millisecond increments, he noted.
Crossley says the flexibility and scalability of Hadoop will be important as Webtrends and others begin collecting huge amounts of data from various devices – all generating data on the Internet of Things. His company uses the open source Spark general processing engine to analyze such data. “People have devices, multiple phones, laptops all sending us data we have to understand, and a lot of data cleansing is needed. What we’ve learned is that we don’t do it correctly because it’s often in the wrong form,” said Crossley. “With our investment in Spark, we’re able to take data out, clean it and use it as needed.”
Ashish Gupta, senior vice president of business development at exhibitor Actian, said enterprise customers interested in Hadoop need third-party products like his company’s Hadoop Analytics with enterprise-grade SQL. Such products enhance what he says is otherwise a still maturing technology. Gupta said one example is a major mobile service provider that uses the analytics platform to look at a number of data sources, including unstructured data in comments on social media, to get insight into a very specific problem. In this case it found that women between the ages of 30 and 45 were having problems getting WiFi connectivity with the iPhone 4S, and were more likely than others to change carriers rather than try and get the phone fixed. “That kind of insight is gold,” said Gupta.
Big data is providing more personalized service in a world where it's more difficult for businesses to get to know their customers, noted Forrester's Gualtieri. “Back in the day, the shopkeeper knew you and knew what you wanted when you stepped in the door. We got away from a lot of that and then came CRM.” Now, he says, Hadoop-powered predictive analytics provides new levels of “contextual awareness” that are bringing personalization back to, if not the shopkeeper level, something closer than it’s been before. “Predictive analytics is about probabilities,” he said, comparing it more to educated guessing than truth. “But it can give businesses an edge. You can predict what’s best to put in your online catalogue; sensor data in the car ahead of you will alert you when there’s slippage ahead; and you’ll see all the presidential candidates heavily use predictive analytics like Obama did in his first two campaigns to identify swing voters.” He said the systems can also then advise how best to address specific voters, for example, whether they might be more receptive to a direct appeal like a campaigner knocking on their door, or to a pitch on a candidate’s position on specific issues, like defense and education spending.
In the busy Hadoop Summit exhibit hall, plenty of big and small companies, including tech giants such as Microsoft, SAP, EMC and HP, were showing Hadoop-powered applications.