Hadoop Summit: Wrangling Big Data Requires Novel Tools, Techniques

NEWS ANALYSIS: Apache Hadoop is opening up lots of possibilities for analyzing big data, but there is also complexity in the applications and techniques for managing that data effectively.


SAN JOSE, Calif.—Business, science and academic researchers have access to an unprecedented array of data to mine and discover significant trends, from social network chatter to consumer buying patterns, credit card transactions and even sports statistics.

But speakers at the Hadoop Summit here June 10 noted that many organizations aren't aware of the novel techniques they can use to analyze mountains of data to gain meaningful insights.

One speaker used sports statistics to illustrate new approaches companies should consider in dealing with big data.

"In sports we're drowning in data, but it's largely ineffective because it needs to be married with small data," said David Epstein, author of The Sports Gene: Inside the Science of Extraordinary Athletic Performance.

He used the example of sprinters, pointing out that a second or less typically separates those who consistently finish first or second from those who finish farther back in the field. He said the emerging area of sports science is using "small data" to see how athletes can improve performance.

In one case, researchers analyzed three basic variables in how three top Olympic shot putters cast the shot. They discovered the gold medalist released at an angle one degree higher than his competitors.

Similarly, researchers took a new approach to a study of broad jumpers' techniques. While past studies looked at things like speed and the force with which the jumpers took off from the board, a smaller set of data from a biomechanical jump specialist revealed that the key difference for the winner was the angle of takeoff. Using that data, a broad jumper from Great Britain changed his training and won a gold medal even though he wasn't favored.

What is the lesson for business in these examples? As in sports, often the difference between good and great is less than one percent. A company might find, for example, that some small glitch in customer service or response is keeping it from being tops in its market.

TrueCar Finds Hadoop Drives Value

One company that has moved aggressively to get more from big data is car-buying service TrueCar, which maintains a massive up-to-the-minute database of selling prices. Russ Foltz-Smith, head of the company's data platform, said the biggest challenge it faced when it ramped up efforts to use a Hadoop-powered system to manage its "couple of petabytes" of data was finding qualified developers.

Finding few qualified applicants, it decided to hire and train a developer in the use of Hadoop and went from there. "It was a hard decision, but now we have over 25 Hadoop experts and we're extremely effective at hiring more."

TrueCar has 600 TB of data in active use at any one time and over 20 million buyer profiles.

"The idea is to be the brain of the industry," said Foltz-Smith. "The important thing is you can't be wrong in the automotive industry. If you're wrong, you lose the transaction."

Staying at the cutting edge, TrueCar recently developed what Foltz-Smith says is an advanced, multi-dimensional real-time search capability.

David Needle


Based in Silicon Valley, veteran technology reporter David Needle covers mobile, big data, and social media among other topics. He was formerly News Editor at InfoWorld, Editor of Computer Currents...