Hadoop Summit: Wrangling Big Data Requires Novel Tools, Techniques
"It's very much like a Minority Report experience within TrueCar. It's not science fiction," he said. The big advantage of working with Hadoop for TrueCar, which uses the Hortonworks Data Platform implementation of Hadoop, is its ability to scale. Foltz-Smith said TrueCar's data has grown 24-fold in the past year, with the system processing 12,000 data feeds and 65 billion data points. The company also manages some 700 million car images that it makes available to customers. "If there is no vehicle image, the car doesn't exist (as far as the consumer is concerned)," said Foltz-Smith. "And there is a ton of intelligence embedded in those images."
Is Your Data Lake Polluted?
Walter Maguire, chief field technologist at HP's Big Data Business Unit, discussed one of the more controversial ways to manage big data, so-called data lakes. A data lake is a storage repository that holds large amounts of raw data in its native format until it's needed.
But Maguire said he's heard IT disparage the concept with terms like "data dump" and "data swamp" because, while data lakes can be a convenient way to store vast amounts of raw data, it's not always easy to get at the data you need. "A CIO told me 'there are three petabytes in my Hadoop data lake and I don't know which 100 terabytes are really important.' I've heard this again and again," said Maguire.
After showing a picture of a murky, polluted lake, Maguire used an image of a clear lake to detail HP's solution, Haven for Hadoop, which he said "makes the data lake business-ready. An analyst can sit at a console and get at the data no matter what format it's in."
Quentin Clark, CTO of SAP, said data and digitization are at the heart of huge changes in society. "Imagine we live in a world where Uber and Airbnb are the largest rental companies and they don't own any assets. How is that possible? Data is at the heart of it. These companies deeply embrace data to understand what is going on with the user's experience," he said.
Clark said he expects big data systems like SAP's own HANA in-memory database to help transform more industries. "You can imagine any walk of life seeing transformation over next decade. In retail, the ability to understand where customers are in a retail shop and using big data to realize what products you need and see in real time, the effectiveness of sales associates and be able to change how the store operates on an hour-to-hour basis."
He expects big data systems to help oil and gas companies proactively identify when systems or machinery need downtime for maintenance, saving millions of dollars. In health care, he expects wearables and other advances to yield vast new sources of information. "We should be striving to make every doctor smarter in real-time so their knowledge can be augmented in real-time rather than having to chase down medical journals," he said.