    Hadoop Summit: Wrangling Big Data Requires Novel Tools, Techniques

    Written by

    David Needle
    Published June 11, 2015

      SAN JOSE, Calif.—Business, science and academic researchers have access to an unprecedented array of data to mine and discover significant trends, from social network chatter to consumer buying patterns, credit card transactions and even sports statistics.

      But speakers at the Hadoop Summit here June 10 noted that many organizations aren’t aware of the novel techniques they can use to analyze mountains of data to gain meaningful insights.

      One speaker used sports statistics to illustrate new approaches companies should consider in dealing with big data.

      “In sports we’re drowning in data, but it’s largely ineffective because it needs to be married with small data,” said David Epstein, author of The Sports Gene: Inside the Science of Extraordinary Athletic Performance.

      He used the example of sprinters, pointing out that typically a second or less separates those who consistently finish first or second from those who finish farther back in the field. He said the emerging area of sports science is using “small data” to see how athletes can improve performance.

      In one case, researchers analyzed three basic variables in how three top Olympic shot putters cast the shot. They discovered the gold medalist released at an angle one degree higher than his competitors.

      Similarly, researchers took a new approach to a study of broad jumpers’ techniques. While past studies looked at things like speed and the force with which the jumpers took off from the board, a smaller set of data from a biomechanical jump specialist revealed that the key difference for the winner was the angle of takeoff. Using that data, a broad jumper from Great Britain changed his training and won a gold medal even though he wasn’t favored.

      What is the lesson for business in these examples? As in sports, often the difference between good and great is less than one percent. A company might find, for example, that some small glitch in customer service or response is keeping it from being tops in its market.

      TrueCar Finds Hadoop Drives Value

      One company that has moved aggressively to get more from big data is car buying service TrueCar, which maintains a massive up-to-the-minute database of selling prices. Russ Foltz-Smith, head of the company’s data platform, said the biggest challenge it faced when it ramped up efforts to use a Hadoop-powered system to manage its “couple of petabytes” of data was finding qualified developers.

      Finding few qualified applicants, TrueCar decided to hire and train a developer in the use of Hadoop and went from there. “It was a hard decision, but now we have over 25 Hadoop experts and we’re extremely effective at hiring more,” Foltz-Smith said.

      TrueCar has 600 TB of data in active use at any one time and over 20 million buyer profiles.

      “The idea is to be the brain of the industry,” said Foltz-Smith. “The important thing is you can’t be wrong in the automotive industry. If you’re wrong, you lose the transaction.”

      Staying at the cutting edge, TrueCar recently developed what Foltz-Smith says is an advanced, multi-dimensional real-time search capability.

      “It’s very much like a Minority Report experience within TrueCar. It’s not science fiction,” he said.

      The big advantage of working with Hadoop for TrueCar, which uses the Hortonworks Data Platform implementation of Hadoop, is its ability to scale. Foltz-Smith says TrueCar’s data has grown 24-fold in the past year, with the system processing 12,000 data feeds and 65 billion data points.

      The company also manages some 700 million car images that it makes available to customers. “If there is no vehicle image, the car doesn’t exist (as far as the consumer is concerned),” said Foltz-Smith. “And there is a ton of intelligence embedded in those images.”

      Is Your Data Lake Polluted?

      Walter Maguire, chief field technologist at HP’s Big Data Business Unit, discussed one of the more controversial approaches to managing big data: so-called data lakes. A data lake is a storage repository that holds large amounts of raw data in its native format until it’s needed.

      But Maguire said he’s heard IT disparage the concept with terms like “data dump” and “data swamp” because while data lakes can be a convenient way to store vast amounts of raw data, it’s not always easy to get at the data you need. “A CIO told me ‘there are three petabytes in my Hadoop data lake and I don’t know which 100 terabytes are really important.’ I’ve heard this again and again,” said Maguire.

      After showing a picture of a murky, polluted lake, Maguire used an image of a clear lake to detail HP’s solution, Haven for Hadoop, which he said “makes the data lake business-ready.” “An analyst can sit at a console and get at the data no matter what format it’s in,” he said.

      Quentin Clark, CTO of SAP, said data and digitization are at the heart of huge changes in society.

      “Imagine we live in a world where Uber and Airbnb are the largest rental companies and they don’t own any assets. How is that possible? Data is at the heart of it. These companies deeply embrace data to understand what is going on with the user’s experience,” he said.

      Clark said he expects big data systems like SAP’s own HANA in-memory database to help transform more industries.

      “You can imagine any walk of life seeing transformation over next decade. In retail, the ability to understand where customers are in a retail shop and using big data to realize what products you need and see in real time, the effectiveness of sales associates and be able to change how the store operates on an hour-to-hour basis,” Clark said. He also expects big data systems to help oil and gas companies proactively identify when systems or machinery need downtime for maintenance, saving millions of dollars.

      In health care, he expects wearables and other advances to yield vast new sources of information. “We should be striving to make every doctor smarter in real-time so their knowledge can be augmented in real-time rather than having to chase down medical journals,” he said.

      David Needle
      Based in Silicon Valley, veteran technology reporter David Needle covers mobile, big data, and social media among other topics. He was formerly News Editor at InfoWorld, Editor of Computer Currents and TabTimes, and West Coast Bureau Chief for both InformationWeek and Internet.com.
