    Large Data Sets Dangerous to Privacy, MIT Study Shows

    By Robert Lemos - February 3, 2015

      The allure of big data for companies and researchers is in its ability to make connections between disparate events, allowing better insight into the relationships in the data.

      However, for the individuals whose data is collected, big data also means far less privacy. The latest example, published by Massachusetts Institute of Technology researchers, found that four dates and locations of recent purchases are all that is needed to identify 90 percent of people making the purchases. If price information is included, then only three transactions are necessary.

      The study, published in the latest issue of Science, used anonymized data on 1.1 million people and transactions at 10,000 stores. More than 40 percent of the people could be identified with just two data points, while five purchases identified nearly everyone.
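
To make the idea of "data points needed to identify a person" concrete, here is a minimal Python sketch of how such a measurement could be estimated on a toy data set. It is not the researchers' code: the names `traces`, `matching_users` and `unicity`, the three made-up users, and the sampling approach are all assumptions chosen for illustration.

```python
import random


def matching_users(traces, points):
    """Return the users whose trace contains every (day, shop) pair in points."""
    wanted = set(points)
    return [user for user, trace in traces.items() if wanted <= trace]


def unicity(traces, k, trials=1000, seed=0):
    """Estimate the share of sampled users singled out by k of their own points.

    Hypothetical helper: each trial picks a user at random, draws k known
    points from that user's trace, and checks whether anyone else matches.
    """
    rng = random.Random(seed)
    eligible = [user for user, trace in traces.items() if len(trace) >= k]
    if not eligible:
        return 0.0
    unique = 0
    for _ in range(trials):
        user = rng.choice(eligible)
        points = rng.sample(sorted(traces[user]), k)
        if len(matching_users(traces, points)) == 1:
            unique += 1
    return unique / trials


# Toy "anonymized" data: user IDs are pseudonyms, values are (day, shop) pairs.
traces = {
    "u1": {(1, "bakery"), (2, "cafe"), (3, "gym")},
    "u2": {(1, "bakery"), (2, "cafe"), (4, "bar")},
    "u3": {(1, "bakery"), (5, "shop"), (6, "gym")},
}

for k in (1, 2, 3):
    print(k, unicity(traces, k))
```

In this toy set, a single bakery visit matches all three users, while any three points from one trace match only that user. The study's figures come from measuring this kind of effect on real transaction records.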

      The conclusion: With big data comes big responsibility.

      “[We] really do believe that this data has great potential and should be used,” Yves-Alexandre de Montjoye, an MIT graduate student and the primary author of the paper, said in a statement. “We, however, need to be aware [of] and account for the risks of re-identification.”

      Rather than posing a unique problem, the threat of stripping away anonymity appears to be a general danger of analyzing large data sets. Two years ago, de Montjoye collaborated with another university to conduct an analysis of mobile phone data that found nearly identical results. Four pieces of data—in this case, the location of a base station used by a cell phone—were sufficient to identify 95 percent of the people among 1.5 million cell phone users.

      Previous studies analyzing data sets composed of AOL users and, in a separate case, Netflix users have found similar impacts on privacy: A handful of records can effectively de-cloak almost any user.

      As technology becomes more ubiquitous and consumers carry around multiple devices connected to the Internet—often referred to as the Internet of things—many do not consider that their actions are now being tracked by multiple third parties, Ken Westin, senior security analyst with Tripwire, told eWEEK.

      “Think of how many devices we interact with every day when we make our transactions,” he said. “We are leaving a trail in our electronic records.”

      Many companies “anonymize” the collected data by adding imprecision into the data sets. A technique known as “binning,” for example, creates discrete bins that correspond to a range of values and assigns the records to those bins. Yet such techniques only increase the number of transactions needed to de-anonymize the data, the MIT researchers found. Turning the time and location of each purchase into a week number and an approximate region consisting of 150 stores, for example, still allowed the researchers to identify 70 percent of the users from four data points.
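
As a rough illustration of binning, the Python sketch below coarsens a purchase from an exact date and store into a week number and a region of 150 stores. The 150-store figure is from the article; the function name `coarsen`, the use of ISO week numbers, and the grouping of stores by consecutive numeric ID are assumptions made only for this example.

```python
from datetime import date

STORES_PER_REGION = 150  # region size reported in the article


def coarsen(purchase_date, store_id):
    """Map an exact (date, store) pair to a (year, week, region) bin.

    Assumption: stores carry integer IDs and consecutive IDs share a region.
    """
    year, week, _weekday = purchase_date.isocalendar()
    return (year, week, store_id // STORES_PER_REGION)


# Two different purchases land in the same bin once precision is removed.
print(coarsen(date(2015, 2, 3), 312))  # (2015, 6, 2)
print(coarsen(date(2015, 2, 6), 449))  # (2015, 6, 2)
```

Binning makes individual records less precise, but as the researchers found, a few coarse bins taken together can still point to one person.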

      The researchers suggest that large data sets should not be publicly released, but kept by a custodian who could then allow researchers to conduct queries and submit programs to analyze the data. They proposed a system that would do exactly that.
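
The article does not describe how the proposed system works, so the following Python sketch is only a hypothetical picture of the custodian idea: raw records stay inside an object, and outside analysts can ask for aggregate counts only. The class name `DataCustodian`, the `count` method and the minimum group size are all assumptions, not details from the research.

```python
class DataCustodian:
    """Hypothetical custodian that answers aggregate queries, never raw rows."""

    def __init__(self, records, min_group_size=5):
        self._records = list(records)  # kept private to the custodian
        self._min_group_size = min_group_size  # assumed release threshold

    def count(self, predicate):
        """Return how many records satisfy predicate, if the group is large enough."""
        n = sum(1 for record in self._records if predicate(record))
        if n < self._min_group_size:
            raise PermissionError("result too small to release")
        return n


records = [{"shop": "cafe", "price": 4} for _ in range(6)]
records.append({"shop": "gym", "price": 30})
custodian = DataCustodian(records)

print(custodian.count(lambda r: r["shop"] == "cafe"))  # 6
```

A query that matches only the single gym purchase is refused rather than answered. Suppressing small counts is one simple safeguard and is not sufficient by itself, since differences between several allowed answers can still leak information about individuals.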

      Users should be wary of any large data set, even if a company claims that it has been anonymized, Luther Martin, chief security architect at Voltage Security, said in a statement.

      The research “suggests that it’s probably better to stop debating exactly how much risk there is in data sets that may not at first seem to contain sensitive information,” he said.

      Robert Lemos
      Robert Lemos is an award-winning freelance journalist who has covered information security, cybercrime and technology's impact on society for almost two decades. A former research engineer, he's written for Ars Technica, CNET, eWEEK, MIT Technology Review, Threatpost and ZDNet. He won the prestigious Sigma Delta Chi award from the Society of Professional Journalists in 2003 for his coverage of the Blaster worm and its impact, and was named one of the SANS Institute's Top Cybersecurity Journalists in 2010 and 2014.
