    Large Data Sets Dangerous to Privacy, MIT Study Shows

    By Robert Lemos - February 3, 2015

      The allure of big data for companies and researchers is in its ability to make connections between disparate events, allowing better insight into the relationships in the data.

      However, for the individuals whose data is collected, big data also means far less privacy. The latest example, published by Massachusetts Institute of Technology researchers, found that four dates and locations of recent purchases are all that is needed to identify 90 percent of people making the purchases. If price information is included, then only three transactions are necessary.

      The study, published in the latest issue of Science, used anonymized data on 1.1 million people and transactions at 10,000 stores. More than 40 percent of the people could be identified with just two data points, while five purchases identified nearly everyone.
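
      To make the measurement concrete, here is a minimal Python sketch of how one might estimate the share of people who are pinned down by a handful of known purchases. It is an illustration under stated assumptions, not the researchers' code: the `unicity` function, the toy `traces` data and the (day, shop) representation are all hypothetical.

```python
import random

# Hypothetical toy data: each user maps to a set of (day, shop) purchases.
traces = {
    "a": {(1, "bakery"), (2, "cafe"), (3, "gym")},
    "b": {(1, "bakery"), (2, "cafe"), (4, "cinema")},
    "c": {(1, "bakery"), (5, "bookshop"), (6, "cafe")},
}

def unicity(traces, p, trials=1000, seed=0):
    """Estimate the fraction of users singled out by p of their own purchases.

    Each trial picks a random user, reveals p random points from that
    user's trace, and checks whether any other trace also contains them.
    """
    rng = random.Random(seed)
    users = [u for u, pts in traces.items() if len(pts) >= p]
    if not users:
        return 0.0
    unique = 0
    for _ in range(trials):
        user = rng.choice(users)
        # sorted() gives random.sample the sequence it requires.
        known = set(rng.sample(sorted(traces[user]), p))
        matches = sum(1 for pts in traces.values() if known <= pts)
        if matches == 1:  # only the original user fits the known points
            unique += 1
    return unique / trials

for p in (1, 2, 3):
    print(p, unicity(traces, p))
```

      In this toy data set every full three-purchase trace is distinct, so knowing all three points always identifies the user, while a single point often matches several people.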

      The conclusion: With big data comes big responsibility.

      “[We] really do believe that this data has great potential and should be used,” Yves-Alexandre de Montjoye, an MIT graduate student and the primary author of the paper, said in a statement. “We, however, need to be aware [of] and account for the risks of re-identification.”

      Rather than posing a unique problem, the threat of stripping away anonymity appears to be a general danger of analyzing large data sets. Two years ago, de Montjoye collaborated with another university to conduct an analysis of mobile phone data that found nearly identical results. Four pieces of data—in this case, the location of a base station used by a cell phone—were sufficient to identify 95 percent of the people among 1.5 million cell phone users.

      Previous studies analyzing data sets composed of AOL users and, in a separate case, Netflix users have found similar impacts on privacy: A handful of records can effectively de-cloak almost any user.

      As technology becomes more ubiquitous and consumers carry around multiple devices connected to the Internet—often referred to as the Internet of things—many do not consider that their actions are now being tracked by multiple third parties, Ken Westin, senior security analyst with Tripwire, told eWEEK.

      “Think of how many devices we interact with every day when we make our transactions,” he said. “We are leaving a trail in our electronic records.”

      Many companies “anonymize” the collected data by adding imprecision into the data sets. A technique known as “binning,” for example, creates discrete bins that correspond to a range of values and assigns the records to those bins. Yet such techniques only increase the number of transactions needed to de-anonymize the data, the MIT researchers found. Turning the time and location of each purchase into a week number and an approximate region consisting of 150 stores, for example, still allowed the researchers to identify 70 percent of the users from four data points.
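
      As a rough illustration of binning, the sketch below coarsens a purchase into a week and a region. The `coarsen` helper is hypothetical, and it assumes stores carry integer IDs numbered so that each consecutive block of 150 IDs forms one region; the article does not say how the researchers actually grouped stores.

```python
from datetime import date

def coarsen(purchase_date, shop_id, shops_per_region=150):
    """Replace an exact date and store with a (year, week, region) bin."""
    iso = purchase_date.isocalendar()
    iso_year, iso_week = iso[0], iso[1]
    # Assumption: consecutive integer shop IDs belong to the same region.
    region = shop_id // shops_per_region
    return (iso_year, iso_week, region)

# Two different purchases in the same week and region become indistinguishable.
print(coarsen(date(2015, 2, 3), 151) == coarsen(date(2015, 2, 5), 299))
```

      Coarser bins make individual records look alike, but as the study found, a user's combination of several bins can still be distinctive.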

      The researchers suggest that large data sets should not be publicly released, but kept by a custodian who could then allow researchers to conduct queries and submit programs to analyze the data. They proposed a system that would do exactly that.
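
      The article gives no details of the proposed system, so the following is only a sketch of the general custodian idea: the raw records stay private, and outside code receives aggregate answers. The `Custodian` class, its `count_users` method and the minimum group size rule are assumptions made for illustration, and a size threshold alone would not be a complete privacy safeguard.

```python
class Custodian:
    """Holds raw records privately and answers only aggregate queries."""

    def __init__(self, records, min_group_size=10):
        self._records = list(records)
        self._min_group_size = min_group_size

    def count_users(self, predicate):
        """Count distinct users whose records satisfy the predicate.

        Refuses to answer when the group is small enough that the count
        could point to specific individuals (an assumed policy).
        """
        users = {r["user"] for r in self._records if predicate(r)}
        if len(users) < self._min_group_size:
            raise PermissionError("group too small to release")
        return len(users)

# Hypothetical records held by the custodian.
records = [
    {"user": "a", "shop": "cafe"},
    {"user": "b", "shop": "cafe"},
    {"user": "c", "shop": "cafe"},
    {"user": "a", "shop": "gym"},
]
custodian = Custodian(records, min_group_size=3)
print(custodian.count_users(lambda r: r["shop"] == "cafe"))
```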

      Users should be wary of any large data set, even if a company claims that it has been anonymized, Luther Martin, chief security architect at Voltage Security, said in a statement.

      The research “suggests that it’s probably better to stop debating exactly how much risk there is in data sets that may not at first seem to contain sensitive information,” he said.

      Robert Lemos
      Robert Lemos is an award-winning freelance journalist who has covered information security, cybercrime and technology's impact on society for almost two decades. A former research engineer, he's written for Ars Technica, CNET, eWEEK, MIT Technology Review, Threatpost and ZDNet. He won the prestigious Sigma Delta Chi award from the Society of Professional Journalists in 2003 for his coverage of the Blaster worm and its impact, and was named one of the SANS Institute's Top Cybersecurity Journalists in 2010 and 2014.
