Large Data Sets Dangerous to Privacy, MIT Study Shows | eWeek

Written by Robert Lemos
Feb 3, 2015

The allure of big data for companies and researchers is in its ability to make connections between disparate events, allowing better insight into the relationships in the data.

However, for the individuals whose data is collected, big data also means far less privacy. The latest example, published by Massachusetts Institute of Technology researchers, found that the dates and locations of four recent purchases are all that is needed to identify 90 percent of the people making the purchases. If price information is included, then only three transactions are necessary.

The study, published in the latest issue of Science, used anonymized data on 1.1 million people and transactions at 10,000 stores. More than 40 percent of the people could be identified with just two data points, while five purchases identified nearly everyone.
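
To make the idea of identifying someone from a few data points concrete, the following Python sketch estimates what share of people in a data set are singled out by k of their own records. It is an illustration only, not the researchers' code: the function names, the toy purchase records and the sampling approach are all assumptions made for this example.

```python
import random

def is_unique(traces, user, k, rng):
    """True if k random points from `user`'s trace match no other trace."""
    # Draw k known purchases for this person (hypothetical attacker knowledge).
    points = set(rng.sample(sorted(traces[user]), k))
    # Everyone whose trace contains all k points is a candidate match.
    matches = [u for u, trace in traces.items() if points <= trace]
    return matches == [user]

def unicity(traces, k, seed=0):
    """Fraction of people (with at least k records) singled out by k points."""
    rng = random.Random(seed)
    eligible = [u for u, trace in traces.items() if len(trace) >= k]
    if not eligible:
        return 0.0
    unique = sum(is_unique(traces, u, k, rng) for u in eligible)
    return unique / len(eligible)

# Toy, made-up data: each trace is a set of (day, shop) purchases.
traces = {
    "a": {(1, "bakery"), (2, "cafe"), (3, "bookshop")},
    "b": {(1, "bakery"), (2, "cafe"), (4, "pharmacy")},
    "c": {(1, "bakery"), (5, "grocer"), (6, "cinema")},
}

# Two people with identical records can never be told apart.
identical = {"x": {(1, "bakery")}, "y": {(1, "bakery")}}

print(unicity(traces, 3))     # all three full traces are distinct
print(unicity(identical, 1))  # nobody is unique
```

In this toy set, knowing all three purchases singles out every person, while two people with identical records remain indistinguishable. A real analysis would run over millions of records and average over many random draws.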

The conclusion: With big data comes big responsibility.

“[We] really do believe that this data has great potential and should be used,” Yves-Alexandre de Montjoye, an MIT graduate student and the primary author of the paper, said in a statement. “We, however, need to be aware [of] and account for the risks of re-identification.”

Rather than posing a unique problem, the threat of stripping away anonymity appears to be a general danger of analyzing large data sets. Two years ago, de Montjoye collaborated with another university to conduct an analysis of mobile phone data that found nearly identical results. Four pieces of data—in this case, the location of a base station used by a cell phone—were sufficient to identify 95 percent of the people among 1.5 million cell phone users.

Previous studies analyzing data sets composed of AOL users and, in a separate case, Netflix users have found similar impacts on privacy: A handful of records can effectively de-cloak almost any user.

As technology becomes more ubiquitous and consumers carry around multiple devices connected to the Internet—often referred to as the Internet of things—many do not consider that their actions are now being tracked by multiple third parties, Ken Westin, senior security analyst with Tripwire, told eWEEK.

“Think of how many devices we interact with every day when we make our transactions,” he said. “We are leaving a trail in our electronic records.”

Many companies “anonymize” the collected data by adding imprecision into the data sets. A technique known as “binning,” for example, creates discrete bins that correspond to a range of values and assigns the records to those bins. Yet such techniques only increase the number of transactions needed to de-anonymize the data, the MIT researchers found. Turning the time and location of each purchase into a week number and an approximate region consisting of 150 stores, for example, still allowed the researchers to identify 70 percent of the users from four data points.
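
As a rough illustration of binning, the Python sketch below replaces a purchase date with its week number and a store with a 150-store region. The function name, the integer store IDs and the assumption that consecutive IDs belong to the same region are invented for this example; the article does not say how the researchers grouped stores.

```python
from datetime import date

STORES_PER_REGION = 150  # region size quoted in the article

def coarsen(purchase_date, store_id):
    """Bin one purchase into (ISO year, ISO week, region index)."""
    iso = purchase_date.isocalendar()
    year, week = iso[0], iso[1]
    # Assumption: store IDs are integers and nearby stores have nearby IDs.
    region = store_id // STORES_PER_REGION
    return year, week, region

# Two different purchases fall into the same bin once coarsened.
print(coarsen(date(2015, 2, 3), 151))  # (2015, 6, 1)
print(coarsen(date(2015, 2, 5), 299))  # (2015, 6, 1)
```

The two purchases become indistinguishable after binning, which is the intended effect. The study's point is that a handful of such coarse records, taken together, can still be unique to one person.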

The researchers suggest that large data sets should not be publicly released, but kept by a custodian who could then allow researchers to conduct queries and submit programs to analyze the data. They proposed a system that would do exactly that.

Users should be wary of any large data set, even if a company claims that it has been anonymized, Luther Martin, chief security architect at Voltage Security, said in a statement.

The research “suggests that it’s probably better to stop debating exactly how much risk there is in data sets that may not at first seem to contain sensitive information,” he said.
