A team of university researchers examined more than 100 “popular” Websites and found three-quarters of the sites leaked private information or users’ identifying data to third-party tracking sites.
The survey results were released shortly after Facebook came under fire for inadvertently passing user data to other parties.
More than half (56 percent) of sites "directly leak" private information, and the figure rises to 75 percent if the user ID is counted as private data, according to an academic paper. The researchers, Balachander Krishnamurthy of AT&T Labs and Konstantin Naryshkin and Craig E. Wills of Worcester Polytechnic Institute, found that information was leaked in various ways to third-party sites that track user behavior for advertisers. The researchers presented the report at the Web 2.0 Security and Privacy conference in Oakland, Calif., on May 26.
“No site should be exposing user information to a third party,” Wills, a professor of computer science at Worcester Polytechnic Institute, told eWEEK.
Some of the information appeared to be passed "deliberately" to other sites, while some was included as part of routine information exchange, although the researchers were unable to tell conclusively whether a given leak was deliberate or inadvertent. Leaks could occur as users created, viewed or edited their accounts, or simply logged in. They could also occur as users navigated a site, since many sites exposed users' search terms.
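As a rough illustration of how a search term can reach a third party: if a results page carries the query (and perhaps a user ID) in its URL and also loads a tracker, a browser that sends the full page URL in the Referer header hands those values to the tracker. The Python sketch below uses hypothetical URLs, parameter names and helper functions, and a simplified model of the Referer header; whether a browser sends the full URL or only the origin depends on the browser's defaults and the page's Referrer-Policy.

```python
from urllib.parse import urlsplit, parse_qs

def leaked_params(referer: str, sensitive_keys: set[str]) -> dict[str, list[str]]:
    """Return the sensitive query parameters visible in a Referer value (hypothetical helper)."""
    query = parse_qs(urlsplit(referer).query)
    return {k: v for k, v in query.items() if k in sensitive_keys}

def origin_only(url: str) -> str:
    """Trim a URL to scheme://host/, similar in spirit to an origin-only Referrer-Policy."""
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc}/"

# Hypothetical first-party search page on a health site; "q" and "uid" are assumed names.
page = "https://health.example/search?q=diabetes&uid=12345"
SENSITIVE = {"q", "uid"}

# Full URL sent as the Referer to a third-party tracker: search term and user ID leak.
print(leaked_params(page, SENSITIVE))
# Origin-only Referer: nothing from the query string leaks.
print(leaked_params(origin_only(page), SENSITIVE))
```

The first call shows the search term and user ID visible to the third party; the second shows that trimming the referrer to the origin, one measure a first-party site can control, removes them.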
“We believe it is time to move beyond what is clearly a losing battle with third-party aggregators and examine what roles the first-party sites can play in protecting the privacy of their users,” said Wills.
Efforts made to date to address information leakage have been "largely ineffective," the researchers found, and Websites need to take greater responsibility for protecting privacy. "Despite a number of proposals and reports put forward by researchers, government agencies and privacy advocates, the problem of privacy has worsened significantly," Wills said.
Leaked information included email addresses, physical addresses and details of the user's Web browser configuration, according to the paper. Researchers rated the leaked data on two scales, how identifying it was and how sensitive it was. Health information, such as a search for an illness or physical condition, was considered highly sensitive, while a name and email address were considered highly identifying.
While most of the leaked information was rated low-risk on both scales, the authors cautioned that users should still be concerned about privacy leaks from Websites. The information could be used to link "disparate pieces" of information, including browsing history stored in cookies and search behavior, to build detailed user profiles, the researchers wrote.
The researchers specifically focused on non-social-networking sites and used Alexa rankings to select Websites with more than 100,000 registered users. While the paper identified third parties receiving the information, such as Adobe's Omniture, along with DoubleClick and Digg URLs through which data flowed, it did not name any of the sites included in the survey.
They focused on sites that encourage users to register, since users often share personal and personally identifiable information, including names, physical addresses and email addresses, during the registration process. They also examined health and travel sites, since the searches users conduct on these sites can reveal health issues or travel plans.
The same team had previously examined 12 social-networking sites, including Facebook, MySpace and Orkut, to determine what kind of information was being leaked. Researchers noted that because users log into Orkut with their Google account credentials, third-party firms could correlate the leaked Orkut user identifier with other activity on Google services, such as searches or videos viewed on YouTube. A site may appear to pass the user ID only to a site such as Digg, when that information is actually being forwarded to Omniture, an analytics firm.