Microsoft: Zero Data Retention Not Possible to Keep Search Engines Viable
Yahoo's decision to cut the time it stores users' search engine data from 13 months to three caught the eye of privacy advocates, who called for Google to lead the way toward a zero retention policy. Microsoft's privacy guru Brendon Lynch explains why this just isn't possible if Live Search is to perform as a quality Web service. The issue seems headed for some resolution in 2009, with search engine providers meeting with the European Commission in February.
Yahoo's reduction of its retention period for user log data has some industry watchers calling for Google and Microsoft to do the same and predicting that pressure from government regulators will lead to zero retention policies in search next year.
Brendon Lynch, director of privacy strategy at Microsoft, told eWEEK zero retention policies are just not possible for Microsoft without reducing the quality of its Live Search offering, among other issues.
The issue flared up Dec. 17 when Yahoo pledged to reduce the period it saves the user log data its search engine gathers -- user queries, IP addresses and cookies that create digital trails -- from 13 months to three months. Yahoo, Google and Microsoft argue that data about users is necessary to provide quality search and to protect users from malicious actors and scam artists.
The move by Yahoo, the No. 2 search engine provider, is easily the most aggressive to date. Search leader Google pared its data retention period from 18 months to nine in September. No. 3 player Microsoft has been stuck at 18 months since July 2007, though it has said it would be willing to go down to six months if Google and Yahoo agreed to do the same.
Yet Yahoo's move was received with cautious praise by some privacy advocates who believe Yahoo, Google and Microsoft can do better. Peter Eckersley, staff technologist with consumer rights group Electronic Frontier Foundation, told eWEEK:
This looks like an attempt by Yahoo to keep a lot of information that they can use for their own internal research and engineering purposes, while being able to say "it would be extremely hard for us to find your search history file in this huge stack of search history files that we keep". That's a big step in the right direction.
However, Eckersley noted that Yahoo still retains 24 of the 32 bits of users' IP addresses, which means that if Yahoo had someone's IP address and wanted to find their search history, it could dig out fifty or a hundred files and say that one of them belongs to that person. A human, or more likely a statistical analysis program, could then read them and match a file to that person.
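To make the arithmetic behind Eckersley's point concrete, here is a minimal Python sketch of what keeping 24 of the 32 bits of an IPv4 address could look like. It assumes the last 8 bits are simply zeroed; the article does not describe Yahoo's actual method, and the function names `truncate_ipv4` and `candidate_count` are hypothetical.

```python
import ipaddress


def truncate_ipv4(address: str, kept_bits: int = 24) -> str:
    """Zero the low-order bits of an IPv4 address, keeping the first kept_bits.

    Illustration only: an assumed truncation scheme, not Yahoo's documented one.
    """
    # strict=False lets ip_network mask off the host bits instead of raising.
    network = ipaddress.ip_network(f"{address}/{kept_bits}", strict=False)
    return str(network.network_address)


def candidate_count(kept_bits: int = 24) -> int:
    """Number of distinct IPv4 addresses that share the same truncated value."""
    return 2 ** (32 - kept_bits)


# 203.0.113.77 is a reserved documentation address, used here as a stand-in.
print(truncate_ipv4("203.0.113.77"))
print(candidate_count())
```

With 8 bits removed, at most 256 addresses collapse to the same stored value. Only the addresses that were actually used for searches would have log files, which fits Eckersley's estimate of fifty or a hundred candidates.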
John Simpson, a privacy advocate for the non-profit consumer rights group Consumer Watchdog, said nothing less than a zero retention policy will suffice, arguing that since most users of Google or Yahoo return daily, they are constantly providing a new stream of personal data. His group wants users to have the option to control their data and browse anonymously.
But Microsoft's Lynch said the search data Live Search collects has a number of uses. In addition to analyzing users' search queries to improve query relevancy, Lynch said user log data helps Microsoft Live Search thwart security threats, keep people from gaming search ranking results, and combat click fraud scammers.