Government Sticks Its Fingers Deeper into Your Data Pie - Page 2

Government on Slippery Slope with Google

Indeed, the government's request for information from Google has been large and impossibly vague, as Google itself has protested in its fight.

Google on Jan. 19 revealed that it is resisting a subpoena, first issued in August 2005, from the U.S. Department of Justice to review Google customer search habits.

The DOJ has already secured similar material from MSN, Yahoo and AOL as it seeks to prove that commercially available Web content filters are ineffective at blocking porn, and thus defend COPA (the Child Online Protection Act) against a lawsuit by the ACLU and others challenging it as unconstitutional.

In essence, the government is hoping to build a simulated Web to illustrate how often Google and other search users encounter porn.

As it is, the government has scaled back its original demands. In its August version, it subpoenaed any and all URLs that could be produced through a query on Google's search engine, along with two months' worth of queries entered into Google search between June 1 and July 31, 2005.

After negotiations with Google, the U.S. Attorney General scaled back the subpoena to a random sampling of 1 million URLs from Google's then-current database, along with a random sampling of 1 million search queries submitted on a given day.

How easy would it be for a business to comply with such a request?

For starters, randomness is easy to screw up.

"At times, randomness is more complicated than you think it is," said Richard E. Mackey Jr., a principal at security firm SystemExperts, in an interview with eWEEK.

"You scramble it, but the sample you took was actually from a particular day over a longer period. So I always wonder, 'What does it mean to be random?'

"That's what gets me about these questions," he said. "They ask these questions, and as soon as you know something about the data, you can say something about what randomizing means. By data center? By time? Do we want the distribution of queries over a certain amount of time to look uniform? Or do you want the result like a histogram, showing more frequent instances of [searches] made more frequently?"
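Mackey's distinction can be sketched in a few lines of Python. The toy query log below is purely hypothetical, but it shows how the same "random sample" means two different things depending on what you sample over: drawing from query instances weights the result by frequency, like a histogram, while drawing from distinct queries treats every term equally.

```python
import random

# Hypothetical toy query log: "weather" was searched three times,
# "news" twice, "maps" once. Not real data.
query_log = ["weather", "weather", "weather", "news", "news", "maps"]

random.seed(42)  # fixed seed so the sketch is repeatable

# Sampling over query *instances*: frequent searches show up
# proportionally more often -- the histogram-shaped sample.
instance_sample = random.choices(query_log, k=4)

# Sampling over *distinct* queries: every term is equally likely,
# no matter how many users actually typed it -- the uniform sample.
distinct_sample = random.choices(sorted(set(query_log)), k=4)

print(instance_sample)
print(distinct_sample)
```

Either draw is "random," which is exactly why a subpoena asking for "a random sampling" leaves the recipient guessing which distribution the government actually wants.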

In other words, you need a good sense of the entire data set in order to create a truly random subset. To wit: Google protested that the government's consultant, UC Berkeley's Professor Philip Stark, would need knowledge of the upper limit of stored URLs on each server, as well as the total number of search queries run on the relevant day.
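That dependence on the totals can be made concrete with a minimal Python sketch. This is purely illustrative, assuming nothing about Google's systems or Professor Stark's actual methodology: a plain uniform draw by index presupposes the population size, while reservoir sampling, a standard textbook workaround, draws uniformly from a stream of unknown length.

```python
import random

def sample_by_index(total_records: int, k: int) -> list:
    # A straightforward uniform sample: pick k distinct indices out of
    # total_records. This presupposes knowing the total count -- exactly
    # the figure Google argued was a competitive secret.
    return random.sample(range(total_records), k)

def reservoir_sample(stream, k: int) -> list:
    # Reservoir sampling (Algorithm R): after seeing i items, each item
    # sits in the reservoir with probability k / i, so the final sample
    # is uniform even though the stream's length was never known.
    reservoir = []
    for i, item in enumerate(stream, start=1):
        if i <= k:
            reservoir.append(item)
        else:
            j = random.randrange(i)  # uniform integer in [0, i)
            if j < k:
                reservoir[j] = item
    return reservoir

random.seed(0)
print(sample_by_index(1_000_000, 5))
print(reservoir_sample(iter(range(1_000_000)), 5))
```

The first approach mirrors what the government's consultant apparently proposed, and it is why the subpoena dispute circles back to disclosing database totals at all.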

That information, however, is considered competitively valuable in the cutthroat world of search engines, spurring Google to dig in its heels against what it called an "overbroad, unduly burdensome, vague" request intended to harass. Complying, Google claimed, would also mean giving away trade secrets.

Not only that, Google claimed, but the information a) wasn't relevant to the underlying COPA lawsuit, and b) was publicly available anyway.

The broad question, Kerr said, boils down to what the law protects, or should protect, when it comes to businesses' family jewels. In other words, what law protects data?

"The subpoena is the general default used to protect data," Kerr said. "It's mostly designed to protect a defendant's stuff, not a third party's. So here you have Google being asked to turn over information that belonged to the people who used the search terms. So the ultimate question is, 'Do you need a law protecting this information beyond a subpoena?'"


Check out eWEEK.com for the latest database news, reviews and analysis.