Human Error Caused Google Glitch
This weekend, the mighty Google, the world's most popular search
engine, proved fallible. Following a period on Saturday when nearly
all search results were flagged with the warning that the sites "may harm
your computer," Google's vice president of search products and user
experience Marissa Mayer issued a statement on the company blog that
attributed the incident to human error.
She explained that Google maintains a list of sites known to install
malicious software in the background or "otherwise surreptitiously."
This is done through both manual and automated methods. Google works
with StopBadware.org, a non-profit organization aimed at fighting
malicious software, to come up with criteria for maintaining this
list. The organization then provides simple processes for webmasters to
remove their sites from the list.
"We periodically update that list and released one such update to the site this morning," she wrote. "Unfortunately (and here's the human error), the URL of '/' was mistakenly checked in as a value to the file and '/' expands to all URLs." As a result, between 6:30 a.m. PST and 7:25 a.m. PST, the message "This site may harm your computer" accompanied almost every result a Google user found. Users who attempted to click through the results saw the "interstitial" warning page that mentions the possibility of badware.
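To see why a single '/' entry could flag nearly every result, consider a toy model in which each list entry is treated as a prefix of a URL's path. This is only a sketch under that assumption: Mayer's statement does not describe how Google actually matches entries, and the function names, sample URLs, and threshold below are hypothetical. The second function shows one kind of pre-release file check that could catch an entry that is far too broad.

```python
from urllib.parse import urlsplit


def path_of(url: str) -> str:
    """Return the path part of a URL, using '/' when the path is empty."""
    return urlsplit(url).path or "/"


def is_flagged(url: str, patterns: list[str]) -> bool:
    """Toy matcher (assumption): an entry flags any URL whose path starts with it."""
    path = path_of(url)
    return any(path.startswith(pattern) for pattern in patterns)


def overbroad_patterns(patterns: list[str], known_good_urls: list[str],
                       max_fraction: float = 0.01) -> list[str]:
    """Hypothetical sanity check: list the entries that would flag more than
    max_fraction of a sample of URLs believed to be safe."""
    if not known_good_urls:
        return []
    too_broad = []
    for pattern in patterns:
        hits = sum(1 for url in known_good_urls
                   if path_of(url).startswith(pattern))
        if hits / len(known_good_urls) > max_fraction:
            too_broad.append(pattern)
    return too_broad


if __name__ == "__main__":
    sample = ["http://a.example/x", "http://b.example/y/z", "http://c.example"]
    # Every path begins with '/', so the bare '/' entry matches all of them.
    print(all(is_flagged(url, ["/"]) for url in sample))
    print(overbroad_patterns(["/", "/malware/"], sample))
```

Under this model every URL path begins with '/', so the one bad entry matches everything, while a narrower entry such as '/malware/' matches only paths under that directory. Running a check like `overbroad_patterns` before an update is released would reject the file rather than push it to users.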
"Fortunately, our on-call site reliability team found the problem quickly and reverted the file," she wrote. "Since we push these updates in a staggered and rolling fashion, the errors began appearing between 6:27 a.m. and 6:40 a.m. and began disappearing between 7:10 and 7:25 a.m., so the duration of the problem for any particular user was approximately 40 minutes."
To anyone who considers Google the only search engine worthy, nearly three quarters of an hour must have seemed an eternity. Mayer issued an apology to all users affected, as well as to site owners whose pages were incorrectly labeled. "We will carefully investigate this incident and put more robust file checks in place to prevent it from happening again," she wrote.
Abner Germanow, director of enterprise networking for Framingham, Mass.-based IT analyst firm IDC, says a lot of emerging technologies don't yet have the same codified sets of operational procedures designed to maintain stability, which causes errors like Google's. "What you see is a series of operational snafus that inspire levels of rigor greater than what was available in the past," he says. "Now that they've experienced this problem, you can bet there is some sort of test that is implemented that will prevent someone from making this dumb mistake again."
Germanow compares what we're seeing now with what we've seen in the security world, where the ability to view the quality of software holistically meant figuring out not only if the software does what you want it to do, but also if it prevents you from doing something that is really dumb. "That second half is a very different mindset than the classic mindset," he says. "We'll see this with Google and pretty much any other online service."
Google's prominence and user base also magnify a glitch when it occurs, he says. On a random Tuesday, a software glitch with (Web conferencing company) WebEx might be noticed by a couple thousand people, but Germanow says Google's ubiquity makes errors hard to hide. "The scale of the number of people who will notice is much larger," he says. "Google on a Saturday morning? You're looking at millions of people."
There was also some confusion over the weekend as to who was responsible and to what extent StopBadware.org was involved in Google's operations. Google issued several updates to the original statement after StopBadware.org manager Maxim Weinstein posted a statement saying Google erroneously stated that it gets the list of badware-infected URLs from the Harvard University-based organization.
He thanked Google for their efforts to correct the statement and reaffirmed his support for the search engine giant and their efforts to protect their customers. "The mistake in Google's initial statement, indicating that we supply them with badware data, is a common misperception," he wrote. "Despite today's glitch, we continue to support Google's effort to proactively warn users of badware sites, and our experience is that they are committed to doing so as accurately and as fairly as possible."
Germanow says the types of errors that resulted in Saturday morning's Google freak-out will continue as Web-based software develops and matures. "The whole notion of software development in a Web age is different from the classic enterprise software development," he says. "The thing I would look for is not necessarily, 'This will never happen again,' but rather 'How did they respond?' Did they live and learn, or did they live and not learn?"