Human Error Caused Google Glitch
This weekend, the mighty Google, the world's most popular search
engine, proved fallible. Following a period on Saturday when nearly
all search results were flagged with the warning that the sites "may harm
your computer," Google's vice president of search products and user
experience Marissa Mayer issued a statement on the company blog that
attributed the incident to human error.
She explained Google maintains a list of sites known to install
malicious software in the background or "otherwise surreptitiously."
This is done through both manual and automated methods. Google works
with a non-profit organization aimed at fighting malicious software
called StopBadware.org to come up with criteria for maintaining this
list. The organization then provides simple processes for webmasters to
remove their site from the list.
"We periodically update that list and released one such update to the
site this morning," she wrote. "Unfortunately (and here's the human
error), the URL of '/' was mistakenly checked in as a value to the file
and '/' expands to all URLs." As a result, between 6:30 a.m. PST and
7:25 a.m. PST, the message "This site may harm your computer"
accompanied almost every result a Google user found. Users who
attempted to click through the results saw the "interstitial" warning
page that mentions the possibility of badware.
"Fortunately, our on-call site reliability team found the problem
quickly and reverted the file," she wrote. "Since we push these updates
in a staggered and rolling fashion, the errors began appearing between
6:27 a.m. and 6:40 a.m. and began disappearing between 7:10 and 7:25
a.m., so the duration of the problem for any particular user was
approximately 40 minutes."
To anyone who considers Google the only search engine worth using, nearly
three quarters of an hour must have seemed an eternity. Mayer issued an
apology to all users affected, as well as to site owners whose pages
were incorrectly labeled. "We will carefully investigate this incident
and put more robust file checks in place to prevent it from happening
again," she wrote.
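To illustrate why a single "/" entry could flag nearly every result, and what a "more robust file check" might look like, here is a minimal Python sketch. Google's statement does not describe the list's file format or matching logic, so the prefix-matching rule and the function names is_flagged and find_overbroad are assumptions made purely for illustration.

```python
from urllib.parse import urlsplit


def is_flagged(url, patterns):
    """Return True if the URL matches any list entry.

    Hypothetical rule: an entry starting with "/" is compared against the
    URL path; any other entry is compared against host + path. Both are
    prefix matches. This is an assumed scheme, not Google's actual one.
    """
    parts = urlsplit(url)
    path = parts.path or "/"
    target = parts.netloc + path
    for pattern in patterns:
        if pattern.startswith("/"):
            if path.startswith(pattern):
                return True
        elif target.startswith(pattern):
            return True
    return False


def find_overbroad(patterns):
    """Return entries that would match every URL under the rule above."""
    return [p for p in patterns if p.strip() in ("", "/", "*")]


good_list = ["badsite.example/malware"]
bad_list = good_list + ["/"]  # the kind of entry described in the statement

# With the stray "/" entry, an unrelated URL is flagged too.
print(is_flagged("http://example.com/page", good_list))
print(is_flagged("http://example.com/page", bad_list))

# A pre-release check could refuse to publish a list containing such entries.
print(find_overbroad(bad_list))
```

Under these assumptions, every URL path begins with "/", so the stray entry matches everything; a check like find_overbroad run before each update is one simple way such an entry could be caught.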
Abner Germanow, director of enterprise networking for Framingham,
Mass.-based IT analyst firm IDC, says a lot of emerging technologies
don't yet have the same codified sets of operational procedures
designed to maintain stability, which causes errors like Google's.
"What you see is a series of operational snafus that inspire levels of
rigor greater than what was available in the past," he says. "Now that
they've experienced this problem, you can bet there is some sort of
test that is implemented that will prevent someone from making this
dumb mistake again."
Germanow compares what we're seeing now with what we've seen in the
security world, where the ability to view the quality of software
holistically meant figuring out not only if the software does what you
want it to do, but also if it prevents you from doing something that is
really dumb. "That second half is a very different mindset than the
classic mindset," he says. "We'll see this with Google and pretty much
any other online service."
Google's prominence and user base also magnify a glitch when it
occurs, he says. On a random Tuesday, a software glitch with (Web
conferencing company) WebEx might be noticed by a couple thousand
people, but Germanow says Google's ubiquity makes errors hard to hide.
"The scale of the number of people who will notice is much larger," he
says. "Google on a Saturday morning? You're looking at millions of
people."
There was also some confusion over the weekend as to who was
responsible and to what extent StopBadware.org was involved in Google's
operations. Google issued several updates to the original statement
after StopBadware.org manager Maxim Weinstein posted a statement saying
Google erroneously stated that it gets the list of badware-infected
URLs from the Harvard University-based organization.
He thanked Google for its efforts to correct the statement and
reaffirmed his support for the search engine giant and its efforts to
protect its customers. "The mistake in Google's initial statement,
indicating that we supply them with badware data, is a common
misperception," he wrote. "Despite today's glitch, we continue to
support Google's effort to proactively warn users of badware sites, and
our experience is that they are committed to doing so as accurately and
as fairly as possible."
Germanow says the types of errors that resulted in Saturday morning's
Google freak-out will continue as Web-based software develops and
matures. "The whole notion of software
development in a Web age is different from the classic enterprise
software development," he says. "The thing I would look for is not
necessarily, 'This will never happen again,' but rather 'How did they
respond?' Did they live and learn, or did they live and not learn?"
