Google's Gmail application goes down for 100 minutes, leaving millions of users of the messaging and collaboration application around the world in the dark. The culprit was human error compounded by insufficient request router capacity to handle Gmail requests. Google is improving its request routers to prevent this from happening again, but the outage has likely left a sour taste in the mouths of Gmail users, some of whom pay $50 per user, per year for Gmail and other Google Apps.
Google's Gmail application was knocked out for the majority of users for 100 minutes on
Sept. 1 due to human error, the company said last night after the Gmail
engineering team fixed the issue.
Google took a small fraction of Gmail's servers offline to perform
routine upgrades. Google does this regularly, sending traffic to other
locations when one is offline. That's when things got hairy, as Ben Treynor,
vice president of engineering and site reliability czar for Google, explained:
"We had slightly underestimated the load which some recent changes
(ironically, some designed to improve service availability) placed on the
request routers -- servers which direct web queries to the appropriate Gmail
server for response. At about 12:30 pm
Pacific a few of the request routers became overloaded and in effect told the
rest of the system 'stop sending us traffic, we're too slow!' This
transferred the load onto the remaining request routers, causing a few more of
them to also become overloaded, and within minutes nearly all of the request
routers were overloaded. As a result, people couldn't access Gmail via the Web
interface because their requests couldn't be routed to a Gmail server."
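The cascade Treynor describes can be illustrated with a toy simulation. This is a minimal sketch and not Google's actual routing logic: the function name `simulate_cascade`, the router capacities and the load figures are all hypothetical, chosen only to show the effect. Traffic is split evenly across the healthy routers, and any router whose share exceeds its capacity stops accepting traffic.

```python
def simulate_cascade(capacities, total_load):
    """Toy model of overloaded routers refusing traffic.

    capacities: hypothetical per-router capacities (requests per second).
    total_load: hypothetical total traffic, split evenly across healthy routers.
    Returns the number of routers still serving after each round.
    """
    healthy = list(capacities)
    history = [len(healthy)]
    while healthy:
        share = total_load / len(healthy)
        # A router whose share exceeds its capacity says "stop sending us traffic".
        survivors = [c for c in healthy if c >= share]
        if len(survivors) == len(healthy):
            break  # every remaining router can absorb its share
        healthy = survivors
        history.append(len(healthy))
    return history


# Ten hypothetical routers with capacities 100, 105, ..., 145 (1,225 in total).
routers = list(range(100, 150, 5))
print(simulate_cascade(routers, 900))   # [10]: load fits, nothing fails
print(simulate_cascade(routers, 1050))  # [10, 9, 6, 0]: one failure takes out the rest
```

In the second run the combined capacity (1,225) still exceeds the load (1,050), yet losing the single weakest router pushes enough extra traffic onto the others that all of them fail within three rounds.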
Treynor said internal monitors alerted the Gmail engineering team to the
failures within seconds. The team brought several additional request routers
online to make up for the shortfall in capacity and distributed the traffic
across them. Gmail came back online around 2:30 p.m. PDT.
To ensure this lack of server capacity (which is ironic considering that
Google reportedly powers the world's most popular search engine with more than 1
million servers) doesn't happen again, Google boosted request router capacity
well beyond peak demand, giving the application headroom when it needs it.
Treynor also said Google is improving the failure isolation in the routers,
so a problem in one data center won't affect servers in another facility.
Moreover, he said that Google is taking steps to make sure that when the
request routers are overloaded simultaneously, they all should just get slower
instead of refusing to accept traffic and shifting their load to another data
center.
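The difference between the two overload behaviors Treynor contrasts can be sketched in a few lines. This is an illustration under stated assumptions, not Google's implementation: `Outcome`, `handle_load`, the policy names and the simple "latency grows in proportion to overload" model are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Outcome:
    accepted: float    # traffic the router keeps serving
    shifted: float     # traffic pushed onto other routers or data centers
    latency_ms: float  # rough per-request latency (infinite if nothing is served)


def handle_load(load, capacity, policy, base_latency_ms=100.0):
    """Hypothetical model of one router under two overload policies."""
    if load <= capacity:
        return Outcome(load, 0.0, base_latency_ms)
    if policy == "refuse":
        # Old behavior: stop accepting traffic, so the load lands elsewhere.
        return Outcome(0.0, load, float("inf"))
    if policy == "degrade":
        # Planned behavior: keep all traffic and simply get slower.
        return Outcome(load, 0.0, base_latency_ms * load / capacity)
    raise ValueError(f"unknown policy: {policy}")


print(handle_load(120, 100, "refuse"))
print(handle_load(120, 100, "degrade"))
```

Under the "degrade" policy the overloaded router shifts no traffic, so it cannot trigger further overloads elsewhere; the cost is slower responses for its own users.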
It's also worth noting that when Gmail did go down, Google urged users to
access it via the IMAP and POP mail
protocols; mail processing continued to work normally because these requests
don't use the same routers at Google.
"We know how many people rely on Gmail for personal and professional
communications, and we take it very seriously when there's a problem with the
service," Treynor added. "Thus, right up front, I'd like to apologize
to all of you -- today's outage was a Big Deal, and we're treating it as
such."
So are the users who rely on Gmail to run their businesses. Donald told Google Watch: "I use G-Mail to run my CPA practice. This is a
serious (huge) problem."
Sergei added: "This is a huge problem and an outrage. I demand
immediate Gmail access. What is with those people?"
Indeed, more than 1.75 million businesses use Google Apps, and some of them pay Google $50 per user, per year for the Google Apps collaboration suite, which boasts Gmail as its backbone. Users have little patience for a service that conks out on them,
particularly when they are paying for extra reliability and security.
The latest issue follows a big outage in February, when Gmail went down for
two and a half hours due to "unexpected side effects of some new code." But
these last two issues were nothing compared with the August 2008 outage that took Gmail down for nearly
15 hours.