Google Apologizes for Monday's Gmail Delays

By Todd R. Weiss  |  Posted 2013-09-25

Google has apologized to Gmail users who were affected by email delivery delays on Sept. 23, explaining in a blog post that the slowdowns were caused by a rare simultaneous failure of two redundant network paths.

"On Sept. 23rd, many Gmail users received an unwelcome surprise: some of their messages were arriving slowly, and some of their attachments were unavailable," wrote Sabrina Farmer, the senior site reliability engineering manager for Gmail, in a Sept. 24 post on the Google Gmail Blog. "We'd like to start by apologizing—we realize that our users rely on Gmail to be always available and always fast, and for several hours we didn't deliver. We have analyzed what happened, and we'll tell you about it below. In addition, we're taking several steps to prevent a recurrence."

What caused the problems, she wrote, was "a dual network failure" that occurred when two separate, redundant network paths both stopped working at the same time. The events were "unrelated," wrote Farmer, "but in combination they reduced Gmail's capacity to deliver messages to users, and beginning at [8:54 a.m. ET] messages started piling up."

An automated monitoring system quickly alerted the Gmail engineering team, which began investigating the incident, she wrote. Repairs got under way, and much of the accumulated message backlog was cleared and delivered by 4 p.m. ET, with the rest of the delayed mail delivered shortly before 7 p.m. ET, according to Farmer's post. Users could follow the service delays on Google's application performance status page.

"The impact on users' Gmail experience varied widely," she wrote. "Most messages were unaffected—71 percent of messages had no delay, and of the remaining 29 percent, the average delivery delay was just 2.6 seconds. However, about 1.5 percent of messages were delayed more than two hours."

Postings about the problems were seen frequently throughout the day on Facebook, Twitter and other social media platforms.

With the problem now fixed, Farmer wrote that the company is moving quickly on changes intended to prevent a recurrence. "What's next? Our top priority is ensuring that Gmail users get the experience they expect: fast, highly-available email, anytime they want it. We're taking steps to ensure that there is sufficient network capacity, including backup capacity for Gmail, even in the event of a rare dual network failure. We also plan to make changes to make Gmail message delivery more resilient to a network capacity shortfall in the unlikely event that one occurs in the future."

In addition, Google is "updating our internal practices so that we can more quickly and effectively respond to network issues," she wrote. "We'll be working on all of these improvements and more over the next few weeks—even including this event, Gmail remains well above 99.9 percent available, and we intend to keep it that way!"
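For context, the 99.9 percent figure Farmer cites can be converted into an approximate downtime allowance. The sketch below is illustrative only: the function name and the 30-day and 365-day windows are assumptions, and Google's post does not say over what period it measures availability.

```python
def downtime_budget_hours(availability: float, period_days: float) -> float:
    """Return the hours of unavailability permitted by an availability target.

    `availability` is a fraction (0.999 for 99.9 percent). The measurement
    window in `period_days` is an assumption chosen by the caller.
    """
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be a fraction between 0 and 1")
    total_hours = period_days * 24
    return total_hours * (1.0 - availability)


if __name__ == "__main__":
    # Hypothetical windows: one 30-day month and one 365-day year.
    for days in (30, 365):
        hours = downtime_budget_hours(0.999, days)
        print(f"{days} days at 99.9%: {hours:.2f} hours")
```

Under these assumptions, a 99.9 percent target leaves less than one hour of downtime in a 30-day month and just under nine hours in a year.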

Asked about this week's Gmail problems, two IT analysts told eWEEK that such glitches are essentially unavoidable for free services of this kind.

"They have redundancy, but they are still a publicly controlled company," said Dan Maycock of Slalom Consulting. "They can't do triple redundancy. It would not be cost effective."

In reaction to the email delays, Google responded quickly and set to work on the problems, said Maycock. "I think they have the right controls in place. I think they could put better controls in place but it's not economically feasible. This was a once in a blue moon sort of thing."

A positive step that came out of the situation, he said, is that Google has vowed to prevent similar issues in the future. "They did say they plan to make changes in order to make Gmail more resilient, and that they are taking it more seriously and are going to improve it."

Rob Enderle, principal analyst at Enderle Group, told eWEEK in an email reply that Gmail "is a free service and free services are managed as cost centers, which means they are provisioned cheaply and Google is known for being incredibly frugal. The fact that they had a dual failure (primary and redundant systems) suggests the redundant system just couldn't handle the load when the primary failed and that one, likely both, of the network paths were inadequate. Users need to realize that if they are getting their email for free, not only is it being scanned but it likely will be under resourced and more likely to experience failures."

In the long run, wrote Enderle, "Google is managing to a bottom line and isn't really motivated to heavily fund this service since extra expenditures just reduce their bottom line."

Google is different from many other companies in that it doesn't typically use its free services as entry points into paid services that generate revenue, he wrote. "I expect they will have more outages as a result," wrote Enderle. "In the end, you mostly get what you pay for. If you want cheap, don't expect great service or reliability, and with Google especially, don't expect privacy either."

The company is constantly adding new features and services to its Gmail offering. In July, Google restored outbound voice calling to its Hangouts feature in Gmail, Google+ and the Chrome browser extension after the capability was temporarily removed in May when Hangouts was updated. Users noticed the missing feature soon after the new Hangouts launched, and complaints posted on Google's blogs and Google+ pages prompted the company to respond and promise that the service would be reintroduced.

Also in May, Google unveiled a feature that allowed users who have Google Drive, Gmail and Google+ Photo accounts to put all their files in a unified place, rather than having to maintain separate storage areas depending on what kinds of files were being stored.

Google also recently gave Gmail users the ability to send money to others by sending "cash" in an email message. The new capability became possible because Google integrated its Google Wallet payment services with Gmail, allowing users to safely and securely send up to $10,000 per transaction to another person.

Google's Gmail turned 9 years old in April, having started on April 1, 2004.
