Google Apologizes for Monday's Gmail Delays
The Gmail slowdown, which affected many users for some 7 to 10 hours, was caused by an uncommon "dual network failure," says Google, which has vowed to prevent similar issues in the future.
Google has apologized to Gmail users who were affected by email delivery delays on Sept. 23, explaining in a blog post that the slowdowns were caused by a rare two-pronged failure in the company's network architecture.
"On Sept. 23rd, many Gmail users received an unwelcome surprise: some of their messages were arriving slowly, and some of their attachments were unavailable," wrote Sabrina Farmer, the senior site reliability engineering manager for Gmail, in a Sept. 24 post on the Google Gmail Blog. "We'd like to start by apologizing—we realize that our users rely on Gmail to be always available and always fast, and for several hours we didn't deliver. We have analyzed what happened, and we'll tell you about it below. In addition, we're taking several steps to prevent a recurrence."
The cause, she wrote, was "a dual network failure" that occurred when two separate, redundant network paths both stopped working at the same time. The events were "unrelated," wrote Farmer, "but in combination they reduced Gmail's capacity to deliver messages to users, and beginning at [8:54 a.m. ET] messages started piling up."
An automated monitoring system quickly alerted the Gmail engineering team, which began investigating the incident, she wrote. Repairs got under way, and much of the accumulated message backlog was cleared and delivered by 4 p.m. ET, with the rest of the delayed mail delivered shortly before 7 p.m. ET, according to Farmer's post. Users could follow the service delays on Google's application performance status page.
"The impact on users' Gmail experience varied widely," she wrote. "Most messages were unaffected—71 percent of messages had no delay, and of the remaining 29 percent, the average delivery delay was just 2.6 seconds. However, about 1.5 percent of messages were delayed more than two hours."