Cloud outages are nothing new. Yet on Tuesday, Microsoft and its business customers discovered just how disruptive one can be when it affects email, which is often considered the lifeblood of modern business.
The Redmond, Wash.-based company suffered a black eye on June 24 after a service interruption affected its cloud-based Exchange Online service. Making matters worse, the hours-long outage spanned most of the workday for many U.S. customers.
“On Tuesday, June 24th, 2014, at approximately 6:30 a.m. EDT, some North American customers experienced email delays with Exchange Online. The issue has since been resolved, and the service is now functioning normally,” explained a Microsoft spokesperson in an email to eWEEK. “We sincerely apologize to our customers for any inconvenience this incident may have caused.”
The unexpected downtime shouldn’t dissuade businesses from adopting cloud services, according to TJ Keitt, a senior analyst at Forrester Research. “As for its implications on business utilization of the Office 365 service, I don’t believe occasional service interruptions are cause, in and of themselves, for not going down the services route,” he told eWEEK.
Nonetheless, enterprise customers will want to go into service agreements with their eyes wide open, he said. “While vendors like Microsoft are providing highly available infrastructure, that should never be misconstrued as infallible infrastructure.
“I’ve said this in relation to Google outages and the same applies here: You should judge the vendor on how quickly it reports the outage, how soundly it makes the fix and how it communicates the problem and its resolution to clients and prospects,” advised Keitt.
Exchange Online is now back in service after being inaccessible for at least eight hours for some customers.
In the early morning hours of Tuesday, the company’s customers flocked to the Office 365 forums to report that their organizations were unable to send or receive email using Exchange Online or the Outlook Web App. The Azure-backed service, used mainly by businesses, provides practically the same email, calendar and contacts services and management capabilities as its namesake on-premises, server-based software.
David Zhang, of Microsoft Support, confirmed that the service failed at 8:30 a.m. EDT June 24.
“Microsoft has identified an issue in which a portion of capacity responsible for facilitating connectivity to the Exchange Online service has entered into a degraded state. Engineers are actively working on a solution to remediate impact,” Zhang wrote, quoting a status page.
At 2:07 p.m. EDT June 24, the official Office 365 (@Office365) Twitter account posted the following:
“Some Exchange customers are experiencing email delays, we are working to resolve, please see the SHD for service status”
As the hours dragged on, irate customers reported complete email shutdowns across their organizations, long wait times on customer support phone lines and service health dashboards that showed no issues while Exchange Online remained clearly inaccessible. For some customers, the situation amounted to rubbing salt in some very recent wounds.
A day earlier, on June 23, Lync Online also suffered an outage that began at approximately 8 a.m. EDT. Lync Online is the cloud-based version of Microsoft’s enterprise voice, video and instant messaging communications service. A status update from Microsoft read:
“Engineers have determined that a portion of network infrastructure which routes network traffic was in a degraded state. Engineers have shifted traffic to alternate capacity and are now checking service health.”
After 7 p.m. EDT, Edward Qu of Microsoft Support reported on the company’s support forum that the issue had finally been resolved. “Engineers continued analyzing network traffic to further isolate the source of the issue for the remaining affected customers, and identified an additional degraded portion of network infrastructure that was preventing connections from being routed successfully,” explained Qu.
Microsoft’s engineers were able to restore service after they “implemented a configuration change to bypass the affected network infrastructure to restore service,” added Qu.