Microsoft's BPOS Service Outage Illustrates Cloud Conundrum

Microsoft's well-publicized BPOS service outage illustrates how all cloud services experience downtime. But will that dissuade businesses from jumping to the cloud?

Customers of Microsoft's BPOS service last week found themselves cut off from email.

On May 10, malformed email traffic sparked a growing message backlog that affected some customers for six to nine hours. The issue occurred again May 12, compounded by a separate but related problem that led to delays of as long as three hours for customers.

Then, just to top off what was already a stressful week for Microsoft's BPOS engineering teams, a failure in the Domain Name Service hosting stopped users from accessing Outlook Web Access hosted in the Americas. That issue also affected Microsoft Outlook and Microsoft Exchange ActiveSync devices.

Microsoft solved the issues and issued a mea culpa. "I'd like to apologize to you, our customers and partners, for the obvious inconveniences these issues caused," Dave Thompson, corporate vice president of Microsoft Online Services, wrote in a May 12 posting on the Microsoft Online Services Team Blog. "We know that email is a critical part of your business communication, and my team and I fully recognize our responsibility as your partner and service provider."

In the wake of the issues, Microsoft has taken steps to improve its communications with users. "Effective today, we updated our communications procedures to be more extensive and timely," Thompson wrote. "The primary mechanism for communicating to our customers on issues has been and will continue to be the Service Health Dashboard."

He also insisted that the issues gripping BPOS haven't affected Office 365, Microsoft's cloud-based productivity platform that recently launched its public beta, or other company services. (Office 365 is effectively the rebranding of BPOS.)

But the outages also raise some key questions about the cloud.

Microsoft is "all in" with regard to cloud services. Indeed, CEO Steve Ballmer and other executives have spent much of the past year taking every opportunity to tout the company's upcoming subscription platforms as the wave of its future. Office 365, Windows Azure and other platforms represent Microsoft's attempts to expand its revenue base beyond traditional, desktop-bound software such as Windows and Office.

Microsoft's cloud emphasis also allows it to compete with Google for large online contracts. Last October, Microsoft announced a partnership with New York City's government to provide municipal employees with access to cloud-based Microsoft applications, in what many saw as a response to Google's agreement with the City of Los Angeles to provide cloud services to its employees. The competition between the two companies has become so intense that Google even sued the federal government after the Department of the Interior allegedly denied its bid to update an email and messaging system, a $59 million, five-year contract that had gone to Microsoft's BPOS-Federal suite.

But while the cloud offers businesses some noted advantages (chief among them, removing the need to maintain on-site IT infrastructure), it also comes with certain risks. In April, an outage at Amazon Web Services led to service disruptions across the Internet, affecting popular Websites such as Reddit, Quora and Hootsuite.

The issues with Amazon led some companies to revert to on-premises solutions. "We are currently setting up dedicated servers with hard-wired storage," wrote Andy Singleton, president of Assembla, a software development tools and services provider affected by the EC2 outage. Nonetheless, he touted the benefits of Amazon's cloud: "We recommend it because their truly on-demand server resources make it possible to rapidly try things, fix things and innovate. Innovation speed is important."

Amazon isn't alone in its outages. Google lost some of its users' email data in February, and launched an aggressive effort at restoration. The possibility of at least some downtime is baked into cloud contracts; the question is what happens when the outage is so catastrophic that it results in data loss, or delays so lengthy they cause a client to lose revenue. For most companies, including Microsoft, the response to an event like the one that hit BPOS last week is to issue some sort of credit for the cloud time lost.

Even such well-publicized incidents, though, don't seem to be shaking businesses' confidence in the ultimate benefits of the cloud. "Clouds will have downtime; it's a fundamental issue," Andi Mann, chief cloud strategy guru at CA Technologies, told eWEEK. "But you need to be ready for downtime, whether it's your own infrastructure or cloud infrastructure. You need to understand what the risk is. It's all just about risk management."

In other words, the more businesses gravitate toward the cloud, and the more companies go "all in" on offering cloud services, the more well-publicized cloud incidents will occur. But with each incident, it seems that companies like Microsoft, Google and Amazon learn a little more about what works and what doesn't, and take steps to improve their services that much more. Their future revenues depend on it.