Final Thoughts on the Five-Day AWS Outage

News Analysis: Amazon waves the all-clear flag, but plenty of IT managers are still smoldering over the outage.

Five full days after its largest outage hit on the morning of April 21, Amazon Web Services said it has finally restored virtually all services to its customers.

However, plenty of IT managers have yet to cool off from the outage, which started at 1:41 a.m. PDT on April 21 at the AWS data center in Northern Virginia.

The mishap disrupted the AWS EC2 (Elastic Compute Cloud) hosting service, knocking thousands of Websites, including such popular ones as Foursquare, Reddit, Quora and Hootsuite, off the Internet. On April 25, a limited number of customers were still reporting data "stuck" in the EBS (Elastic Block Store) service.

Income that AWS-hosted businesses lost during that one- to five-day window will never be regained. This was a serious business problem for hundreds, perhaps thousands, of IT managers, who are now wondering whether to continue using the service.

"EBS is now operating normally for all APIs and recovered EBS volumes," Amazon reported April 25 on its status dashboard. "The vast majority of affected volumes have now been recovered. We're in the process of contacting a limited number of customers who have EBS volumes that have not yet recovered and will continue to work hard on restoring these remaining volumes." The company said it will post a detailed incident report.

What are industry people saying in the wake of the mishap? What might be the long- and short-term results of an outage that shackled one of the sturdiest, most trusted Web services providers in the world?

Reaction from Far and Wide

Several AWS users commented with frustration on eWEEK stories covering the mishap. The blogosphere, as one might imagine, was rife with commentary.

"In short, if your systems failed in the Amazon cloud this week, it wasn't Amazon's fault," blogged O'Reilly Media's George Reese. "You either deemed an outage of this nature an acceptable risk or you failed to design for Amazon's cloud computing model. The strength of cloud computing is that it puts control over application availability in the hands of the application developer and not in the hands of your IT staff, data center limitations, or a managed services provider.

"The AWS outage highlighted the fact that, in the cloud, you control your SLA in the cloud, not AWS."

Morphlabs was one of the first AWS solution providers when it launched Morph Appspace in 2007 and now has more than 4,000 users.

"The Amazon EC2 outage has sent ripples and shockwaves through the AP wires and blogosphere, but those of us who have been in the cloud computing trenches for the equivalent of tech eons (at Morphlabs, we've been at it for more than four years), the news is neither shocking nor a reason to stray from our mission," founder and CEO Winston Damarillo told eWEEK.

"While it is tempting to unleash common fears about new technologies when confronted with 'proof' of their failings and risks, our years of innovation and adoption tell us that there is a wiser path. Approach with caution, but approach nonetheless. The same is true for the implementation of cloud computing services in your IT organization."

Morphlabs' approach to software development assumes failure, and it builds fault tolerance into all of its cloud computing solutions, Damarillo said.

Chris Preimesberger