Final Thoughts on the Five-Day AWS Outage
Five full days after its largest outage hit on the morning of April 21, Amazon Web Services said it has finally restored virtually all services to its customers.
However, there are still a lot of smoldering IT managers who haven't yet cooled off completely from the outage, which started at 1:41 a.m. PDT April 21 at the AWS data center in Northern Virginia.
The mishap caused disruptions in AWS's EC2 (Elastic Compute Cloud) hosting service, knocking thousands of Websites, including such popular ones as Foursquare, Reddit, Quora and Hootsuite, off the Internet. A limited number of customers were still reporting data being "stuck" in the EBS (Elastic Block Store) service on April 25.
Income that AWS-hosted businesses lost during that one- to five-day window will never be regained. This was a serious business problem for hundreds, perhaps thousands, of IT managers, who are now wondering whether to continue using the service.
"EBS is now operating normally for all APIs and recovered EBS volumes," Amazon reported April 25 on its status dashboard. "The vast majority of affected volumes have now been recovered. We're in the process of contacting a limited number of customers who have EBS volumes that have not yet recovered and will continue to work hard on restoring these remaining volumes." The company said it will post a detailed incident report.
What are industry people saying in the wake of the mishap? What might be the long- and short-term results of an outage that shackled one of the sturdiest, most trusted Web services providers in the world?
Reaction from Far and Wide
Several AWS users commented with frustration on eWEEK stories covering the mishap. The blogosphere, as one might imagine, was rife with commentary.
"In short, if your systems failed in the Amazon cloud this week, it wasn't Amazon's fault," blogged O'Reilly Media's George Reese. "You either deemed an outage of this nature an acceptable risk or you failed to design for Amazon's cloud computing model. The strength of cloud computing is that it puts control over application availability in the hands of the application developer and not in the hands of your IT staff, data center limitations, or a managed services provider.
"The AWS outage highlighted the fact that, in the cloud, you control your SLA in the
cloud—not AWS."
Morphlabs was one of the first AWS solution providers when it launched Morph Appspace in 2007 and now has more than 4,000 users.
"The Amazon EC2 outage has sent ripples and shockwaves through the AP wires and blogosphere, but those of us who have been in the cloud computing trenches for the equivalent of tech eons (at Morphlabs, we've been at it for more than four years), the news is neither shocking nor a reason to stray from our mission," founder and CEO Winston Damarillo told eWEEK.
"While it is tempting to unleash common fears about new technologies when confronted with 'proof' of their failings and risks, our years of innovation and adoption tell us that there is a wiser path. Approach with caution, but approach nonetheless. The same is true for the implementation of cloud computing services in your IT organization."
Morphlabs' approach to software development assumes failure, and it builds fault tolerance into all of its cloud computing solutions, Damarillo said.
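For readers wondering what "assuming failure" can look like in practice, here is a minimal Python sketch of one common pattern: trying redundant replicas in turn instead of depending on a single location. It is an illustration only, not Morphlabs' or Amazon's actual code. The zone names, the `fake_request` stand-in and the `call_with_failover` helper are all hypothetical.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


class AllReplicasFailed(Exception):
    """Raised when every replica in the list has failed."""


def call_with_failover(replicas: Sequence[str], request: Callable[[str], T]) -> T:
    """Try each replica in order and return the first successful result."""
    errors = []
    for replica in replicas:
        try:
            return request(replica)
        except Exception as exc:  # a real system would catch narrower errors
            errors.append((replica, exc))
    # Every replica failed; surface all collected errors to the caller.
    raise AllReplicasFailed(errors)


def fake_request(replica: str) -> str:
    # Hypothetical stand-in for a network call; pretends "zone-a" is down.
    if replica == "zone-a":
        raise ConnectionError("zone-a unreachable")
    return f"served from {replica}"


result = call_with_failover(["zone-a", "zone-b"], fake_request)
print(result)
```

In this sketch the application, not the provider, decides how many replicas to keep and where, which is the sense in which Reese argues that availability is in the developer's hands. A production design would also need health checks, timeouts and data replication, none of which are shown here.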