Okay, so Amazon Web
Services, one of the largest, most secure and most trusted Web services in the
world, showed a vulnerability April 21 when it went down for hours,
taking some busy Websites with it. Thankfully, this doesn't happen every day,
or even every year.
So what else is new? No site
on the Internet is sacrosanct or immune from power shutoffs or a major
denial-of-service attack.
The outage
that started at 1:41 a.m. PDT April 21 at an AWS (Amazon Web Services) data
center in Northern Virginia caused service disruptions in its EC2 (Elastic Compute Cloud)
hosting service, knocking thousands of Websites—including such popular ones as
Foursquare, Reddit, Quora and Hootsuite—off the Internet for more than 30
hours. Those and many smaller Websites were still offline in some systems by
mid-afternoon April 22.
Businesses that depend on
the AWS hosting service lost money during those hours, income that cannot
be regained.
Is it too much of a snap
reaction to ask if the outage will cause CIOs to hesitate about upgrading
their IT systems with cloud-type deployments? Will IT execs question the cloud's
prime-time readiness for key business operations?
In a word: Nah.
There's no question that the
Amazon outage raises important points for enterprises to consider about which
services to subscribe to from a public cloud, which should remain on the organization's
physical premises, or which to deploy as private cloud services. But those are
questions that IT decision-makers grapple with every day.
"The first thing to
understand [about this event] is that this changes nothing," Andi Mann,
a longtime storage industry analyst who's currently serving as chief cloud
strategy guru at CA Technologies, told eWEEK.
"Cloud will have
downtime—it's a fundamental issue. But you need to be ready for downtime,
whether it's your own infrastructure or cloud infrastructure. You need to
understand what the risk is. It's all just about risk management."
Additional Risks Always Involved
When you look at subscribing
to services in a public cloud like EC2, it's important to remember that there
are indeed additional risks involved, Mann said.
"When you move into the
cloud, you can't just take an old application and throw it in the cloud and
think you've done something special," Mann said, "because you're
introducing additional risks.
"You're sharing
infrastructure, you're relying more on networking, and you're relying on your
cloud provider to do things you used to do—like disaster recovery, continuity
planning, performance management, change management and so on."
If you're going to leave all
that to a service provider, then you're going to get into trouble, Mann said.
Each enterprise needs to manage its cloud systems as if they were in its own
data center, he said.
"You can't just throw
it to Amazon and say, 'I'm done.' You need to monitor it yourself, you need to
have a backup plan, you need to have a disaster-recovery plan, you need to
manage licensing," Mann said. "Cloud doesn't mean no management. It's
still virtually within your four walls."
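Mann's advice to monitor the service yourself and keep a backup plan can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the endpoint names are hypothetical, and each "probe" stands in for whatever health check an organization actually runs. It is not a description of any Amazon feature.

```python
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical model: an endpoint is a (name, probe) pair, where probe()
# returns True when the deployment behind that name is healthy.
Endpoint = Tuple[str, Callable[[], bool]]


def pick_healthy(endpoints: Sequence[Endpoint]) -> Optional[str]:
    """Return the name of the first endpoint whose probe succeeds, else None."""
    for name, probe in endpoints:
        try:
            if probe():
                return name
        except Exception:
            # A probe that raises (timeout, refused connection, etc.)
            # is treated the same as an unhealthy endpoint.
            continue
    return None


def primary_probe() -> bool:
    # Stand-in for a real check; here the primary is simulated as down.
    raise TimeoutError("primary did not respond")


def backup_probe() -> bool:
    # Stand-in for a real check; here the backup is simulated as up.
    return True


chosen = pick_healthy([("primary-cloud", primary_probe),
                       ("backup-site", backup_probe)])
print(chosen)  # backup-site
```

The order of the list encodes the preference: the backup is used only when the primary's check fails, and a result of None means there is nowhere left to send traffic.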
Nothing Out of the Ordinary
The Amazon outage is certainly
nothing new or out of the ordinary.
"We've seen this many
times before. Gmail's been down, Amazon's been down before, many of the CDNs
[content delivery networks] have been down," Mann said. "Heck, CDNs
have been shut down. Look at the
Wikileaks thing, for example. Pastor Terry Jones [a controversial Florida
anti-Muslim preacher] was shut down. Look at Amazon's terms of service; they
can shut you down themselves.
"You might get shut
down [at any time] using the cloud. Just manage it."
The probable results of this
event are that Amazon will work harder to prove itself and add more safeguards.
Customers will look more closely at paying extra for online backup in the form of
more "availability zones," which means more mirrored content within the
cloud service.
But there's no stopping
outages like this one.
Lydia Leong of Gartner
Research wrote in an advisory that Amazon EC2 didn't actually violate its
service-level agreement when the outage occurred.
"Amazon’s SLA for EC2
is 99.95 percent for multi-AZ deployments," Leong wrote. "That means
that you should expect that you can have about 4.5 hours of total region
downtime each year without Amazon violating its SLA.
"Note, by the way, that
this outage does not actually violate their SLA. Their SLA defines unavailability
as a lack of external connectivity to EC2 instances, coupled with the inability
to provision working instances. In this case, EC2 was just fine by that
definition. It was Elastic Block Store [EBS] and Relational Database Service
[RDS] which weren’t, and neither of those services have SLAs."
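The downtime figure Leong cites follows from simple arithmetic on the SLA percentage. The sketch below assumes a 365-day year and measures availability over the whole year; the function name is made up for illustration, and the actual measurement period and definitions are whatever the SLA document says.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours; leap years ignored for simplicity


def allowed_downtime_hours(sla_percent: float,
                           period_hours: float = HOURS_PER_YEAR) -> float:
    """Hours of downtime a provider can accrue in the period
    while still meeting the stated availability percentage."""
    return period_hours * (1 - sla_percent / 100)


# 99.95 percent availability over one year
print(round(allowed_downtime_hours(99.95), 2))  # 4.38
```

The result, roughly 4.4 hours a year, is consistent with the "about 4.5 hours" in the advisory.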
So, one more important
admonition: Read the SLA very, very carefully when you commit to a cloud
service. What you don't understand, or don't realize, may come back to bite
you.
Just ask Quora, Reddit and a
few other online businesses.