
    AWS Outage Demonstrates Need for Redundancy Even in the Cloud

    By Wayne Rash, March 1, 2017

      If there was ever any question about Amazon Web Services’ critical role in keeping commercial websites running smoothly, that question was answered definitively on Feb. 28 when part of the company’s S3 storage service went down. That outage took out dozens of web services operated by companies ranging from Apple to Zendesk.

      What frustrated many users was that Amazon’s AWS dashboard, which is supposed to report the operational condition of its web services, was reporting that everything was operating normally even when it clearly wasn’t. The reason is that the dashboard relies on Amazon’s S3 storage and was unable to receive updated information about the outage.
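A status page can avoid showing an outdated "all clear" by treating a missing or old report as unknown rather than healthy. The following is a minimal sketch of that idea in Python, not a description of how the AWS dashboard works. The names `StatusReport` and `displayed_status` and the 120-second threshold are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StatusReport:
    """Hypothetical record of the most recent report received from a service."""
    state: str          # for example "ok" or "degraded"
    reported_at: float  # time the report was received, in seconds


def displayed_status(
    last_report: Optional[StatusReport],
    now: float,
    max_age_seconds: float = 120.0,  # arbitrary threshold chosen for this sketch
) -> str:
    """Return the status to display, refusing to trust stale reports."""
    if last_report is None:
        # Nothing has ever been received, so health cannot be claimed.
        return "unknown"
    age = now - last_report.reported_at
    if age > max_age_seconds:
        # The report pipeline itself may be down; do not repeat the old state.
        return "unknown (last report is stale)"
    return last_report.state
```

With this approach, a dashboard that loses its data source stops claiming normal operation once the last report ages out.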

      AWS acknowledged that there was a problem and promised to keep customers updated. But the updates stopped coming in mid-afternoon. The last Tweet from the AWS team was, “For S3, we believe we understand root cause and are working hard at repairing. Future updates across all services will be on dashboard.” Earlier, the company had promised updates on Twitter.

      However, once the company got its S3 services running again in Northern Virginia, where the affected data center is located, the Service Health Dashboard began reporting conditions accurately.

      At that point, the status reports for the services located in that data center indicated the problem was fixed. AWS reported at 2:19 p.m. that “between 9:37 AM and 1:57 p.m. PST we experienced elevated error rates for API Gateway requests in the US-EAST-1 Region when communicating with other AWS services. Deploying new APIs or modifications to existing APIs was also affected. The issue has been resolved and the service is operating normally.”

      A close examination of the dashboard indicates that some services at Amazon’s Northern Virginia location may still be marginal, but the location otherwise appears to be operating normally.

      So what actually happened to the Amazon S3 services? The company hasn’t been very forthcoming, but its comments about elevated error rates for API Gateway requests suggest that the problem is infrastructure-related, meaning it’s probably a router problem.

      Of course, that’s just a guess. But many of the recent mass outages of services such as airline reservation systems seem to boil down to router problems, and router updates are frequently the root cause of such problems, so it’s a reasonable assumption. Amazon hasn’t said what the actual cause of the problem is, so it could be anything from a hacking attempt to a configuration problem. We just don’t know.

      One thing we do know is that AWS and its S3 service are part of the problem, but not because they’re unreliable. In fact, Amazon’s services have been so reliable that its customers have grown to depend on AWS probably more than they should. From the viewpoint of most customers, AWS simply never goes down, so they don’t feel a need to plan for an outage.

      Except, of course, when it does. Then, as we saw, customers are left hanging with few updates and fewer explanations. But as annoying as the lack of explanations might be, what customers really need is to get back to work. That requires some planning.

      The first stage in that planning has to be finding an alternate storage location for the items that you’re keeping in the S3 storage service. This could mean keeping backups in S3 storage in another region, or it could mean using another storage service entirely. That way, if the S3 storage goes down, you can seamlessly switch to the other service.
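The switch to an alternate location described above can be sketched as a reader that tries each storage location in order. This is a minimal illustration under stated assumptions, not a real S3 client: `InMemoryStore` is a hypothetical stand-in for a region or a second provider, and `replicate` and `FailoverReader` are names invented for this example.

```python
from typing import Dict, Optional, Protocol, Sequence


class StoreUnavailable(Exception):
    """Raised when a storage location cannot serve requests."""


class ObjectStore(Protocol):
    def get(self, key: str) -> bytes: ...
    def put(self, key: str, data: bytes) -> None: ...


class InMemoryStore:
    """Hypothetical stand-in for one storage location (a region or provider)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.available = True  # flip to False to simulate an outage
        self._objects: Dict[str, bytes] = {}

    def _check(self) -> None:
        if not self.available:
            raise StoreUnavailable(f"{self.name} is unavailable")

    def get(self, key: str) -> bytes:
        self._check()
        return self._objects[key]

    def put(self, key: str, data: bytes) -> None:
        self._check()
        self._objects[key] = data


def replicate(key: str, data: bytes, stores: Sequence[ObjectStore]) -> None:
    """Write the same object to every location so a backup copy exists."""
    for store in stores:
        store.put(key, data)


class FailoverReader:
    """Reads from the first location that responds, in the order given."""

    def __init__(self, stores: Sequence[ObjectStore]) -> None:
        if not stores:
            raise ValueError("at least one storage location is required")
        self._stores = list(stores)

    def get(self, key: str) -> bytes:
        last_error: Optional[StoreUnavailable] = None
        for store in self._stores:
            try:
                return store.get(key)
            except StoreUnavailable as exc:
                last_error = exc  # remember the failure and try the next one
        raise StoreUnavailable("all storage locations are unavailable") from last_error
```

In this sketch the list of locations is held by the caller rather than fetched from the primary store, so the fallback path does not depend on the location that failed.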

      Ideally, Amazon could offer redundant storage as a part of its S3 offering, so that if the service goes down as it did on Feb. 28, data requests would be automatically routed to another site. A potential problem with that plan arises if the redundancy depends on information also stored in AWS, so that when the region goes down, so does the redundancy.

      But assuming that Amazon can avoid making that mistake, and I’m sure the company can, it would have a good way to protect its customers from the mistake of assuming that Amazon won’t ever go down.

      An even better approach is to assume that AWS and all of your other cloud services will go down and then plan your approach to handle that. In reality, such an assumption is a good security practice. Redundancy is important in making sure that your data is always available without fail.

      This is why state-of-the-art data centers have redundant servers, redundant network routers and redundant power. It’s also why they have more generators available to keep the data center running than they actually need.

      Some data centers go beyond that in their quest for reliability, even to the extent of having redundant chilled water reservoirs so that a loss of system coolant is unlikely. Having redundant data repositories is just part of making sure you can deliver the information your customers need.
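The benefit of redundant data repositories can be put in rough numbers. The sketch below assumes that each storage location fails independently of the others, which is exactly the assumption that breaks when the backup depends on the same region as the primary. The function name `combined_availability` and the availability figures are illustrative, not published AWS numbers.

```python
from math import prod
from typing import Iterable


def combined_availability(availabilities: Iterable[float]) -> float:
    """Probability that at least one location is up, assuming independent failures."""
    values = list(availabilities)
    if not values:
        raise ValueError("at least one storage location is required")
    for a in values:
        if not 0.0 <= a <= 1.0:
            raise ValueError("each availability must be between 0 and 1")
    # All locations are down at once with probability prod(1 - a).
    return 1.0 - prod(1.0 - a for a in values)
```

For example, two locations that are each available 99.9 percent of the time would together be available about 99.9999 percent of the time under this assumption. Any shared dependency between them lowers that figure.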

      With AWS and its high level of reliability, it’s easy to forget such lessons, but they remain important.

      Wayne Rash
      https://www.eweek.com/author/wayne-rash/
      Wayne Rash is a freelance writer and editor with a 35-year history covering technology. He’s a frequent speaker on business, technology issues and enterprise computing. He is the author of five books, including his most recent, "Politics on the Nets." Rash is a former Executive Editor of eWEEK and a former analyst in the eWEEK Test Center. He was also an analyst in the InfoWorld Test Center and editor of InternetWeek. He's a retired naval officer, a former principal at American Management Systems and a long-time columnist for Byte Magazine.
