    Amazon Cloud Outage Caused by Storms, Worsened by Software Glitches

    Written by

    Jeff Burt
    Published July 3, 2012

      Amazon Web Services' lengthy outage over the weekend, which began when powerful storms in the Mid-Atlantic region knocked out power, has put renewed focus on the risks of cloud computing and how users can minimize those risks going forward.

      It also has generated promises from Amazon at a time when competition in the booming infrastructure as a service (IaaS) space is growing, as the likes of Google, which in June unveiled its Compute Engine, look to make inroads.

      The storms that blew through the Mid-Atlantic region June 29 hit Virginia particularly hard, knocking out power to hundreds of thousands of people. The storms also cut off power to one of the 10 Amazon Web Services (AWS) data centers on the East Coast, a situation that was exacerbated by problems with backup power at the Virginia facility as well as unexpected software issues that arose during the recovery efforts.

      The outage at the data center knocked out such technologies as Amazon's Elastic Compute Cloud (EC2); interrupted such high-profile Websites as Netflix, Instagram and Pinterest; and affected other companies that run all or part of their businesses on the Amazon compute cloud. It began the afternoon of June 29 ET and also hit Elastic Block Store (EBS) and Relational Database Service (RDS).

      However, according to a July 2 analysis from Amazon, the situation created by the power outage was made worse by problems with power supplies and software bugs. While several data centers in AWS' U.S. East-1 region saw power fluctuations, two data centers were hit with a large voltage spike. One data center switched to generator power as planned, but there were problems getting and keeping backup power going in the second one. As a result, for more than an hour the night of June 29, users could not create new instances in EC2.

      Amazon officials said that the outage affected about 7 percent of EC2 instances and EBS volumes, though they admitted that "there was significant impact to many customers."

      Complicating matters were problems with what AWS calls "control planes," which made it difficult for customers to respond to the service outage and manage their resources in the cloud environment. During the outage, a large number of reboot requests from customers caused a bottleneck in the server booting process. In addition, there were problems with the elastic load balancers (ELBs), which are designed to switch traffic to unaffected areas in such situations. When the power was restored, "a large number of ELBs came up in a state which triggered a bug we hadn't seen before. The bug caused the ELB control plane to attempt to scale these ELBs to larger ELB instance sizes."

      The result was a flood of scaling requests that, combined with customers launching new EC2 instances, created an ELB control plane backlog "and pretty soon, these requests started taking a very long time to complete," Amazon said.
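
      The backlog Amazon describes is a familiar queueing effect: when requests arrive faster than a control plane can process them, the queue grows and every new request waits longer. A minimal sketch of that effect follows. The function names, the per-tick capacity and the arrival counts are illustrative assumptions, not figures from Amazon's analysis.

```python
def simulate_backlog(arrivals_per_tick, capacity_per_tick):
    """Return the number of requests still queued at the end of each tick.

    Hypothetical model: a single first-in, first-out queue that can
    complete at most `capacity_per_tick` requests per time step.
    """
    backlog = 0
    history = []
    for arrivals in arrivals_per_tick:
        backlog += arrivals                       # new requests join the queue
        backlog -= min(backlog, capacity_per_tick)  # process what capacity allows
        history.append(backlog)
    return history


def wait_ticks(backlog, capacity_per_tick):
    """Ticks needed to drain `backlog` requests ahead of a new arrival."""
    return -(-backlog // capacity_per_tick)       # ceiling division


# Illustrative numbers only: a burst of 20 requests per tick for three
# ticks against a capacity of 10 per tick.
arrivals = [5, 5, 20, 20, 20, 5, 5]
history = simulate_backlog(arrivals, capacity_per_tick=10)
print(history)
print(wait_ticks(max(history), 10))
```

      In this toy model the queue keeps growing for as long as the burst lasts and drains only slowly afterward, which is why requests submitted late in a burst take the longest to complete.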

      The problems also reached AWS' Relational Database Service (RDS) in the affected data center, which couldn't be restored until EBS came back up. In addition, another software bug meant that there was no automatic failover to an unaffected area.

      AWS officials have promised to fix these problems. The measures include expanding the on-site engineering staff so that, if there is another outage, they can switch power to generators (manually, if necessary) before the uninterruptible power supplies (UPSes) run out of power; improving the recovery process; and dealing with the blockages that forced assessment and failover for the control plane to be done manually rather than automatically.

      For AWS, one of the pioneers in IaaS, getting this right will be important. The company suffered a large service interruption last year and has already gone through smaller ones in recent months. Meanwhile, other Web companies are looking to get in on the IaaS market, which is rapidly gaining adoption as businesses see the advantages of not having to invest a lot of money in building their own infrastructures. Instead, they can essentially run their businesses in the cloud, on someone else's servers and storage arrays, and spend their money elsewhere, including on product development and hiring staff.

      Google is among the latest Web companies pitching their cloud services. In June, the company launched its Compute Engine, a cloud service that currently is available in limited preview. Google executives said during their Google I/O developers conference that the company has the massive computing capabilities within its data centers to host applications.

      The outage also generated a host of blogs and articles, several found here, outlining for AWS customers ways to avoid problems in the future when service outages occur.

      Jeffrey Burt has been with eWEEK since 2000, covering an array of areas that includes servers, networking, PCs, processors, converged infrastructure, unified communications and the Internet of things.
