Nine Common Reasons Cloud Systems Crash

By Chris Preimesberger  |  Posted 2014-08-13

It isn't discussed or written about as much as it probably should be, but cloud applications crash at least as often as their on-premises counterparts. It's simply a fact of IT life: Any application running anywhere, whether on a physical server or in a virtual machine halfway across the world, can crash at any time. While minimizing downtime remains a key IT challenge, cloud computing disasters require a slightly different approach. The increased flexibility of the cloud means that IT administrators have to give up a degree of control. When disaster does strike, there's no physical data center you can visit to investigate the problem, or at least it's usually not close enough to reach quickly. Those who prepare accordingly will find that cloud applications generally perform remarkably well. This slide show, developed using eWEEK reporting and industry information from Ofer Gadish, CEO of CloudEndure, examines nine key reasons even the best cloud applications crash and offers suggestions about what you can do about them.

  • Human Error

    This is by far the No. 1 cause of cloud downtime. Even with perfect applications, cloud environments are only as good as the people who manage them. This means ongoing maintenance, tweaking and updating must be worked into standard operating procedures. One bad maintenance script can, and will, bring down mission-critical applications.
  • Application Bugs

    While the cloud does introduce a new level of complexity, application failure still trumps cloud provider issues as a leading cause of downtime. More often than not, such failures are unrelated to the cloud infrastructure running your applications. Traditional IT practices still apply; the difference is that you are continuously developing, testing and deploying your application in the cloud.
  • Cloud Provider Downtime

    Cloud failures are routine. Whether the failure hits an instance, an availability zone or an entire region, applications should be designed to expect it. This means routinely checking performance and spinning up new instances to replace terminated machines. Amazon Web Services, for example, enables users to spread and load-balance an application across several availability zones so that when one does fail, the application does not suffer.
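
    The replace-and-rebalance idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Instance` record, the `plan_replacements` function and the zone names are hypothetical, and no real cloud provider API is called. The sketch simply decides where replacement instances should go so that the fleet stays spread across the zones that are still up.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    # Hypothetical record describing one running machine.
    instance_id: str
    zone: str
    healthy: bool


def plan_replacements(instances, available_zones, desired_total):
    """Return the zones in which replacement instances should be launched."""
    if not available_zones:
        raise ValueError("no availability zone is currently usable")
    # Count only healthy instances that live in a zone that is still up.
    healthy = Counter(
        i.zone for i in instances if i.healthy and i.zone in available_zones
    )
    counts = {zone: healthy.get(zone, 0) for zone in available_zones}
    launches = []
    missing = max(desired_total - sum(counts.values()), 0)
    for _ in range(missing):
        # Pick the least-populated zone; ties go to the first zone by name.
        target = min(sorted(counts), key=lambda zone: counts[zone])
        counts[target] += 1
        launches.append(target)
    return launches
```

    A scheduler would run a check like this periodically and hand the returned zone list to whatever launch mechanism the provider offers.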
  • Quality of Service

    As far as consumers are concerned, streaming videos that freeze up mean your cloud is not working. They don't really care (or even know) that the application is, technically speaking, still running. Keeping them satisfied means accounting for network latency, fluctuating demand and shifting customer requirements.
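
    One way to catch an application that is "up" but too slow is to track a latency percentile rather than a simple alive/dead check. The sketch below is a minimal example, assuming latency samples in milliseconds; the function names, the 95th percentile and the 250 ms threshold are illustrative choices, not values from the article.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile of a non-empty list of numbers (0 < p <= 100)."""
    if not samples:
        raise ValueError("at least one sample is required")
    if not 0 < p <= 100:
        raise ValueError("p must be in the range (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank into the sorted list
    return ordered[rank - 1]


def meets_latency_target(samples_ms, p=95, threshold_ms=250):
    """True if the p-th percentile latency is within the assumed threshold."""
    return percentile(samples_ms, p) <= threshold_ms
```

    A single slow outlier barely moves the median but shows up clearly in the 95th percentile, which is closer to what a frustrated viewer actually experiences.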
  • Extreme Spikes in Customer Demand

    This is actually a great example of cloud superiority. If customer demand exceeds capacity, there's not much you can do with an on-premises IT infrastructure. In a public cloud environment, you can respond to fluctuations in customer demand by automatically scaling capacity up during peaks and scaling back down when demand levels off.
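
    The scaling decision itself can be very simple. The following sketch uses a proportional rule: size the fleet so that average utilization returns to a chosen target, within fixed bounds. It is a simplified illustration, not any provider's actual autoscaling algorithm, and the function name, parameters and numbers are assumptions.

```python
import math


def desired_capacity(current, utilization, target, min_size, max_size):
    """Instance count that would bring average utilization back to `target`.

    `utilization` and `target` use the same unit (for example, percent CPU).
    """
    if current < 1:
        raise ValueError("current must be at least 1")
    if target <= 0:
        raise ValueError("target must be positive")
    # Proportional rule: if load per instance is 1.5x the target,
    # run 1.5x as many instances (rounded up).
    wanted = math.ceil(current * utilization / target)
    # Never go below the safety floor or above the cost ceiling.
    return max(min_size, min(max_size, wanted))
```

    With four instances at 90 percent utilization and a 60 percent target, the rule asks for six instances; when demand levels off, the same rule shrinks the fleet again, but never below `min_size`.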
  • Security Breaches

    Security is often raised as a red flag when it comes to hosting critical applications in the public cloud. Much as in on-premises environments, it's up to you to meet regulatory and security requirements. However, the cloud does make it easier to check off a list of security requirements, since cloud providers have addressed these concerns repeatedly with hundreds of enterprise customers.
  • Third-Party Service Failures

    The whole is greater than the sum of its parts, but all it takes to bring your cloud down is one third-party app that isn't working. This could happen to any type of infrastructure application (sustaining, garbage collecting, security and so on) in your own or a supplier's data center. It's up to you to continuously monitor these applications as well and to have a contingency plan in place for a rainy day.
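
    One common form of contingency plan is a circuit breaker: after repeated failures, stop calling the third-party service for a while and serve a fallback instead. The class below is a minimal sketch of that pattern; the name `CircuitBreaker`, its parameters and the default thresholds are illustrative assumptions, not part of any specific library.

```python
import time


class CircuitBreaker:
    """Stops calling a failing dependency and uses a fallback for a cool-down period."""

    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self._max_failures = max_failures
        self._reset_after = reset_after  # cool-down in seconds
        self._clock = clock              # injectable so tests can fake time
        self._failures = 0
        self._opened_at = None           # None means the circuit is closed

    def call(self, func, fallback):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self._reset_after:
                # Circuit is open: skip the dependency entirely.
                return fallback()
            # Cool-down elapsed: allow one trial call. The failure count is
            # kept, so a single new failure reopens the circuit immediately.
            self._opened_at = None
        try:
            result = func()
        except Exception:
            self._failures += 1
            if self._failures >= self._max_failures:
                self._opened_at = self._clock()
            return fallback()
        self._failures = 0
        return result
```

    In use, `func` would wrap the call to the external service and `fallback` would return cached or degraded data, so one broken supplier slows a feature down rather than taking the whole application with it.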
  • Storage Failures

    In a recent disaster recovery survey, storage failure was listed as a top risk to system availability. The cloud still depends on physical storage, which routinely fails. As with lapses in service availability and quality, storage problems can seriously degrade performance. This means planning for these failures by setting up dedicated cloud storage applications that maintain data resiliency and meet data retrieval requirements.
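
    Data resiliency usually comes down to keeping more than one copy and verifying what you read back. The sketch below illustrates the idea with plain in-memory dictionaries standing in for independent storage back ends; the `ReplicatedStore` class is hypothetical, and a real deployment would use the provider's storage services instead.

```python
import hashlib


class ReplicatedStore:
    """Writes every object to all replicas and reads the first intact copy."""

    def __init__(self, replicas):
        # Each replica is a dict-like object standing in for a storage back end.
        self._replicas = list(replicas)

    def put(self, key, data: bytes):
        digest = hashlib.sha256(data).hexdigest()
        for replica in self._replicas:
            replica[key] = (digest, data)

    def get(self, key) -> bytes:
        for replica in self._replicas:
            record = replica.get(key)
            if record is None:
                continue  # copy missing on this replica, try the next one
            digest, data = record
            if hashlib.sha256(data).hexdigest() == digest:
                return data  # checksum matches, copy is intact
            # Checksum mismatch: treat the copy as corrupt and keep looking.
        raise KeyError(key)
```

    Because each read checks a checksum, a missing or silently corrupted copy on one back end is skipped and the data is still served from another.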
  • Lack of Cloud Disaster Recovery Procedures

    Although disaster recovery has been a common practice for decades in physical data centers, cloud DR has only recently come under scrutiny. Few realize that it's the customers who are solely responsible for application availability. Cloud providers can help you develop failover and recovery procedures, but it's up to you to integrate them into your applications.
