Nine Common Reasons Cloud Systems Crash

Aug 13, 2014

by Chris Preimesberger

Human Error

Human error is by far the No. 1 cause of cloud downtime. Even with perfect applications, cloud environments are only as good as the people who manage them. This means ongoing maintenance, tweaking and updating must be worked into standard operational procedures. One bad maintenance script can, and will, bring down mission-critical applications.

Application Bugs

While the cloud does introduce a new level of complexity, application failure still trumps cloud provider issues as a leading cause of downtime. More often than not, such failures are unrelated to the cloud infrastructure running your applications. Traditional IT practices still apply, except that you are continuously developing, testing and deploying your application in the cloud.

Cloud Provider Downtime

Cloud failures are routine. Whether the failure hits an instance, an availability zone or an entire region, applications should be designed to expect it. This means routinely checking performance and spinning up new instances to replace terminated machines. Amazon Web Services, for example, enables users to spread and load-balance an application across several availability zones so that when one zone fails, the application does not suffer.
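
The replace-what-failed loop described above can be sketched in a few lines. This is only an illustration under stated assumptions: `Instance`, `launch` and `reconcile` are hypothetical names, the zones are made-up labels, and `launch` stands in for whatever "start an instance" call a given provider actually offers.

```python
from dataclasses import dataclass
from itertools import count


@dataclass
class Instance:
    instance_id: str
    zone: str
    healthy: bool = True


_ids = count(1)


def launch(zone):
    """Hypothetical stand-in for a provider's 'launch instance' call."""
    return Instance(instance_id=f"i-{next(_ids):04d}", zone=zone)


def reconcile(fleet, zones, per_zone):
    """Drop unhealthy instances, then top each zone back up to per_zone."""
    survivors = [i for i in fleet if i.healthy and i.zone in zones]
    for zone in zones:
        running = sum(1 for i in survivors if i.zone == zone)
        # range() is empty when the zone already has enough instances.
        survivors.extend(launch(zone) for _ in range(per_zone - running))
    return survivors


zones = ["zone-a", "zone-b"]          # assumed zone names
fleet = reconcile([], zones, per_zone=2)
fleet[0].healthy = False              # simulate a terminated machine
fleet = reconcile(fleet, zones, per_zone=2)
```

Running `reconcile` on a schedule keeps every zone at its target size, so losing one machine, or one zone's worth of machines, does not leave the application short of capacity for long.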

Quality of Service

As far as consumers are concerned, streaming videos that freeze up mean your cloud is not working. They don’t really care (or even know) that the application is, technically speaking, still running. Delivering acceptable quality of service means accounting for network latency, fluctuating demand and shifting customer requirements.

Extreme Spikes in Customer Demand

This is actually a great example of cloud superiority. If customer demand exceeds capacity, there’s not much you can do with an on-premises IT infrastructure. In a public cloud environment, you can respond to fluctuations in customer demand by automatically scaling capacity up during peaks and back down when demand levels off.
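
As a rough sketch of the scaling decision itself, the function below turns measured demand into an instance count. The function name, the requests-per-second units and the minimum and maximum bounds are all assumptions made for illustration; real autoscaling services apply their own policies and metrics.

```python
import math


def desired_instances(demand_rps, capacity_rps_per_instance,
                      min_instances=2, max_instances=20):
    """Return how many instances to run for the current demand.

    The bounds are assumed values: a floor keeps some redundancy during
    quiet periods, and a ceiling caps cost during extreme spikes.
    """
    needed = math.ceil(demand_rps / capacity_rps_per_instance)
    return max(min_instances, min(max_instances, needed))


# Quiet period, busy period, extreme spike (illustrative numbers only).
for demand in (50, 950, 5000):
    print(demand, desired_instances(demand, capacity_rps_per_instance=100))
```

The ceiling matters as much as the floor: without it, a runaway spike or a measurement bug could scale the bill along with the fleet.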

Security Breaches

Security is often raised as a red flag when it comes to hosting critical applications in the public cloud. Much as in on-premises environments, it’s up to you to meet regulatory and security requirements. However, the cloud does make it easier to check off a list of security requirements, since cloud providers have addressed these concerns repeatedly with hundreds of enterprise customers.

Third-Party Service Failures

The whole is greater than the sum of its parts, but all it takes to bring your cloud down is one third-party app that isn’t working. This could happen to any type of infrastructure application (sustaining, garbage collecting, security and so on) in your own or another supplier’s data center. It’s up to you to continuously monitor these applications as well and have a contingency plan in place for a rainy day.
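
One simple form of contingency plan is to retry a failing dependency a few times and then fall back to a degraded answer. The sketch below assumes a hypothetical third-party geocoding call and a cached value as the fallback; both names are invented for illustration.

```python
def call_with_fallback(primary, fallback, attempts=3):
    """Try the third-party call a few times, then use the contingency path."""
    last_error = None
    for _ in range(attempts):
        try:
            return primary()
        except Exception as exc:  # real code should catch narrower types
            last_error = exc
    # Every attempt failed: degrade gracefully instead of crashing.
    return fallback(last_error)


def flaky_geocoder():
    # Hypothetical third-party service that is currently down.
    raise ConnectionError("geocoder unavailable")


result = call_with_fallback(flaky_geocoder, lambda err: "cached-location")
```

Passing the last error to the fallback lets it log or report the outage, which covers the monitoring half of the advice as well.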

Storage Failures

In a recent disaster recovery survey, storage failure was listed as a top risk to system availability. The cloud still depends on physical storage, which routinely fails. As with overall service availability and quality, storage problems can lead to serious performance issues. This means planning for these failures by setting up dedicated cloud storage applications that maintain data resiliency and meet data retrieval requirements.
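
To make "data resiliency" concrete, here is a toy in-memory sketch that writes every object to several replicas and verifies a checksum on read. `ReplicatedStore` is a hypothetical class, not any provider's storage API, and the replica count of three is an assumption.

```python
import hashlib


class ReplicatedStore:
    """Toy store that keeps each object on several replicas."""

    def __init__(self, replica_count=3):
        self.replicas = [dict() for _ in range(replica_count)]

    def put(self, key, data):
        digest = hashlib.sha256(data).hexdigest()
        for replica in self.replicas:
            replica[key] = (data, digest)

    def get(self, key):
        # Return the first copy whose checksum still matches.
        for replica in self.replicas:
            entry = replica.get(key)
            if entry is None:
                continue  # this replica lost the object
            data, digest = entry
            if hashlib.sha256(data).hexdigest() == digest:
                return data
        raise KeyError(key)


store = ReplicatedStore()
store.put("report", b"quarterly numbers")
del store.replicas[0]["report"]                                   # lost copy
store.replicas[1]["report"] = (b"garbage", store.replicas[1]["report"][1])  # corrupted copy
recovered = store.get("report")
```

The read succeeds as long as one intact copy remains, which is the property a resilient storage layer is meant to provide.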

Lack of Cloud Disaster Recovery Procedures

Although disaster recovery has been a common practice for decades in physical data centers, cloud DR has only recently come under scrutiny. Few realize that customers are solely responsible for application availability. Cloud providers can help you develop failover and recovery procedures, but it’s up to you to integrate them into your applications.
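
Integrating failover into an application can start as simply as choosing the first healthy site from an ordered list. The sketch below assumes hypothetical region names and a caller-supplied health check; it is not a complete recovery procedure.

```python
def pick_active(regions, is_healthy):
    """Return the first healthy region, in priority order."""
    for region in regions:
        if is_healthy(region):
            return region
    raise RuntimeError("no healthy region available")


# Assumed health status, as it might come from a monitoring system.
status = {"primary": False, "standby": True}
active = pick_active(["primary", "standby"], lambda region: status.get(region, False))
```

A real procedure would also cover data replication and failing back once the primary recovers, steps this sketch leaves out.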
