Amazon, Windows Azure, GoDaddy: How to Avoid Similar Cloud Outages
You can probably name your core gear off the top of your head: certainly the critical devices, though maybe not all of the lower-profile equipment. In GoDaddy's case, a performance cascade turned a minor problem into a major outage. Use your network monitor's discovery engine to get broad coverage with little manual configuration. Configure scheduled discovery to detect new devices automatically and assess how critical each one is.
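As a rough illustration of what a scheduled discovery pass does, the sketch below compares a freshly discovered device list against a known inventory and sorts the newcomers by criticality. It is a minimal sketch, not any monitoring product's API: the `Device` type, the `CRITICAL_ROLES` set, the function names and the sample addresses are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    """Hypothetical record for a device found on the network."""
    address: str
    role: str  # e.g. "router", "dns", "printer"


# Assumption: roles whose failure could cascade into a wider outage.
CRITICAL_ROLES = {"router", "firewall", "load-balancer", "dns"}


def find_new_devices(known, discovered):
    """Return discovered devices whose address is not in the known inventory."""
    known_addresses = {device.address for device in known}
    return [d for d in discovered if d.address not in known_addresses]


def triage(devices):
    """Split devices into critical and routine groups based on their role."""
    return {
        "critical": [d for d in devices if d.role in CRITICAL_ROLES],
        "routine": [d for d in devices if d.role not in CRITICAL_ROLES],
    }


# Example run with made-up data standing in for one scheduled scan.
known = [Device("10.0.0.1", "router")]
discovered = [
    Device("10.0.0.1", "router"),
    Device("10.0.0.7", "dns"),
    Device("10.0.0.42", "printer"),
]

new_devices = find_new_devices(known, discovered)
result = triage(new_devices)

for group, members in result.items():
    for device in members:
        print(f"{group}: {device.address} ({device.role})")
```

In practice the monitoring tool runs the scan on a schedule and supplies the discovered list; the point is only that new devices get flagged and ranked without anyone maintaining the inventory by hand.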
Though cloud services offer IT departments an impressive list of benefits, they are still technology platforms managed by imperfect human beings. As a result, they are potentially just as error-prone as internal systems. Recent outages remind us all that, in practice, big IT offers only limited improvement in reliability and in many cases may amplify the effects of small human failures. The basic IT tasks of planning, careful change management and continuous monitoring must be universal. Even the big guys have had trouble, whether it's Amazon Web Services (AWS) putting a lump of coal in Netflix's stocking on Christmas Eve with a botched elastic load balancing (ELB) maintenance job, United Airlines or Bank of America going down because of service configuration errors, or more recently the Microsoft Azure authentication outage that occurred because the company forgot to renew a certificate. These incidents also show that, with a little preparation, in-house IT can deliver reliability on par with, or sometimes better than, the big guys. Patrick Hubbard, senior technical product marketing manager and head geek at IT management software specialist SolarWinds, helped eWEEK come up with this list of tips for avoiding outages.