Oops, we’re sorry, says Skype in its explanation of why its network was down for about two days last week. In fact, I was having trouble with it on Day 3 as well, but it’s back up and running now.
The short version is that after Microsoft released its Patch Tuesday updates, several of which required reboots, very large numbers of Skype clients rebooted and then tried to log in, all at about the same time. The network couldn’t handle this overload. Normally, the Skype network is “self-healing,” but the massive number of log-ins exposed a bug in this capability. To quote Skype’s explanation:
“The high number of restarts affected Skype’s network resources. This caused a flood of log-in requests, which, combined with the lack of peer-to-peer network resources, prompted a chain reaction that had a critical impact.”
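To get a feel for why simultaneous log-ins matter, here is a minimal simulation sketch. It is not Skype's code or protocol: the function name `peak_logins_per_second`, the client count and the ten-minute window are all hypothetical, chosen only to show how the busiest second changes when the same number of clients log in at once versus at random moments spread over a window.

```python
import random
from collections import Counter


def peak_logins_per_second(num_clients, spread_seconds, seed=0):
    """Return the number of log-ins that land in the busiest one-second slot.

    Each simulated client picks a whole-second offset at random within
    spread_seconds of its reboot; spread_seconds == 0 means every client
    logs in during the same second.
    """
    rng = random.Random(seed)  # seeded so repeated runs give the same result
    slots = Counter(
        rng.randrange(spread_seconds) if spread_seconds > 0 else 0
        for _ in range(num_clients)
    )
    return max(slots.values())


clients = 100_000  # hypothetical figure, not a real Skype number
print("all at once:", peak_logins_per_second(clients, 0))
print("spread over 10 minutes:", peak_logins_per_second(clients, 600))
```

With no spread, the peak equals the full client count; with a ten-minute spread, the busiest second sees only a small fraction of it. The total work is the same either way, but the burst the network has to absorb is very different.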
A lot is known about how Skype works (here’s a very technical analysis of it, in PDF form), but the company doesn’t advertise the details of its protocols. In regard to the outage, it appears to have been important that Skype is partly a peer-to-peer system, particularly with respect to “supernodes,” which are special clients designated to relay control data, especially for clients behind firewalls.
One of the theories about how Skype went down is that too many nodes were trying to connect to too few supernodes. This user probably describes a typical scenario: The supernodes on his network were bombarded with log-on requests, which he mistook for a DDoS (distributed denial of service) attack. Or was it a mistake? In effect, it was a DDoS attack.
And don’t mistake Skype’s explanatory message for a bug fix report. Nowhere does it say that Skype has fixed the problem, just that it has been identified. To quote the Skype Journal blog:
“Skype has not said:
If the no-self-healing bug is completely understood (just that it’s been found)
If the bug is repaired (just that it’s been found)
If the network collapse will not (cannot?) recur
The recovery may have been spontaneous; Skype hasn’t posted anything to the contrary, nothing that says ‘We fixed the problem.’”
So Skype may have the same problem next month, and it’s not just Skype. Who else could be hit by similar problems? It’s not hard to imagine other networks going down on Patch Tuesday for similar reasons. Lots of software checks for updates at boot time when programs load.
Naturally, some are taking the opportunity to blame Microsoft, but I don’t see for what. Even if Microsoft released only one patch each month that required a reboot, the problem would be the same. The message is clear: If you have a large number of Windows clients, you have to test for a massive-reboot scenario better than Skype did.
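One common, generic way for client software to avoid piling onto a struggling service after a mass reboot is to retry with randomized, exponentially growing delays. The sketch below illustrates that general technique only. Skype has not said how its clients retry or what its fix involves, and the function name `retry_delay` and the `base` and `cap` values are assumptions made for illustration.

```python
import random


def retry_delay(attempt, base=1.0, cap=300.0, rng=random):
    """Return how many seconds to wait before retry number `attempt` (0-based).

    The upper bound doubles with each failed attempt, up to `cap` seconds,
    and the actual wait is drawn at random between zero and that bound so
    that clients that failed together do not all retry together.
    """
    ceiling = min(cap, base * (2 ** attempt))
    return rng.uniform(0, ceiling)


# Example: the waits one hypothetical client might use for its first five retries.
for attempt in range(5):
    print(attempt, round(retry_delay(attempt), 2))
```

A scheme like this is also something a massive-reboot test can exercise: start many simulated clients at the same instant and check that their retries spread out rather than arriving in waves.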
Security Center Editor Larry Seltzer has worked in and written about the computer industry since 1983. He can be reached at larryseltzer@ziffdavis.com.
Check out eWEEK.com’s Security Center for the latest security news, reviews and analysis. And for insights on security coverage around the Web, take a look at eWEEK’s Security Watch blog.