A bug in the Windows version of the Skype client caused the software to crash after servers got overloaded, resulting in a two-day outage.
The widespread outage that crippled Skype on Dec. 22 appears
to have been aggravated by a bug in the Windows version of the VoIP client,
said Lars Rabbe, the company's CIO, on Dec. 29.
A "confluence of events," which included overloaded servers,
a bug in the Windows client and a shortage of supernodes, caused Skype to be "unavailable"
to users, Rabbe wrote on the Skype blog.
Supernodes help manage multiple connections on Skype's
peer-to-peer network, said Rabbe. Each end user's computer running Skype's client
software is a node and connects to other computers to create a p2p network.
Supernodes act like directories, supporting multiple computers running the
Skype software and establishing connections between them, he said. Supernodes
typically create local clusters of several hundred nodes, according to Rabbe. Any
computer running the client on a public network can be tapped to act as a supernode.
The Skype outage began when a cluster of servers that handle
offline instant messaging became overloaded with requests, wrote Rabbe.
Users running Windows client version 5.0.0.152 received delayed responses from
the overloaded servers and could not process the information properly, causing
the clients to crash. Enough clients crashed to affect the p2p network Skype
relies on to connect users, Rabbe wrote.
About half of all Skype users globally were running that
version of the Skype client for Windows and about 40 percent of those clients crashed,
said Rabbe. About 25 percent to 30 percent of the public supernodes that
connected users ran the buggy version of the client and failed, he said.
Users running the latest Windows version, 5.0.0.156, or the older 4.0
versions, as well as users of Skype for Mac, Skype for iPhone, Skype for TV,
Skype Connect and Skype Manager for enterprises, didn't have any problems
processing data from the overloaded servers and didn't crash, he said.
Even after Skype's team disabled the overloaded servers, "a
significant number" of supernodes had already failed, placing a
"disproportionate load" on the remaining supernodes to keep the p2p network up.
"Even when restarted, it takes some time to become available" as a supernode
again, Rabbe said.
Timing was also a factor. The clients crashed just before
peak hours, and all the clients trying to reconnect generated traffic "100
times what would normally be expected at that time of day," according to Rabbe.
Rabbe did not address what caused the offline instant
messaging servers to overload in the first place. He said Skype will be
reviewing processes for providing automatic updates to users so that everyone
will always be on the latest client software.
The outage lasted approximately 24 hours, from mid-day Dec.
22 to mid-day Dec. 23. While not a complete shutdown, the outage affected a
vast majority of the millions of users who use Skype each day. Early on Dec.
23, Skype said about 5 million of its users were back online, which was 30
percent of normal activity. The company has said it will offer refunds to affected users.
Skype engineers were able to restore the network by
introducing thousands of new, dedicated supernodes to the network and diverting
resources normally used for other services, wrote Rabbe. The supernodes were
stabilized by Dec. 24 and Skype restored all services, including Group Video
Calling, "in time for Christmas," he wrote.