On Feb. 12, when the figure skating judging controversy erupted at the Winter Olympic Games, it would not have been a complete surprise to see Mike Corrigan, director of technology at MSNBC.com, perform a panic-stricken double toe loop right off the top of the ski jump hill. Corrigan and MSNBC.com, which stepped in to take over operation of all three official Winter Olympics sites only last June after the demise of previous provider Quokka Sports Inc., saw site traffic following the judging fiasco jump to 6.7 million daily visitors, nearly double MSNBC.com's norm. Can you say self-inflicted denial-of-service breakdown?
Fortunately for Corrigan, MSNBC.com, of Redmond, Wash., was able to weather the traffic onslaught, but only because the joint venture of NBC and Microsoft Corp. had conducted a rigorous six-month program of Web site and application performance testing prior to the opening ceremony.
Those tests, which involved modeling how visitors were likely to use the site and simulating the impact of different traffic patterns on performance, helped MSNBC.com meet soaring traffic without blindly spending money on new hardware. In fact, although traffic nearly tripled, MSNBC.com was able to get by with just a 40 percent increase in total server capacity, making up the difference with caching, load balancing and other architectural refinements.
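The arithmetic behind that trade-off can be sketched in a few lines. This is a back-of-the-envelope illustration only, not MSNBC.com's actual capacity model: it assumes load scales linearly with traffic, takes "nearly tripled" as a multiplier of 3.0, and uses hypothetical function names.

```python
def required_efficiency_gain(traffic_multiplier: float, capacity_increase: float) -> float:
    """Factor by which each unit of server capacity must handle more traffic.

    Assumes (for illustration) that load scales linearly with traffic.
    """
    return traffic_multiplier / (1.0 + capacity_increase)


def required_offload_fraction(traffic_multiplier: float, capacity_increase: float) -> float:
    """Share of requests that caching or similar refinements would have to
    absorb so the enlarged server farm sees no more load per server than before."""
    return 1.0 - (1.0 + capacity_increase) / traffic_multiplier


# Figures from the article: traffic "nearly tripled", capacity grew 40 percent.
gain = required_efficiency_gain(3.0, 0.40)
offload = required_offload_fraction(3.0, 0.40)
print(f"per-server efficiency gain needed: {gain:.2f}x")
print(f"share of requests to offload: {offload:.0%}")
```

Under those assumptions, each server would have to carry a little more than twice its earlier load, or roughly half of all requests would have to be answered from cache before reaching the origin servers.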
"To do it in the time available and cost-effectively ... we needed to bring in testing of the site to know what the site could do and make it through the Olympics," said Corrigan, who had already seen what an unexpected traffic surge can do when, following the Sept. 11 terror attacks, hits on the MSNBC.com site nearly quadrupled, and the company was forced to remove some site features and rush to deploy caching. "We knew we had a high load coming and that failure wasnt an option," he said.
Through the first week of the Salt Lake City games, the Olympics sites—Olympics.com, SaltLake2002.com and NBCOlympics.com—were far from failures. Even with traffic initially peaking Feb. 12 with 6.7 million visitors and 260,000 concurrent users, the sites performed under pressure. Olympics.com and SaltLake2002.com, for instance, both averaged response times of 0.8 seconds and availability of 99.7 percent for U.S.-based visitors during the first week and a half of the Olympics, according to Web site performance tracking company Keynote Systems Inc., of San Mateo, Calif.
That compares with an average response time for an index of the top 40 U.S. sites of nearly 2 seconds, Keynote reported, and a 3-second response time for the official Olympics site run by Quokka at the 2000 Summer Games in Sydney, Australia.
Testing Early and Often
MSNBC.com's approach of testing early and often serves as an example of how to prepare a Web site for a high-profile, high-traffic event, experts say. Testing is critical not only to be sure sites won't fail under pressure but also to determine the right amount of money to invest upfront in new hardware and software, said Matthew Berk, an analyst at Jupiter Media Metrix Inc., in New York.
"The bad approach would be [that] when there's more traffic than you know what to do with, you just throw servers at it and do limited stress testing," Berk said. "The right way ... is to build appropriately so you're not overspending."
To design and conduct the tests and simulations, MSNBC.com worked with Lab Acquisition Corp., which does business as KeyLabs, of Lindon, Utah. MSNBC.com officials said they decided to outsource the testing because of the tight deadline and because the company did not have the internal expertise to conduct such a wide range of tests.
KeyLabs began last September by measuring how well MSNBC.com's infrastructure could scale to meet traffic that could be expected to double or even triple the 2 million to 3 million daily visitors MSNBC.com was receiving at the time and reach 300,000 concurrent users. To conduct those tests and others, KeyLabs used RadView Software Ltd.'s WebLoad load testing software.
"It was such a large engagement that they couldn't guess and just throw hardware at it," said Mike Fahnert, president and chief operating officer at KeyLabs. "They wanted to fully understand how the software and infrastructure would perform."
Next, KeyLabs and MSNBC.com set out to better understand the kind of Web traffic load MSNBC.com would receive during the Olympics. In November, KeyLabs set up a simulation in which 30,000 volunteer users spread throughout North and South America, Europe, and Asia logged on to a pre-production version of the site. Client software, which KeyLabs developed in partnership with distributed computing company United Devices Inc., sat on users' computers and allowed KeyLabs to analyze performance and usage patterns from various access speeds and locations on the Internet.
From information generated by those tests, MSNBC.com and KeyLabs developed sophisticated user profiles to match the various ways the broad international audience of the Olympics sites would access and navigate the sites. By January, these boiled down to about six common scenarios of how a visitor would use the site, such as checking only results of a popular event, navigating news or participating in real-time voting about results, said Fahnert.
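A scenario mix of this kind is typically expressed as a weighted distribution from which simulated users are drawn. The sketch below is illustrative only: the three scenario names echo the examples Fahnert gives, but the weights, the function name and the user count are hypothetical, since the article does not disclose the actual profiles.

```python
import random
from collections import Counter

# Hypothetical scenario weights; the real mix was not published.
SCENARIO_WEIGHTS = {
    "check_event_results": 0.40,
    "browse_news": 0.35,
    "real_time_voting": 0.25,
}


def assign_scenarios(num_users: int, weights: dict[str, float], seed: int = 0) -> Counter:
    """Randomly assign each simulated user one scenario, in proportion to its weight."""
    rng = random.Random(seed)  # seeded so a test run can be repeated
    names = list(weights)
    picks = rng.choices(names, weights=[weights[n] for n in names], k=num_users)
    return Counter(picks)


mix = assign_scenarios(1000, SCENARIO_WEIGHTS)
for name in SCENARIO_WEIGHTS:
    print(name, mix[name])
```

Seeding the random generator keeps the mix reproducible, which matters when comparing one test run against another after an architectural change.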
From these models and the earlier capacity tests, MSNBC.com decided to increase its server capacity by 40 percent, double its capacity for serving up online advertisements from partner MSN, add internal caching using CacheFlow Inc.'s 700-series caching server, implement global load balancing with F5 Networks Inc.'s 3-DNS Controller and put the final touches on a previous plan to add a second data center, said Corrigan.
All these efforts culminated in January in a final set of simulation tests, many running through the night just weeks before the Olympics, to validate changes and the additional capacity. For example, one night of five tests helped refine the sites' ability to handle large loads. They included tests on ramping up to 100,000 users, running 100,000 sustained users, hitting the online voting application with 2.5 million votes, experimenting with turning caching capabilities on and off, and varying load balancing, Corrigan said.
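The ramp-up and sustained-load tests described above follow a common load-testing shape: raise the number of simulated users in steps, then hold at the peak. A minimal sketch of such a schedule follows, assuming a linear ramp; the step counts and the function name are hypothetical, and this is not the configuration KeyLabs used.

```python
def ramp_schedule(peak_users: int, ramp_steps: int, hold_steps: int) -> list[int]:
    """Return the target number of concurrent users for each test interval.

    The load rises linearly to peak_users over ramp_steps intervals,
    then stays at peak_users for hold_steps intervals.
    """
    if ramp_steps < 1:
        raise ValueError("ramp_steps must be at least 1")
    ramp = [peak_users * step // ramp_steps for step in range(1, ramp_steps + 1)]
    return ramp + [peak_users] * hold_steps


# Ramp to the 100,000 users mentioned in the article, then sustain that load.
schedule = ramp_schedule(peak_users=100_000, ramp_steps=10, hold_steps=5)
print(schedule)
```

Each value would be handed to the load generator as the number of virtual users to keep active during that interval, so response times can be compared as the load climbs and again once it plateaus.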
The Web site testing regime has paid off for MSNBC.com. Though the company declined to discuss the cost of developing or testing the sites, it was able to stay under its operating budget. Now the company is considering ways to incorporate more outside testing into its Web operations on a regular basis.
That's good news for Corrigan, who, along with other IT managers at MSNBC.com, can expect big stories, and overwhelming traffic surges, to continue long after closing ceremonies in Salt Lake City.