Internet Stress Testing

The engineers have written the last line of application code. The database is ready to go. The designers have put the finishing touches on the front end. The marketing people are out there hyping your new Web site like there’s no tomorrow. All that’s left to do is flip the switch and watch the traffic come flooding in. Except … do you know just how much of that oncoming traffic you’re ready to handle?

It’s extremely tempting to believe that there is no such thing as too much Web traffic. After all, traffic is the Web site’s raison d’être; it is profoundly counterintuitive to worry about being too popular or to face too much demand for your product. Unlike other forms of popular mass media, however, Web sites have limited traffic capacity and break when forced to exceed that capacity. An audience of thousands witnessing first-hand the painfully slow response times and server crashes that accompany a traffic overload, moreover, can quickly turn a popular Web site into a laughingstock. Web surfers are a fickle lot and will quickly take their business elsewhere.

To be sure, detecting and identifying potential traffic-load problems cannot be done on the drawing board. “People assume that because they’ve used best-of-breed products, because they’ve done their application development with scalability in mind, that load isn’t an issue,” says Simon Berman, director of product marketing at Web performance vendor Mercury Interactive. “They deploy without testing and suddenly realize that their Web site can only handle 15 percent of their expected capacity. The fact is, every Web site is a unique combination of hardware and software, which is going to require tuning and tweaking before it works the way it’s supposed to. Like it or not, that requires significant testing.”

What’s the Plan?

According to Geoffrey Bessin, senior technical marketing engineer at Rational Software, “Lack of planning is the single biggest and most common mistake in the performance testing process. Testing is often seen as nonessential and is usually one of the first things to be squeezed out of the development cycle. Even if you somehow manage to throw together a useful test as an afterthought, what are you going to do with the results?”

The key rule in constructing a plan is “test early, test often”; there is no need to wait for a fully functional Web site to begin trial runs. Products like Empirix’s Bean-Test and Rational Software’s QualityArchitect permit testing of individual Java beans and COM components, while almost any load simulator can be tailored for limited site prototypes.

When it comes to fixing performance problems, timing can be everything. An issue with a particular application component, for example, may require some rewriting if discovered during the coding process or may call for substantial backtracking and re-architecture of an entire site if discovered just prior to launch. What’s more, distributing the testing process throughout the development cycle allows individual tests to focus on specific concerns rather than rely on a single trial run to catch every possible problem.

The results of performance testing are often difficult to interpret and rarely confirm expectations; testing is, after all, supposed to expose the unpredictable. As such, a good plan must include sufficient time and resources for both analysis of test data and addressing the issues uncovered. “We regularly get calls Wednesday night to test sites scheduled to launch Friday morning,” says J.D. Brisk, VP of Exodus Performance Labs. “They are obviously looking for a confirmation of what they want to hear. Most of the time, though, they end up maxing out a small fraction of their expected capacity, and at this point there’s precious little they can do about it.”

A plan also should take into consideration the strengths and weaknesses of in-house testing products and outsourced testing services. Commercial off-the-shelf software packages provide a great deal of flexibility for timing, repeating, and tightly tailoring tests for a specific environment and timetable. Those considerations are particularly significant early in the development process. Services, on the other hand, provide relatively inexpensive access to expertise and infrastructure necessary for realistic traffic simulations and may be particularly helpful late in the development and deployment cycle.

Finally, a plan should include provisions for performance testing and monitoring after deployment. Most Web sites undergo continual development and updating, which can have unexpected performance implications. “Load testing never ends. It just shifts from simulated users to real ones,” according to Steve Caplow of Empirix.

The core functionality of a Web performance testing product or service is the ability to realistically simulate a large number of actual users’ interactions with a Web site. Without such realism, a simulation may demonstrate that a site can be overloaded without stipulating the conditions that cause that to happen or what can be done to prevent it. Realism, however, can prove surprisingly difficult to achieve.

The degree of realism necessary depends on the type of testing under way. Performance tests generally fall into one of two categories. “Stress tests” are designed to find a Web site’s breaking points: the absolute maximum number of users and/or transactions that any given element of site functionality can handle without collapsing. While such tests focus more on power than realism (it is extremely unlikely, for example, that more than a handful of users will ever submit the same form or download the same file at precisely the same moment), they require an accurate model of the specific interaction being examined.

“Load tests,” on the other hand, attempt to predict site performance under expected real-life traffic conditions. These simulations require the reproduction of every factor that impacts upon site traffic, and can be influenced by a wide variety of minutiae.
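As a rough illustration of the stress-test idea described above, the sketch below ramps simulated load in fixed steps until an error-rate limit is exceeded. The function name `find_breaking_point`, the 1 percent limit and the `fake_form_submit` model are all assumptions made for the example; a real test would replace the model with measurements taken by a load tool.

```python
from typing import Callable


def find_breaking_point(
    error_rate_at: Callable[[int], float],
    max_users: int,
    step: int = 50,
    max_error_rate: float = 0.01,
) -> int:
    """Return the highest tested user count whose error rate stays acceptable.

    Load is ramped up in fixed steps; the ramp stops at the first step
    that exceeds max_error_rate. Returns 0 if even the first step fails.
    """
    last_good = 0
    for users in range(step, max_users + 1, step):
        if error_rate_at(users) > max_error_rate:
            break
        last_good = users
    return last_good


# Hypothetical stand-in for a real measurement: no errors up to 400
# concurrent users, then errors grow linearly with the overload.
def fake_form_submit(users: int) -> float:
    return max(0.0, (users - 400) / 1000)


print(find_breaking_point(fake_form_submit, max_users=1000))
```

A smaller step gives a finer estimate of the breaking point at the cost of more test runs.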

The trickiest part of constructing a realistic simulation is modeling the end user. Within the performance testing suite, the user is represented by a script, which defines a sequence of interactions between the user and the site in question. Most software packages offer a relatively simple “recording” option that allows the tester to create scripts by navigating the site as if he or she were the typical user and saving the session for later playback. Even the simplest of recordings requires significant modification before it can be used as the basis for a load test, in order to create a degree of variation in the simulated user base and respond to differences in dynamically generated content. Different software suites vary widely in the power, flexibility and learning curve of their scripting tools.
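To make the point about modifying recordings concrete, here is a minimal sketch of turning a recorded session into a parameterized script. The request strings, field names and the `play` helper are invented for illustration and do not reflect any particular product's script format.

```python
from string import Template

# A hypothetical recorded session: every value is frozen exactly as typed.
RECORDED = [
    "POST /login user=alice&password=secret1",
    "GET /search?q=modem",
    "POST /cart/add item=1042&session=8f3a",
]

# The same session with user-specific and server-generated values
# replaced by placeholders.
SCRIPT = [
    Template("POST /login user=$user&password=$password"),
    Template("GET /search?q=$query"),
    Template("POST /cart/add item=$item&session=$session"),
]


def play(script, user_data, session_id):
    """Render one virtual user's requests from the parameterized script."""
    values = dict(user_data, session=session_id)
    return [step.substitute(values) for step in script]


# Each simulated user supplies its own data; the session ID would come
# from the server's response at run time.
bob = {"user": "bob", "password": "hunter2", "query": "router", "item": "2077"}
for request in play(SCRIPT, bob, session_id="c41d"):
    print(request)
```

Replaying the raw recording for every virtual user would log in the same account and reuse a stale session ID, which is why the placeholders matter.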

No matter how many variables it contains, a single script is usually incapable of simulating the entirety of a Web site’s expected user base. Different types of users interact with a site in drastically different ways, depending on their technical proficiency, familiarity with the site and reason for visiting. “A 15-year-old casual browser is going to look very different from a 35-year-old businessman or a 70-year-old retiree,” says Rational Software’s Bessin. “They’ll have different think times, navigation paths, browser and platform mixes, and connection speeds. Identifying the different classes of users to expect, and predicting their behavior, is a critical task.”
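One way to picture such a user mix is as a weighted set of profiles from which each virtual user is drawn. The sketch below assumes three made-up classes; the shares, think-time ranges and connection speeds are placeholders, not measured figures.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class UserClass:
    name: str
    share: float          # fraction of total traffic (assumed)
    think_time_s: tuple   # (min, max) pause between pages, in seconds
    bandwidth_kbps: int   # assumed connection speed


CLASSES = [
    UserClass("casual browser", 0.5, (2.0, 10.0), 56),
    UserClass("business user", 0.3, (5.0, 30.0), 1500),
    UserClass("retiree", 0.2, (15.0, 60.0), 56),
]


def pick_user_class(rng):
    """Draw one user class at random, weighted by its traffic share."""
    return rng.choices(CLASSES, weights=[c.share for c in CLASSES], k=1)[0]


def think_time(user_class, rng):
    """Draw a pause between page views for the given class."""
    low, high = user_class.think_time_s
    return rng.uniform(low, high)


rng = random.Random(42)  # seeded so a test run can be repeated
user = pick_user_class(rng)
print(user.name, round(think_time(user, rng), 1))
```

Seeding the random generator keeps a run repeatable, which helps when comparing results before and after a tuning change.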

A realistic test also must take into account expected traffic patterns, both in terms of total load and the mix of different user types. Some sites, for example, might expect to see a series of plateaus, as users in different time zones arrive at work or come home from school. Some sites should anticipate traffic to come in steady streams, others in sudden peaks. Perhaps one class of users likes to log on first thing in the morning, and another group checks in as the last thing in the afternoon.
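The plateau pattern described above can be sketched as a simple load schedule. The hours, user counts and the `target_users` helper are assumptions chosen only to show the shape; a real schedule would come from traffic forecasts or server logs.

```python
def target_users(hour, plateaus, baseline=0):
    """Concurrent users to simulate at a given hour (0-23).

    Each plateau is (start_hour, end_hour, users); overlapping plateaus
    add up, producing a staircase as each group of users comes online.
    """
    total = baseline
    for start, end, users in plateaus:
        if start <= hour < end:
            total += users
    return total


# Hypothetical groups of users in three time zones, expressed in server time.
PLATEAUS = [(9, 17, 300), (12, 20, 300), (15, 22, 200)]

schedule = [target_users(h, PLATEAUS, baseline=50) for h in range(24)]
print(schedule)
```

A site expecting sudden peaks rather than plateaus would use short, tall entries in the same table.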

Even if you’ve got your user base modeled down to the tiniest detail, there are a number of technical issues that can interfere with your test’s realism. For example, most testing products market their ability to make the most of available hardware by simulating large numbers of simultaneous user connections on a single box. A single machine simulating 100 users, however, is unlikely to place the same load on a site as two machines each simulating 50. Most machines will continue to add new connections long after they’ve exhausted their processing power; beyond a certain point, each new simulated user slows down script execution for every other simulated user, resulting in little or no additional load on the Web site.
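A deliberately simplified model shows why this happens. It assumes each load-generating machine has a fixed ceiling on the requests it can issue per second, which is a simplification; the numbers and the `delivered_rate` function are invented for the example.

```python
def delivered_rate(users, per_user_rate, generator_capacity):
    """Requests per second one load-generating machine actually sends.

    Toy model: the machine delivers what its simulated users ask for
    until it hits its own ceiling; past that point extra users add nothing.
    """
    return min(users * per_user_rate, generator_capacity)


PER_USER_RATE = 2.0   # assumed requests per second per simulated user
CAPACITY = 120.0      # assumed ceiling of one client machine

one_box = delivered_rate(100, PER_USER_RATE, CAPACITY)
two_boxes = 2 * delivered_rate(50, PER_USER_RATE, CAPACITY)
print(one_box, two_boxes)  # 120.0 200.0
```

In practice it is worth monitoring the client machines' own processor utilization during a test so that a saturated load generator is not mistaken for a saturated Web site.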

Location of the simulated client boxes also can have a significant impact on test results. According to Ralph Decker, chief architect at Exodus Performance Labs, most performance bottlenecks sit on the perimeter of the LAN, arising from border routers, firewalls, load balancers or simple limitations on outbound bandwidth. In-house tests, which are often conducted entirely within the LAN, may miss such issues altogether. Several testing services offer geographically distributed client sites in order to reflect the potential impact of routing differences; Exodus and startup Porivo Technologies have even gone so far as to begin development of massively distributed testing systems based on the SETI@home model.

“These apps live on the Web, so they should be tested on the Web. The best way to achieve realism is to replicate real-life conditions in every way humanly possible,” says Ronnie Ray, VP of marketing at testing service provider Atesto Technologies.

Make the Most of What You’ve Got

A realistic test will generate accurate performance data but won’t tell you which data to collect or what to do with it. It doesn’t do much good to learn that your Web site crashes at 50 percent of expected capacity, or that performance drops off steeply after 25 percent; you need to know why, and even more importantly, what to do about it. Different testing products and services vary widely when it comes to data collection, presentation and analysis, so make sure that you’re getting your money’s worth.

Every testing suite and service collects and analyzes data from the client machines. That largely amounts to tracking various response and download times under different conditions, though there is substantial variation in the available granularity among different solutions. There is an enormous difference between recording a page’s download time and separately tracking the download times of each interface element, graphic and dynamically generated content object. Given sufficient granularity, this data can act as a powerful diagnostic tool; localizing performance problems to specific transactions or sets of transactions, for example, can prove invaluable in pinpointing a cause. By itself, however, this data will diagnose only a very limited set of ills.
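The value of element-level timing can be sketched in a few lines. The sample records and the `slowest_elements` helper below are hypothetical; they assume the testing tool can export one timing per page element per request.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical client-side samples: (page, element, download seconds).
SAMPLES = [
    ("/home", "html", 0.125),
    ("/home", "logo.gif", 0.25),
    ("/home", "logo.gif", 0.75),
    ("/search", "results", 2.0),
    ("/search", "results", 4.0),
]


def slowest_elements(samples, top=3):
    """Average the timings per (page, element) and return the slowest."""
    by_element = defaultdict(list)
    for page, element, seconds in samples:
        by_element[(page, element)].append(seconds)
    averages = {key: mean(times) for key, times in by_element.items()}
    return sorted(averages.items(), key=lambda kv: kv[1], reverse=True)[:top]


for (page, element), avg in slowest_elements(SAMPLES):
    print(f"{page} {element}: {avg:.2f}s")
```

With only page-level totals, the slow search results and the fast static graphics in this example would be indistinguishable.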

Various Web site components generate a large amount of useful log data in the course of a test; stored in those logs is everything you could ever possibly want to know about your site’s performance. The trick lies in sifting through and organizing that enormous mass of data. That can prove as simple as correlating processor utilization with Web traffic; a single overloaded Web server in a bank of underloaded Web servers, for example, is a clear indication of a load-balancing issue. Problems arising from server tuning, application coding or database architecture can prove much more difficult to diagnose and may require megabytes of log files. Keep an eye out for automated server data collection and correlation functions in different testing solutions; these can make your job much easier.
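The load-balancing check mentioned above is easy to sketch once per-server processor figures are in hand. The server names, the readings and the 1.5x threshold in `overloaded_servers` are assumptions for illustration only.

```python
from statistics import mean


def overloaded_servers(cpu_by_server, ratio=1.5):
    """Return servers whose CPU use exceeds `ratio` times the bank's average.

    A lone outlier in an otherwise idle bank suggests the load balancer,
    not overall capacity, is the problem.
    """
    average = mean(cpu_by_server.values())
    return sorted(
        name for name, cpu in cpu_by_server.items() if cpu > ratio * average
    )


# Hypothetical average CPU utilization (percent) during a test run.
CPU = {"web1": 95, "web2": 30, "web3": 25, "web4": 30}
print(overloaded_servers(CPU))  # ['web1']
```

If every server in the bank is running hot, the function returns an empty list, which points toward a capacity or tuning problem instead.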

Making the Grade

An effective testing solution can make the difference between a rock-solid Web site and a technical quagmire with no good way out. Faced with insufficient, inaccurate or overly cryptic data, most developers attempt to solve performance and scalability problems by throwing hardware at them. Since this approach almost always fails to address the cause of the problem, it results in a technical and financial sinkhole. Well-managed testing regimes often can result in four- to eightfold increases in traffic capacity; several testing services go so far as to guarantee performance improvements.

As Empirix’s Caplow puts it, “Every site is load-tested sooner or later; the key lies in planning and controlling the process. If your first test comes after your site launches, you’re going to be the one feeling the stress.”

eWEEK EDITORS
