Pervasive public networks and the explosion of network-facing applications and Web services have dragged enterprise development out of the back room and into the showroom.
Customers and supply chain partners are coming to rely on network applications to complete time-critical transactions; government and public safety agencies are incorporating Web services into their missions. In this environment, lack of adequate software testing could become “the new negligence.” The charter of the testing team must grow apace.
Application testing is traveling down the same path that has lately been followed by IT security. A combination of heightened awareness and regulatory mandates has transformed security from a “why fix the roof when it’s not raining?” cost to a recognized requirement of due diligence.
Application testing efforts may likewise obtain improved access to human and technical resources, and development team leaders may encounter fewer arguments when they seek to acquire state-of-the-art tools as the costs of application failure grow.
Redefining reasonableness
Cem Kaner, professor of software engineering at Florida Institute of Technology and director of Florida Tech’s Center for Software Testing Education & Research, has challenged enterprise development managers to consider the consequences of an application failure that results in someone’s death.
It’s not difficult, Kaner asserts, to imagine a situation in which a single line of code turns out to be the proximate cause and in which that line turns out never to have been tested, despite the availability of tools to perform such tests. This could prove a classic setup for a claim of negligence against the developer and user of the application involved.
Kaner’s Web page, “Software Negligence and Testing Coverage” (www.kaner.com/coverage.htm), lists more than 100 types of coverage tests that a development team might need to perform, or perhaps wind up explaining why it did not. Some conceivable tests that might be ordered are obvious (and costly), but nonetheless inadequate: for example, “test every line of code.” Others are less obvious but possibly crucial, such as “vary the location of every file used by the program” or “verification against every regulation [Internal Revenue Service, Securities and Exchange Commission, Food and Drug Administration, and so on] that applies to the data or procedures of the program.”
Auditing user manuals and help files, confirming copyright permissions for images and sounds, and reviewing multicultural acceptability of all text and graphics in an application are other items on Kaner’s list that may not immediately occur to a software testing team. However, any of them could affect end-user acceptance of an application or the marketplace response to its deployment.
And these are merely the kinds of tests that ensure the application was constructed as intended and breaks no rules in the process. An application could survive rigorous review on these criteria, yet still be unsatisfactory.
An application could, for example, correctly implement the wrong algorithm, calculating interest using beginning-of-period formulas when end-of-period formulas are needed, or computing year-to-date values based on a calendar year instead of an intended fiscal year. It could differ from the behavior of an earlier version, not in a way that makes the new version wrong but in a way that breaks an existing application-integration or data-sharing scheme. It could fail under abnormal loads or fail to deal gracefully with intermittent network connections. These are some of the domain-specific or dynamic aspects of application testing that today’s development teams must address.
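The interest example can be made concrete with a short Python sketch. The function names and figures here are illustrative assumptions, not taken from any real system. Both timing conventions are implemented correctly, so only a test that knows which convention the business intended, and checks against an independently derived value, can catch the wrong choice.

```python
def future_value(payment, rate, periods, at_period_start=False):
    """Future value of a series of equal payments.

    at_period_start=False: payments fall at the end of each period.
    at_period_start=True:  payments fall at the beginning of each period.
    """
    if rate == 0:
        return payment * periods
    fv = payment * ((1 + rate) ** periods - 1) / rate
    # A beginning-of-period payment earns one extra period of interest.
    return fv * (1 + rate) if at_period_start else fv


def simulate_end_of_period(payment, rate, periods):
    """Independent check: grow the balance, then deposit at period end."""
    balance = 0.0
    for _ in range(periods):
        balance = balance * (1 + rate) + payment
    return balance


end_of_period = future_value(100, 0.05, 3)              # about 315.25
start_of_period = future_value(100, 0.05, 3, True)      # about 331.01
```

Each result is arithmetically right; the roughly 5 percent gap between them is the kind of defect that code-level coverage alone will not reveal.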
Finally, application designers who deploy on public networks must anticipate the nonrandom, carefully targeted and frighteningly well-informed disruptions of a deliberate attack. We explored issues of design and development for application-level security in the Dec. 13 Developer Solutions, but security testing involves additional challenges.
Only reasonable testing is necessary
If there is a silver lining to this cloud, it is in Kaner’s counterchallenge to the negligence-lawsuit scenario given above. It’s a formally provable statement that exhaustive testing is not merely impractical but also a theoretical impossibility. And negligence, Kaner notes, is not the failure to do the impossible or even the failure to do everything that is possible, but rather the failure to do what’s reasonable.
Developers should therefore understand that cost-benefit calculations can make a good case against a negligence claim, but only if the costs of testing and the benefits of risk reduction can be shown to have been evaluated with at least some degree of rigor and good faith.
A criterion of reasonableness is, moreover, a mixed blessing. It means that a development team cannot approach testing as a mechanical or a mathematical exercise, something with a formulaic criterion for how much is enough. A team must instead develop a process and a management approach to test the right things in a consistent and conscientious way.
Doing it automatically
In addition to the types of tests already mentioned, the vocabulary of testing includes long lists of familiar and tedious tasks with (sometimes literally) colorful names.
“White box” (or “glass box”) testing includes path testing, a form of coverage testing that attempts to traverse every possible path through an application. This becomes increasingly difficult as applications evolve into constellations of services developed and maintained by independent teams. “Black box” testing ignores internals and exercises only published interfaces to an application component, but this depends on a degree of completeness in software specification that’s rarely encountered in any but the most critical domains.
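As a rough illustration of path testing, the Python sketch below enumerates every path through a small function. The `shipping_cost` rule and its prices are hypothetical, invented only to show the idea.

```python
from itertools import product


def shipping_cost(express, international):
    """Hypothetical pricing rule with two independent decisions."""
    cost = 5.0
    if express:
        cost += 10.0
    if international:
        cost *= 2
    return cost


# Expected result for every path: (express, international) -> cost.
EXPECTED = {
    (False, False): 5.0,
    (True, False): 15.0,
    (False, True): 10.0,
    (True, True): 30.0,
}


def untested_paths():
    """Branch combinations that have no expected value on record."""
    return [p for p in product([False, True], repeat=2) if p not in EXPECTED]


def failing_paths():
    """Paths whose actual result differs from the expected one."""
    return [p for p, cost in EXPECTED.items() if shipping_cost(*p) != cost]
```

Two decisions yield four paths, and every additional independent decision doubles the count, which is one reason full path coverage becomes harder to reach as an application grows.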
“Basis path” testing uses knowledge of internals to generate test cases in a formal way; “monkey” testing (or “ad hoc” testing, for the more polite) merely exercises the functions of an application in a random manner.
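A minimal Python sketch of the “monkey” approach follows, with the `clamp` function standing in as a hypothetical piece of application code. Seeding the random generator is an added assumption, included so that a failing run can be replayed exactly.

```python
import random


def clamp(value, low, high):
    """Code under test: restrict value to the range [low, high]."""
    return max(low, min(value, high))


def monkey_test(seed, runs=1000):
    """Throw random inputs at clamp() and collect any broken invariants."""
    rng = random.Random(seed)  # a fixed seed makes the run reproducible
    failures = []
    for _ in range(runs):
        low = rng.randint(-100, 100)
        high = rng.randint(low, 200)
        value = rng.randint(-500, 500)
        result = clamp(value, low, high)
        if not (low <= result <= high):
            failures.append((value, low, high, result))
    return failures
```

The test knows nothing about the internals of `clamp`; it checks only a property that must hold for any input, so it is cheap to write but offers no guarantee that the interesting cases were ever hit.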
All these methods represent different combinations of efficiency and reproducibility. Formally generated and reproducible tests might seem to be the gold standard, but they can be fool’s gold if they’re so time-consuming to generate and run that they aren’t used early and often during development.
By the time that attempts at exhaustive testing have anything informative to say, it may be too late for their results to be useful. A team is likely to be better served by earlier and less formal tests that are guided by expert experience as to where an application’s problems are most likely to be found. This is a strong argument against the common practice of staffing a testing group with relatively inexperienced developers or with the less skilled members of a development team. The most effective tests are likely to come from developers with the most insight into what kinds of errors are most likely and least acceptable.
Regardless of testing and staffing doctrine, however, it does seem logical that the testing of computer applications should itself be streamlined by making it a programmable and thus repeatable task. “Test automation” is thus often taken to mean the development of scripts and other mechanisms for testing one piece of software with another.
Automated testing has problems of its own
It’s important to realize that this is not the only approach or even necessarily the best approach. Automated tests, after all, are themselves pieces of software that can exhibit their own flaws of poor usability. A well-maintained archive of the tests performed, results obtained and resulting improvements made can be easier to understand than a cryptic body of test scripts that may have been made obsolete by relatively tiny changes in a body of code.
Moreover, when automated tests are run by someone who didn’t design them, they may be executed in ways that fail to catch errors. For example, a test might not be applied at boundary conditions, or changes in an application might alter those boundary values. Detecting crucial boundary conditions, and generating tests that focus on these likely failure points, is one of the notable strengths of a state-of-the-art testing tool such as Agitator from Agitar Software Inc.
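A hand-written boundary test might look like the Python sketch below. The pricing rule, the threshold of 100 and the prices are hypothetical, and nothing here is meant to represent how any commercial tool generates its tests. The point is that the test cases are derived from the same constant the application uses, so they move with the boundary if it changes.

```python
BULK_THRESHOLD = 100  # hypothetical business rule: bulk price from 100 units


def unit_price(quantity):
    """Code under test: price per unit, with a bulk discount."""
    if quantity <= 0:
        raise ValueError("quantity must be positive")
    return 9.0 if quantity >= BULK_THRESHOLD else 10.0


def boundary_cases(threshold):
    """Inputs just below, at and just above the boundary, with expected prices."""
    return [
        (threshold - 1, 10.0),
        (threshold, 9.0),
        (threshold + 1, 9.0),
    ]


def boundary_failures():
    """Return the boundary cases where the actual price is wrong."""
    return [
        (qty, expected)
        for qty, expected in boundary_cases(BULK_THRESHOLD)
        if unit_price(qty) != expected
    ]
```

A test that checked only quantities of 50 and 500 would pass just as happily if the comparison were mistakenly written as `>` instead of `>=`.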
Alternatively, an automated test might generate huge numbers of false-positive alerts. For example, a simple screen-replay tool for a GUI may trip over cosmetic changes in interface layout. It’s a virtual certainty that a test that generates false positives will somehow be bypassed or suppressed, perhaps leading future test teams to assume that something is being tested when it’s effectively been shoved below the radar.
Automated testing may also produce accurate but misleading statistics. For example, a test might report that a certain fraction of the lines in a program were exercised a certain number of times, while failing to measure, and therefore being unable to report, whether those multiple tests actually verified behavior in more than one situation. It’s up to a development team to ensure that tools are being used in a way that reflects this distinction.
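The gap between exercising code and verifying it can be shown in a few lines of Python. All names below are hypothetical. Both test helpers run every line of the function they are given, so a line-coverage report would score them identically, yet only one of them notices the defect.

```python
def classify_correct(celsius):
    if celsius < 0:
        return "freezing"
    if celsius < 25:
        return "mild"
    return "hot"


def classify_buggy(celsius):
    """Same structure, but the last two labels are swapped."""
    if celsius < 0:
        return "freezing"
    if celsius < 25:
        return "hot"
    return "mild"


def exercises_only(classify):
    """Runs every branch but checks no result."""
    for celsius in (-5, 10, 30):
        classify(celsius)
    return True


def verifies(classify):
    """Runs the same inputs and compares each result with an expected label."""
    expected = {-5: "freezing", 10: "mild", 30: "hot"}
    return all(classify(c) == label for c, label in expected.items())
```

Here `exercises_only(classify_buggy)` succeeds while `verifies(classify_buggy)` fails, even though both touch exactly the same lines.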
It’s also essential in the Web-deployed environment to test an application’s handling of errors that may be triggered only by its dependencies on remote resources. This is one of the strengths of Compuware Corp.’s DevPartner Fault Simulator, now in beta testing and planned for release early this year.
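The general idea of simulating a remote failure can be sketched with Python’s standard `unittest.mock` module. This is not a description of how any commercial fault simulator works; the `get_exchange_rate` function, its `fetch` parameter and the fallback behavior are assumptions made for illustration.

```python
from unittest.mock import Mock


def get_exchange_rate(fetch, fallback=None):
    """Return a rate from a remote service, or a cached fallback on failure.

    fetch is any callable that contacts the remote service.
    """
    try:
        return fetch()
    except (ConnectionError, TimeoutError) as exc:
        if fallback is None:
            raise RuntimeError("rate service unavailable, no cached rate") from exc
        return fallback


# Stand-ins for the remote call: one healthy, one that always times out.
healthy_service = Mock(return_value=1.25)
dead_service = Mock(side_effect=TimeoutError("simulated timeout"))

rate_when_up = get_exchange_rate(healthy_service)                  # 1.25
rate_when_down = get_exchange_rate(dead_service, fallback=1.10)    # 1.10
```

Because the failure is injected rather than awaited, the error-handling path can be tested on every build instead of only when the network happens to misbehave.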
Test automation can also pave the way toward confirmation that an application does what it’s supposed to do, while leaving a massive blind spot obscuring things that it should not do.
For example, it’s easy for an automated test to ensure that changes to a person’s insurance record are correctly applied. It’s possible that an automated test would also ensure that those changes are reflected, where appropriate, in the record of a person’s spouse.
What few such tests will check, however, is whether changes have been applied in places where they should not be. For example, a person’s having a new child implies that the person’s spouse also has a new child, but it would be an error to infer that the children already in that family are also new parents.
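One way to test for what should not happen is to snapshot the data before an update and then confirm that nothing outside the intended records changed. The Python sketch below assumes a hypothetical in-memory family structure; the names and record layout are invented for illustration.

```python
import copy


def sample_family():
    """Hypothetical records: two spouses and one existing child."""
    return {
        "Pat": {"spouse": "Sam", "children": ["Alex"]},
        "Sam": {"spouse": "Pat", "children": ["Alex"]},
        "Alex": {"spouse": None, "children": []},
    }


def add_child(family, parent, child):
    """Record a new child for the parent and the parent's spouse."""
    for name in (parent, family[parent]["spouse"]):
        if name is not None:
            family[name]["children"].append(child)
    family[child] = {"spouse": None, "children": []}


def unexpected_changes(before, after, allowed):
    """Names whose records changed although they were not allowed to."""
    return [
        name for name in before
        if name not in allowed and before[name] != after[name]
    ]


family = sample_family()
snapshot = copy.deepcopy(family)
add_child(family, "Pat", "Robin")
leaks = unexpected_changes(snapshot, family, allowed={"Pat", "Sam"})
```

An empty `leaks` list confirms that the existing child’s record was left alone; a positive-only test that inspected just the two parents would never look there.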
Such bugs are easily overlooked, warned software testing consultant Brian Marick, of Testing Foundations, in his 1997 paper “Classic Testing Mistakes.”
And those mistakes, already classics many years ago, remain all too likely to occur today.
Failure to think about what should not happen is also the essential flaw that opens the door to so many security problems in applications. Developers are good at envisioning and testing for all the ways that an application should behave correctly and for the many complex logic paths and other interactions that it should be able to follow. They tend to be less adept at envisioning things that should not happen—or that should be prevented if someone tries to make them happen.
Logging of an application’s actions can be an effective means of surfacing behaviors that an alert developer will recognize as out of line, but the code that does that logging can itself be time-consuming to write. A tool such as Identify Software Ltd.’s AppSight 5.5, released in November, can perform that kind of recording in an intelligent manner that captures more detail when unusual situations indicate the need.
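A hand-rolled approximation of detail-on-demand logging is sketched below in Python, using the standard `logging` module. It makes no claim about how any commercial product records application behavior; the `RecentHistoryHandler` and `ListHandler` classes are hypothetical helpers written for this example.

```python
import logging
from collections import deque


class ListHandler(logging.Handler):
    """Collects formatted messages in a list (stands in for a file or console)."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


class RecentHistoryHandler(logging.Handler):
    """Keeps only the most recent records in memory and forwards them to
    the target handler when a record at or above trigger_level arrives."""

    def __init__(self, target, capacity=50, trigger_level=logging.ERROR):
        super().__init__(level=logging.DEBUG)
        self.target = target
        self.recent = deque(maxlen=capacity)  # older records fall off the front
        self.trigger_level = trigger_level

    def emit(self, record):
        self.recent.append(record)
        if record.levelno >= self.trigger_level:
            for buffered in self.recent:
                self.target.handle(buffered)
            self.recent.clear()
```

Routine debug messages cost little and are silently discarded as the buffer rolls over; only when an error occurs does the recent history leading up to it reach the log.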
It’s not enough to agree that testing is important. Unless the right things are tested in an effective way, software testing is like medieval medicine: Debugging the code, while ignoring fundamental flaws of design, is akin to bleeding a patient while failing to recognize (let alone cure) an infection.
Testing tools can’t automate experienced vision or a domain-specific sense of what’s important, but they can free developers from the most routine and laborious aspects of application testing to give them time to put their expertise to work.
Technology Editor Peter Coffee can be reached at peter_coffee@ziffdavis.com.
Application Testing Creates a Lengthening List of Demands
From choice of personnel to the design of test scenarios, teams must take arms against the rising costs of application failure
* Don’t relegate testing to inexperienced team members
• Many crucial errors require domain knowledge to recognize
• Exhaustive testing is impossible; experience improves focus
* Don’t stop testing when the application works
• It’s not enough to do everything right; apps also must do nothing wrong
• Security problems and database corruption result when actions aren’t limited
* Don’t stop at the application’s edge
• Web-based applications need end-to-end stress tests
• Performance, compatibility and tolerance of network errors are also key criteria