How the mighty have fallen. Starting in the 1980s, Japanese companies became legendary for quality, and none more so than Toyota. But earlier this year, Toyota led the news because of quality problems. The situation was so severe that Toyota CEO Akio Toyoda personally appeared at a Congressional hearing, during which he said, “We know that the problem is not software because we tested it.”
But is this a realistic way to think about software quality assurance? In fact, there are increasing indications (including reliable information from confidential sources in Japan) that some of the problems were software-related.
So let’s look at the quality and testing lessons IT professionals can draw from Toyota’s debacle. Who knows, this article might help you avoid a sweaty session in front of angry members of Congress. Let’s start with that quote from Toyoda because it’s so categorical, and so wrong:
“We know that the problem is not software because we tested it.”
Size can deceive. Consider bridges. The Sydney Harbour Bridge, the Golden Gate Bridge and the Tsing Ma Bridge are enormous structures. However, they are built of well-understood engineering materials such as concrete, steel, stone and asphalt, all of which have well-defined physical and chemical properties.
Being physical objects, these bridges obey the laws of physics and chemistry, as do the materials that interact with them (air, water, rubber, pollution, salt and so forth). Further, we’ve been building bridges for thousands of years. We know how bridges behave and how they fail. Ironically enough, given some of the lessons in this article, our ability to use computers to design and simulate bridges has increased their reliability even further.
Size notwithstanding, a bridge is a simpler thing to test than a Toyota Prius. The complex system of systems that controls the Prius has too many states, lines of code, data flows, use cases, sequences of events and transient failures requiring recovery to test them all.
Consider this example: Engineers at Sun Microsystems told an associate of mine that the number of possible internal states in a single Solaris server is 10,000 times greater than the number of molecules in the universe. How long do you have to test?
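To get a feel for the arithmetic, here is a back-of-the-envelope sketch in Python. The figures are illustrative assumptions, not measurements from any real vehicle or server: a system whose entire state is 300 independent on/off flags, checked by a hypothetical test rig at one billion states per second.

```python
# Illustrative assumptions only: 300 independent boolean flags and a
# hypothetical test rig that checks one billion states every second.
FLAGS = 300
STATES_PER_SECOND = 10**9
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def years_to_test_exhaustively(flags: int, rate: int = STATES_PER_SECOND) -> int:
    """Whole years needed to visit every combination of `flags` on/off values."""
    total_states = 2 ** flags
    return total_states // (rate * SECONDS_PER_YEAR)


years = years_to_test_exhaustively(FLAGS)
# len(str(years)) - 1 is the power of ten of the leading digit.
print(f"{FLAGS} flags -> on the order of 10^{len(str(years)) - 1} years")
```

Even this toy system, far smaller than any real controller, would take on the order of 10^73 years to test exhaustively. Adding one more flag doubles the figure.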
Software Testing is Important to Quality
I have been involved in testing and quality for almost my whole 25+ year career. I know how important testing is to quality. It is a matter of poignant pride for me that, in the two lost shuttle missions, software failure was not the cause, thanks to the legions of software testers who worked on the mission control center systems.
However, there’s less software involved in a shuttle mission than in driving your Prius to the grocery store. Seriously. Software for late 1970s and early 1980s hardware is orders of magnitude simpler and smaller than software for 2010-era computers. You could not run an iPhone or a Prius on the computers that run the shuttle.
And, even when computers were smaller and simpler, you could not exhaustively test the systems. Glenford J. Myers recognized this fact in The Art of Software Testing, published in 1979 and often cited as the first book on software testing. Whether you are testing cars or data centers, software testing is a necessary but insufficient means to quality.
Failures Are Not Proportional to Defect Size
This brings us to the next lesson from Toyota, though it is by no means specific to one company or culture. We have clients around the world, and it is common across borders, companies and cultures for people to forget that complex systems can exhibit unpredictable and, in some cases, catastrophic failures. It is also common for people to forget that failures are not proportional to the size of the defect.
To see examples, one can consult the Internet for the answers to four questions: Why did a Scud missile evade the Patriot missiles and hit a troop barracks in the first Gulf War? Why did the first Ariane 5 rocket explode? Why did not one, but two, NASA Mars missions fail? Why did the Therac-25 kill cancer patients? In each instance, the answer is discouragingly simple: an infinitesimally small percentage of the code proved defective.
Again, size deceives. If you knock a rivet out of a bridge, does the bridge fall? No. If you nick a wire in a single suspension cable, does the bridge fall? No. If you carve your name in a facing stone on a pillar, does the bridge fall? No. Yet some software fails because of similarly small defects involving just a few lines of code.
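The Patriot case makes the disproportion concrete. According to widely cited public analyses of the incident (not this article's own data), the system counted time in tenths of a second and converted the count to seconds using a chopped binary approximation of 0.1, and the battery had been running for about 100 hours. The sketch below reproduces that arithmetic. The 23 fractional bits, the 100-hour uptime and the Scud speed of roughly 1,676 meters per second all come from those analyses and should be treated as assumptions.

```python
from fractions import Fraction

# Assumptions taken from public analyses of the incident, not from this article:
FRACTION_BITS = 23          # bits kept after the binary point when storing 0.1
UPTIME_HOURS = 100          # hours the battery had been running
SCUD_SPEED_M_PER_S = 1676   # approximate Scud speed


def clock_drift_seconds(uptime_hours: int, fraction_bits: int = FRACTION_BITS) -> Fraction:
    """Accumulated timing error from using a chopped binary value of 0.1."""
    scale = 2 ** fraction_bits
    exact_tenth = Fraction(1, 10)
    stored_tenth = Fraction(scale // 10, scale)  # chopped, not rounded
    error_per_tick = exact_tenth - stored_tenth
    ticks = uptime_hours * 60 * 60 * 10  # one tick per tenth of a second
    return error_per_tick * ticks


drift = clock_drift_seconds(UPTIME_HOURS)
miss_distance_m = float(drift) * SCUD_SPEED_M_PER_S
print(f"clock drift: {float(drift):.3f} s, tracking error: {miss_distance_m:.0f} m")
```

Under these assumptions, an error of less than one ten-millionth of a second per tick grows to about a third of a second, which at Scud speeds is more than half a kilometer.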
Use Multiple Software Test Techniques
So, what can we do? Well, first, remember that software testing alone cannot save us from this problem. However, there are many different software testing techniques, and each can expose a different set of defects. Testers must use different test techniques, test data and test environments to find different bugs during different levels of testing. Each technique, each set of test data, each environment and each test level filters out a different set of bugs. There is no “one true way” to test software.
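As a small illustration of how two techniques filter out different bugs, consider the sketch below. The function `bulk_discount` and its specification are hypothetical, invented for this example. It contains a deliberate one-character defect that tests using typical values pass straight over, while boundary-value tests catch it.

```python
def bulk_discount(quantity: int) -> float:
    """Hypothetical function under test. Spec: 100 or more units earn 10% off."""
    return 0.10 if quantity > 100 else 0.0  # deliberate defect: should be >=


def failing_inputs(cases):
    """Return the inputs whose actual result differs from the expected one."""
    return [qty for qty, expected in cases if bulk_discount(qty) != expected]


# Technique 1: one typical value from each class (below and above the threshold).
typical_cases = [(10, 0.0), (500, 0.10)]

# Technique 2: values at and next to the threshold named in the spec.
boundary_cases = [(99, 0.0), (100, 0.10), (101, 0.10)]

print("typical-value failures:", failing_inputs(typical_cases))
print("boundary-value failures:", failing_inputs(boundary_cases))
```

The typical-value cases report no failures, while the boundary cases flag the input 100. Neither set is wrong; they simply look in different places.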
Now, I’m not saying that Toyota believes in a “one true way” to quality. Toyota learned quality management from Joseph Moses Juran and William Edwards Deming, heroes in the pantheon of quality. Juran and Deming knew much better than to believe in a single magic bullet for quality. However, as we saw from Toyoda’s comments, he did believe too much in testing. In addition, I suspect that Toyota, as a company, believed too little in integration testing, and perhaps too much in vendors.
Here’s the problem: When complex systems are built from many subsystems, and some of the subsystems are produced by vendors, risks can go up and accountability can go down. It’s not that vendors don’t care; it’s that they can’t always foresee how their subsystems will be used. It’s not that people won’t take responsibility (though that happens). It’s that, when multiple subsystems are at fault, no vendor wants to take all the blame.
So, understand, measure and manage the quality assurance process for such systems from end to end, including vendor subsystems. After all, the user drives just one car, not a dozen, and there is only one brand on the grille. This is true for your data center, too, isn’t it?
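Here is a minimal sketch of how subsystems that each pass their own tests can still fail together. Both "vendors", their functions and the speed limit are hypothetical, and nothing here describes Toyota's actual architecture. The defect lives in neither component; it lives in an unstated assumption about units at the interface.

```python
MPH_TO_KMH = 1.609344


def vendor_a_speed(raw_counts: int) -> float:
    """Hypothetical sensor from vendor A: reports speed in miles per hour."""
    return raw_counts / 10.0


def vendor_b_over_limit(speed: float, limit_kmh: float) -> bool:
    """Hypothetical controller from vendor B: expects speed in km/h."""
    return speed > limit_kmh


# Each vendor's own checks pass in isolation.
unit_checks_pass = (
    vendor_a_speed(700) == 70.0
    and vendor_b_over_limit(120.0, 100.0)
    and not vendor_b_over_limit(80.0, 100.0)
)

# End to end: 70 mph is about 113 km/h, so a 100 km/h limit is exceeded.
naive = vendor_b_over_limit(vendor_a_speed(700), 100.0)
converted = vendor_b_over_limit(vendor_a_speed(700) * MPH_TO_KMH, 100.0)

print("unit checks pass:", unit_checks_pass)
print("integrated, no conversion:", naive)
print("integrated, with conversion:", converted)
```

Only a test that drives both components together, with an expected result stated in the user's terms, exposes the mismatch.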
Recap of Lessons Learned
So, let’s recap the three key lessons about software quality that Toyota’s woes can teach us as IT professionals:
Lesson No. 1: Testing is a necessary but insufficient part of quality management.
You cannot exhaustively test complex, real-world systems, from cars to data centers, so testing should be part of a much larger process for quality assurance.
Lesson No. 2: Even little mistakes in software can have big consequences, so test software at all levels, from individual units to system integration.
Complex systems can exhibit unpredictable and, in some cases, catastrophic failures that are not proportional to the size of the defect. So, many different types of testing should be employed to try to expose any defects.
Lesson No. 3: When complex systems are built from many subsystems, with some of the subsystems being produced by vendors, risks can go up and accountability can go down.
So be sure everything, including systems you buy from vendors, works for customers. Understand, measure and manage the quality assurance process for all systems from end to end.
If you’re wondering how to apply these lessons, check out my previous Knowledge Center articles, “How to Build Quality Applications” and “How to Achieve Greater Application Interoperability in Your Data Center.”
Rex Black is President of RBCS. Rex is also the immediate past president of the International Software Testing Qualifications Board and the American Software Testing Qualifications Board. Rex has published six books, which have sold over 50,000 copies, including Japanese, Chinese, Indian, Hebrew and Russian editions. Rex has written over thirty articles, presented hundreds of papers, workshops and seminars, and given over fifty speeches at conferences and events around the world. Rex may be reached at [email protected].