Bad Input Bombs Your Program

Opinion: A simple "fuzzer" program shows that most Web browsers are easily crashed by malformed Web tags. Who'd have thought that Internet Explorer would be the most robust!

One of the famous arguments for open-source software, made in "The Cathedral and the Bazaar" by Eric S. Raymond, is, "Given enough eyeballs, all bugs are shallow."

The point is that open-source projects will have more people working on them and looking at the code, and therefore the chances that bugs will be recognized are greater. In the traditional centralized software company, where access to source code is restricted, great deliberate effort must be put into bug fixing, thus putting open source at an inherent advantage.

This world view is a truism to some people and conjecture to others. I'd say it's almost unprovable. But every now and then some evidence comes along that makes you wonder whether anyone was looking at the code in a supposedly well-scrutinized program.

The latest such evidence is mangleme by Michal Zalewski. It's a brilliant, simple tool that does nothing but generate pseudo-random, malformed HTML tags. By running it as a CGI process and including a META REFRESH tag in each page, you can have a browser automatically receive random erroneous input until the browser dies.
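
To make the idea concrete, here is a minimal sketch of that kind of fuzzer in Python. It is not mangleme's code: the tag and attribute lists, the value generator and the function names are all assumptions chosen for illustration. It only shows the general shape, which is a page of randomly built tags plus a META refresh that makes the browser reload and fetch a fresh page.

```python
import random

# Hypothetical tag and attribute pools; mangleme's real lists are not reproduced here.
TAGS = ["FORM", "TEXTAREA", "TABLE", "IMG", "INPUT", "FRAME"]
ATTRS = ["COLS", "ROWS", "WIDTH", "HEIGHT", "SRC", "NAME"]


def random_value(rng):
    """Return an attribute value that a careless parser may not expect."""
    kind = rng.randrange(3)
    if kind == 0:
        # Out-of-range or boundary numbers.
        return str(rng.choice([-1, 0, 10**10, 2**32]))
    if kind == 1:
        # A long run of one character.
        return "K" * rng.randint(1, 500)
    # Short printable-ASCII junk, which may include quotes and angle brackets.
    return "".join(chr(rng.randint(33, 126)) for _ in range(rng.randint(1, 20)))


def random_tag(rng):
    """Build one unclosed tag with zero to four random attributes."""
    attrs = " ".join(
        f"{rng.choice(ATTRS)}={random_value(rng)}" for _ in range(rng.randint(0, 4))
    )
    return f"<{rng.choice(TAGS)} {attrs}>"


def fuzz_page(seed, tag_count=20):
    """Return one page of junk tags; the same seed always gives the same page."""
    rng = random.Random(seed)
    body = "\n".join(random_tag(rng) for _ in range(tag_count))
    # CONTENT="0" asks the browser to reload the current URL immediately.
    return (
        '<HTML><HEAD><META HTTP-EQUIV="Refresh" CONTENT="0"></HEAD>\n'
        f"<BODY>\n{body}\n</BODY></HTML>"
    )


def cgi_response(seed):
    """Wrap the page in the header a CGI script has to print first."""
    return "Content-Type: text/html\r\n\r\n" + fuzz_page(seed)
```

A CGI script built on this sketch would print cgi_response() with a new seed on each request. Logging the seed is one way to regenerate the exact page that killed a browser.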

It turns out most browsers die quickly. Counter-intuitively (for most of us), Internet Explorer handled the dirty input well. "All browsers but Microsoft Internet Explorer kept crashing on a regular basis due to NULL pointer references, memory corruption, buffer overflows, sometimes memory exhaustion; taking several minutes on average to encounter a tag they couldn't parse," Zalewski said.

Zalewski goes on to speculate that Internet Explorer has been subjected to testing with a tool like this. I'd make the same guess, but it is only a guess.

What excuse do the other browsers have? Zalewski lists, among those browsers that crash regularly, Mozilla, Netscape, Firefox, Opera, Lynx and Links. I only tested Firefox and Lynx. The former crashed; the latter locked up. Incidentally, the source for the "page" that locked up Lynx is a good example of the sort of dirty input this tool generates:

<FORM><TEXTAREA COLS=10000000000> <KKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKK KKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKK KKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKK KKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKK KKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKK

You can imagine how easy it is for a programmer to blow off edge cases that deal with input like this because, well, who would send something like this? Years ago that attitude might have gotten you paid and kept the customer happy, but standards have changed. Programs have to be abuse-proof.
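
As a small illustration of what handling such an edge case can look like, the Python sketch below parses a numeric attribute such as the COLS value in the sample above. The function name, the default of 20 and the cap of 1000 are assumptions made for this example, not values taken from any browser.

```python
MAX_COLS = 1000  # assumed upper bound, chosen only for this example


def parse_cols(raw, default=20, maximum=MAX_COLS):
    """Turn an untrusted COLS value into a safe integer.

    Junk, missing, or non-positive values fall back to the default;
    oversized values are clamped to the maximum.
    """
    try:
        value = int(raw.strip())
    except (ValueError, AttributeError):
        # Not a number at all, or not even a string.
        return default
    if value < 1:
        return default
    return min(value, maximum)
```

With these assumed limits, parse_cols("10000000000") yields 1000 and parse_cols("KKKK") yields 20, so the absurd value never reaches the code that allocates memory or lays out the page.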
