It's Time to Decide

Developing online algorithms brings many benefits.

We've all worked with people who refuse to begin thinking about the answer to a question until they have all the facts; they want to be left alone while they think, rather than accepting new information as it arrives. But if I may paraphrase management guru Peter Drucker, it's a terrible handicap to insist on waiting to make a decision until one has complete information. The most effective decision makers are those who can do the right thing, more often than not, despite confusing and inadequate data.

It's time to apply Drucker's criterion to IT: not only to the decision making of IT managers but also to the algorithms that animate enterprise systems.

In everyday programming practice, most discussions of algorithm choice assume that the ingredients are ready: that the data set has been constructed and that it's time to start the machine. Today, though, our most interesting systems gather data from many locations via network links of varying speed and quality. So the greatest leverage in many critical tasks (for example, job scheduling and transaction risk assessment) may come from knowing when to stop waiting and get on with the job, even if the input is not yet all ready for processing.

Those who study this subject refer to these techniques as "online algorithms," meaning that they work with data coming in on a live connection rather than collecting all the data that they require and then ignoring everything else while they run.

There are many things wrong with the "go away, I'm thinking" approach. For one thing, it deprives an interactive user of the chance to reconsider a question without first waiting for a no-longer-wanted answer. I've had many users complain to me, for example, that when they mistype a database query or a Web URL, they waste far too much time waiting for the system to give up, when what they'd much rather do is interrupt with the input that they actually meant to provide. If there's one good reason for even the simplest applications to be multithreaded, it's the value of having one thread always attending to user input, while other threads feed that input to the back end for distribution and processing.
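To make the multithreading point concrete, here is a minimal Python sketch of a back end that can be interrupted by a corrected request. Everything in it is an assumption made for illustration: the class name InterruptibleBackend, its submit and wait methods, and the use of short timed waits to stand in for slow query work. It is one way the idea might look, not a recommended design.

```python
import threading


class InterruptibleBackend:
    """Hypothetical back end: a newer request cancels the one in flight."""

    def __init__(self):
        self._cancel = None   # cancel flag for the request in flight
        self._worker = None   # thread running the latest request
        self.results = []     # queries that ran to completion

    def submit(self, query, steps=5, step_seconds=0.01):
        # Called from the thread that attends to user input. It never
        # blocks on the back end; it just flags the old request as
        # unwanted and hands the new one to a fresh worker thread.
        if self._cancel is not None:
            self._cancel.set()
        cancel = threading.Event()
        self._cancel = cancel
        self._worker = threading.Thread(
            target=self._run,
            args=(query, steps, step_seconds, cancel),
            daemon=True,
        )
        self._worker.start()

    def _run(self, query, steps, step_seconds, cancel):
        # Simulated slow work, done in small steps so that a
        # cancellation is noticed quickly instead of at the very end.
        for _ in range(steps):
            if cancel.wait(step_seconds):  # True once the flag is set
                return
        if not cancel.is_set():
            self.results.append(query)

    def wait(self):
        # Block until the most recently submitted request finishes.
        if self._worker is not None:
            self._worker.join()
```

In this sketch, a user who submits a mistyped query and then immediately submits the corrected one gets a result only for the correction; the abandoned request stops at its next step instead of running to a timeout.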

Readers of Douglas Adams' "Hitchhiker" novels will recognize an extreme case of "leave me alone" design in the fictional computer Deep Thought, which demanded an uninterrupted processing period of 7.5 million years to determine the ultimate answer to life, the universe and everything, only to announce that the result was "forty-two." It might have been nice to get intermediate results from the early stages of the analysis: If the computer had reported, after a million years or so, that the answer was "somewhere between 32 and 64," the people waiting for the final answer might have appreciated the resulting chance to rethink their approach.

When developers look at algorithms from an online perspective, it changes their thinking about what it means to refine their code. Normally, a developer looks at the time required from the moment data is ready to the time the final answer is produced. The online perspective starts measuring elapsed time earlier, when the question is asked or some other trigger event takes place, but stops the clock as soon as a good-enough answer becomes available: typically, when no amount of additional precision would change the user's decision.
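The "stop the clock" rule can be sketched in a few lines of Python. The example below assumes a made-up risk check: a transaction is flagged when the average of a known number of risk scores, each between 0 and 1, reaches a threshold. The function name decide_early, the score range and the "flag"/"clear" labels are all hypothetical; the point is only that the loop returns as soon as the scores still to come could not change the outcome.

```python
from typing import Iterable, Tuple


def decide_early(scores: Iterable[float], total: int,
                 threshold: float) -> Tuple[str, int]:
    """Return (decision, number of scores consumed).

    Assumes each score lies in [0, 1] and that `total` scores are
    expected. The decision is "flag" if the mean of all `total`
    scores would be at least `threshold`, otherwise "clear".
    """
    running_sum = 0.0
    seen = 0
    for score in scores:
        running_sum += score
        seen += 1
        remaining = total - seen
        # Bounds on the final mean: every missing score is 0, or is 1.
        lowest = running_sum / total
        highest = (running_sum + remaining) / total
        if lowest >= threshold:
            return "flag", seen    # no later score can rescue it
        if highest < threshold:
            return "clear", seen   # no later score can condemn it
    # Input ended early: decide on what arrived.
    decision = "flag" if running_sum / total >= threshold else "clear"
    return decision, seen
```

With five expected scores and a threshold of 0.5, the input 0.9, 0.9, 0.9, 0.1, 0.1 is flagged after the third score, because even two zeros could not pull the average below the threshold. The last two scores never have to arrive.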

How do developers optimize online algorithms? It can't be done with other-than-real-world data or in other-than-real-world conditions. Only real-world data will show the patterns of where the variability really lies, making certain data much more important to update; only real-world conditions will show the patterns of which data takes the longest to arrive, making its estimation most worth the effort.

This means that application developers who take an online view are forced to become intimately familiar with the business problem and the business environment: to become enterprise problem solvers rather than computer scientists who happen to be on retainer to a corporate IT department. I don't need more data to know I'm OK with that.


Peter Coffee's e-mail address is