Too Much Data?

Excess data can cloud business forecasts.

More information enables better decisions. Isn't that the premise behind almost every IT pitch? Yet adding data can actually worsen the decision-making process.

Understanding how this can happen and knowing how to avoid it is the kind of thing that makes someone a real CIO—instead of just being a head of IT with a CxO title on the door.

Small and midsize businesses are especially vulnerable to the more-IT-is-better mistake. SMBs already represent more than half of U.S. IT spending, according to a study last month by IDC, and their IT budgets are in the cross hairs of most vendors' marketing campaigns.

But before a small company bulks up on IT, it needs to understand the paradox of adding data while subtracting information value.

The error is easy to see in the simplest cases. If you have two data values, such as sales volume at two different times, it may be useful to calculate their average and use it as an estimate of future performance. But if someone also notes the dates on which those sales were measured, fits a line to the two resulting points on a graph and proudly reports a perfect fit, the resulting trend has no statistical significance: Any two points will exactly determine a line, even if the actual behavior of the system is just a random fluctuation around some mean.

Adding data has turned an average that meant something into a "trend line" that could point anywhere.
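A minimal sketch of that two-point trap, in Python. The mean of 100, the noise level of 10 and the function names are illustrative assumptions, not figures from any real sales data. Each trial draws two values from pure noise around a fixed mean, fits the line through them and extrapolates one period ahead.

```python
import random


def fit_line(p1, p2):
    """Return (slope, intercept) of the line through two (x, y) points."""
    (x1, y1), (x2, y2) = p1, p2
    slope = (y2 - y1) / (x2 - x1)
    intercept = y1 - slope * x1
    return slope, intercept


def two_point_trends(mean=100.0, noise=10.0, trials=5, seed=0):
    """Fit a 'trend' to pairs of values that are only noise around a mean.

    The mean and noise level are made-up numbers for illustration.
    Returns one (slope, worst_residual, forecast, average) tuple per trial.
    """
    rng = random.Random(seed)
    results = []
    for _ in range(trials):
        y1 = rng.gauss(mean, noise)  # period 1: no real trend, just noise
        y2 = rng.gauss(mean, noise)  # period 2: same underlying process
        slope, intercept = fit_line((1, y1), (2, y2))
        # The fit is "perfect" by construction: both residuals are zero.
        worst_residual = max(
            abs(intercept + slope * 1 - y1),
            abs(intercept + slope * 2 - y2),
        )
        forecast = intercept + slope * 3  # trend-line guess for period 3
        average = (y1 + y2) / 2           # the humbler estimate
        results.append((slope, worst_residual, forecast, average))
    return results


if __name__ == "__main__":
    for slope, resid, forecast, average in two_point_trends():
        print(f"slope={slope:+7.2f}  residual={resid:.1e}  "
              f"trend forecast={forecast:7.2f}  average={average:7.2f}")
```

Every trial reports a residual of essentially zero, yet the slopes differ from one trial to the next because the underlying process has no trend at all. The average stays nearer the true mean than the extrapolated line does in a typical run.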

You may think that I'm being too conservative about the real-world use of statistics. If your sales last year were $5 million and your sales this year were $10 million, am I saying that it's wrong to predict that next year's sales will be $15 million?

Well, yes, I am, and I'm not talking about trivial noise in that prediction. I'm talking about the kind of error that can kill a company.

The actual behavior, out there in that real world, could be trial buys by customers who like the product and begin to buy more while also telling their friends. If I could determine that this was taking place, I might hope that my sales volumes happened to be on an exponential curve, at least in the short run, so that next year's potential sales could be $20 million.
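The arithmetic behind those two forecasts can be sketched in a few lines of Python. The function names are hypothetical; the point is that a straight line and an exponential curve both pass exactly through the same two sales figures and still disagree about the third.

```python
def linear_forecast(previous, current):
    """Extend the straight line through two points by one more period."""
    return current + (current - previous)


def exponential_forecast(previous, current):
    """Extend constant-ratio (exponential) growth by one more period."""
    return current * (current / previous)


# Sales in millions of dollars, taken from the example in the text.
last_year, this_year = 5.0, 10.0

print(linear_forecast(last_year, this_year))       # 15.0
print(exponential_forecast(last_year, this_year))  # 20.0
```

Two data points cannot tell the two models apart. Only knowledge of what customers are actually doing can.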

If I wrongly predict a smaller value and can't meet that higher level of demand, my error will be a self-fulfilling prophecy, and I'll have a third point on my misleading trend line, encouraging me to make the same mistake the following year. I'll also be leaving unfulfilled demand that competitors can exploit to enter the market.

Alternatively, the actual behavior behind those first two years' sales might be first-time buyers hating the product and swearing they'll never try it again. My sales growth from $5 million to $10 million, in that case, might be merely the result of advertising briefly outpacing bad word of mouth.

If my advertising budget reaches the same number of people next year, but there are three times as many unhappy customers loudly offsetting that message, my sales next year could plummet back down to $5 million or even less, and I'll never know what hit me.

These examples may seem too simple, but errors like these may well be buried under the deceptive complexity of an elaborate CRM or enterprise-forecasting project. These systems can generate lots of numbers and may give me the illusion of understanding more than I did before I had them.

The process is seductive because we want to understand what's going on, and throwing additional measures into the pot seems to make our models more certain. Not sometimes, not usually, but almost always, since modeling algorithms use added variables in ways that improve the overall fit, unless they're so perfectly random in behavior that the algorithm ignores them completely. That's an unlikely result. Adding variables will improve most measures of fit, but often without boosting actual predictive power.
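The effect can be demonstrated with a small experiment, sketched here in plain Python under stated assumptions: the data are synthetic, one variable genuinely drives the outcome, and up to 10 "junk" variables are pure noise. The helper names (solve, fit, r_squared, experiment) and all the numbers are illustrative. The model is ordinary least squares, fitted on a small training sample and then scored on fresh data it has never seen.

```python
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit(rows, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


def r_squared(rows, y, coef):
    """Share of the variance in y explained by the fitted coefficients."""
    mean = sum(y) / len(y)
    ss_res = sum((t - sum(c * v for c, v in zip(coef, r))) ** 2
                 for r, t in zip(rows, y))
    ss_tot = sum((t - mean) ** 2 for t in y)
    return 1 - ss_res / ss_tot


def experiment(n_train=20, n_test=200, max_junk=10, seed=1):
    """Return (junk_count, in_sample_r2, out_of_sample_r2) for nested models."""
    rng = random.Random(seed)

    def make(n):
        rows, y = [], []
        for _ in range(n):
            x = rng.uniform(0, 10)                      # the one real driver
            junk = [rng.gauss(0, 1) for _ in range(max_junk)]  # pure noise
            rows.append([1.0, x] + junk)                # 1.0 is the intercept
            y.append(2.0 * x + rng.gauss(0, 3))         # made-up relationship
        return rows, y

    train_rows, train_y = make(n_train)
    test_rows, test_y = make(n_test)
    results = []
    for junk in range(max_junk + 1):
        k = 2 + junk  # intercept + real driver + first `junk` noise columns
        coef = fit([r[:k] for r in train_rows], train_y)
        results.append((
            junk,
            r_squared([r[:k] for r in train_rows], train_y, coef),
            r_squared([r[:k] for r in test_rows], test_y, coef),
        ))
    return results


if __name__ == "__main__":
    for junk, r2_in, r2_out in experiment():
        print(f"junk variables={junk:2d}  in-sample R2={r2_in:.3f}  "
              f"out-of-sample R2={r2_out:.3f}")
```

The in-sample R² can never fall as columns are added, because each larger model contains the smaller one as a special case. The out-of-sample column is the one to watch: it carries no such guarantee, and noise variables typically leave it flat or drag it down.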

Being an SMB means being close to your customer. Don't blow it by building a wall of numbers that merely blocks your view.

Technology Editor Peter Coffee can be reached at