CBS’s “Numbers” (or, strictly speaking, “Numb3rs”) is one of the few TV series my family makes an effort to watch or tape each week. Yes, we do know some of the technical advisers for the series; no, my wife and I almost never get through an episode without some shared eye-rolling, as necessary liberties with the mathematical plot devices do spark the occasional groan.
We know it’s only a 1-hour episodic drama, not a semester course, and we give the network credit for trying. Sometimes, moreover, it all works quite well.
The least implausible math hooks in the plot lines of “Numb3rs” are those that are based on the use of data mining techniques, with the math-whiz lead character finding clusters of connection among attributes that would seem unrelated to most human brains.
Don’t get me started on the subject of minimally significant data-set sizes, which are probably the single biggest hole in the typical “Numb3rs” script. People would get tired of stories that always involved a prolonged spree of crimes, enough to make statistical assumptions meaningful. I’ve often seen a recommended minimum of 33 cases, but then the show would have to be retitled “Numb33rs.”
Higher math issues aside, though, I’d like to think that people watching “Numb3rs” will eventually start to look at their own environments in a more data-driven way, although the expression “data-driven” has at least two meanings with substantially different outcomes.
Some software systems are called “data-driven” because their flow of control can’t be determined by looking at the code of the running program. No simple flowcharts here: a software environment such as OPS5 describes a problem-solving strategy, awaiting the arrival of facts that match up with rules and trigger associated actions. Those actions change the pool of known facts, and thus trigger new rule firings; eventually, something useful happens, but slightly different data arrivals may yield wildly different results. It’s hard to test such things.
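To make that first meaning concrete, here is a minimal sketch in Python of a forward-chaining rule loop. It is not OPS5 syntax, and it makes no attempt to reproduce OPS5’s matching or conflict-resolution strategy; the rule names and facts are hypothetical, invented purely for illustration. The point is only that the order of events comes from the facts that arrive, not from the shape of the code.

```python
from typing import NamedTuple


class Rule(NamedTuple):
    name: str
    needs: frozenset  # facts that must all be present before the rule fires
    adds: frozenset   # facts the rule contributes to the pool when it fires


# Hypothetical rules, invented for this illustration only.
RULES = [
    Rule("flag-pattern",
         frozenset({"same-method", "same-region"}),
         frozenset({"linked-cases"})),
    Rule("open-inquiry",
         frozenset({"linked-cases"}),
         frozenset({"inquiry-opened"})),
]


def run(initial_facts, rules=RULES):
    """Fire rules until no rule can add anything new to the fact pool."""
    facts = set(initial_facts)
    fired = []
    changed = True
    while changed:
        changed = False
        for rule in rules:
            # A rule fires when its needs are met and it still has something to add.
            if rule.needs <= facts and not rule.adds <= facts:
                facts |= rule.adds
                fired.append(rule.name)
                changed = True
    return facts, fired
```

Feed this loop both “same-method” and “same-region” and both rules fire in sequence; withhold either fact and nothing fires at all. That sensitivity to what arrives is what makes such systems hard to test.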
Other software systems are called “data-driven” because a data structure, not a piece of executable code, is the main definer of what the program will do. Rather than a program consisting of dozens of conditional branches, with their well-known opportunities for error, a program might consist of a much simpler piece of logic that knows how to navigate a table of conditions and actions. To cover a new situation, all one has to do is add the appropriate information to the table.
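A small sketch of this second meaning, again in Python: the decision logic lives in a table, and the code only knows how to walk it. The order-handling scenario, the field names and the action labels are all assumptions made up for this example, not drawn from any real system.

```python
# Hypothetical decision table: rows are checked top to bottom, first match wins.
# A country of None means "any country".
RULES_TABLE = [
    {"min_total": 1000, "country": None, "action": "manager-approval"},
    {"min_total": 0,    "country": "CA", "action": "customs-form"},
    {"min_total": 0,    "country": None, "action": "standard"},
]


def decide(order, table=RULES_TABLE):
    """Return the action of the first table row whose conditions the order meets."""
    for row in table:
        if order["total"] < row["min_total"]:
            continue
        if row["country"] is not None and order["country"] != row["country"]:
            continue
        return row["action"]
    raise LookupError("no table row matches this order")
```

Covering a new situation, such as orders bound for a second country that needs a customs form, means adding one row ahead of the catch-all; `decide` itself does not change.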
I think of HTML authoring as a form of data-driven programming: I don’t write code that directs the machine to set up a graphics coordinate system or render line-wrapped text; I just create a data structure that a rendering engine treats as input. I had the same reaction to my first encounter with Autodesk’s AutoCAD: Instead of needing to write graphics code to produce a shape on the screen, I could write a much simpler program that generated a drawing file for the CAD engine to interpret and display.
Programs controlled by data structures have some major advantages over programs whose code may mix assumptions with logic, often in ways that make the programs hard to maintain. If much of a program’s detailed behavior is controlled by data, the top-level program is likely to be simpler and easier to test, as well as being less frequently changed, and therefore less often in need of testing.
Data maintenance, meanwhile, is something that we know how to do, with notions of privilege and validation that are much more robust than many software-testing environments.
Meanwhile, I’ll continue my own search for non-obvious connections, a search that you can observe at blog.eweek.com, where I and other eWEEK staff members are always offering highlights and associations that we’ve noted in the stream of daily tech news.
We know that there are many different types of moments in your day when different kinds of news and analyses are convenient to consume, and we’re interested in your comments and suggestions as to how we can be most useful.
Peter Coffee can be reached at [email protected]