Data Minus Knowledge

Knowledge and insight about customers haven't suddenly become a proposition of quantity rather than quality.

On June 23rd, the Defense Advanced Research Projects Agency will receive the first round of proposals for the development of "LifeLog," which it describes as a system that will "trace the threads of an individual's life in terms of events, states and relationships."

I imagine ears pricking up all over at the thought of this technology being developed at government expense, eventually to migrate into commercial applications. Web-based operations already attempt to determine the forces that lead to purchase decisions by watching the paths that users take through their sites. Storage and analytics vendors enjoy the revenue that comes from enterprises accumulating terabytes of data, hoping to find the gold at the end of the click trail. And those concerned about electronic privacy will wonder whether such developments will make it too easy to know too much.

Clearly, a growing fraction of our daily lives takes place via channels directly subject to digital capture. At the same time, technologies for recording what we do in the physical world are becoming steadily more accurate and less costly. It's tempting to think that this leads toward the goal of building systems that actually understand us, but there's reason to doubt that vision and to keep it from infiltrating enterprise IT.

The language of DARPA's description of LifeLog reminds me of William Gibson's 1996 novel "Idoru," which pivots on one character's ability to analyze "nodal points" in collections of data: to develop insights into a person's life and behavior from the residues of transactions, communications and activities. As is so often the case, what Gibson imagines in one decade becomes plausible reality in the next.

But Gibson, at least, knew enough to attribute his character's uncanny insights to an unexplainable talent: a side effect of trials of an experimental drug. That's wetware, not software, and certainly not a brute-force approach based on collecting and analyzing everything. Quite the opposite, in fact: It's a skill of somehow knowing what to ignore.

Capturing click trails and accumulating masses of other raw data are at the other extreme: It's easy to explain what you're doing but hard to know what to do next. The problem reminds me of the paradox of the wandering ant, a poetic phrase that describes the result of asking people to analyze a diagram of an ant's path from a source of food back to its nest. It's a complex path, and people tend to think that there must be a correspondingly complex strategy behind it.

Essentially identical paths can be generated, though, by simulations based on extremely simple strategies such as following the scent trails left by other ants and moving toward the point where they seem to converge. The key point here is that the complexity is in the environment, not in the individual, and that this truth is easily obscured—rather than being revealed—by collecting more details about each individual.
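The ant argument can be made concrete with a small simulation. What follows is a minimal sketch, not a model of real ant behavior: it assumes a hypothetical grid of pebbles (`TERRAIN`), models the converging scent trails as a field that grows stronger closer to the nest and spreads around obstacles (computed here by breadth-first search, an assumption standing in for accumulated trail scent), and gives the ant exactly one rule: step to the neighboring cell where the scent is strongest. All names are illustrative.

```python
from collections import deque

# Hypothetical terrain: '#' = pebble/obstacle, 'F' = food, 'N' = nest, '.' = open ground.
TERRAIN = [
    "F....#...",
    "###..#.#.",
    "....##.#.",
    ".####..#.",
    "......##.",
    "#####...N",
]


def find(grid, ch):
    """Return the (row, col) of the first occurrence of ch in the grid."""
    for r, row in enumerate(grid):
        c = row.find(ch)
        if c != -1:
            return (r, c)
    raise ValueError(f"{ch!r} not in grid")


def neighbors(grid, pos):
    """Yield the open cells directly above, below, left and right of pos."""
    r, c = pos
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < len(grid) and 0 <= nc < len(grid[0]) and grid[nr][nc] != "#":
            yield (nr, nc)


def scent_field(grid, nest):
    """Assumed scent model: strength = -(steps to the nest), spread around obstacles by BFS."""
    dist = {nest: 0}
    queue = deque([nest])
    while queue:
        cur = queue.popleft()
        for nxt in neighbors(grid, cur):
            if nxt not in dist:
                dist[nxt] = dist[cur] + 1
                queue.append(nxt)
    return {cell: -d for cell, d in dist.items()}


def walk(grid):
    """The ant's entire strategy: move to the adjacent cell with the strongest scent."""
    nest, pos = find(grid, "N"), find(grid, "F")
    scent = scent_field(grid, nest)
    if pos not in scent:
        raise ValueError("food is not connected to the nest")
    path = [pos]
    while pos != nest:
        # One line of "decision logic"; every twist in the path comes from TERRAIN.
        pos = max(neighbors(grid, pos), key=lambda cell: scent.get(cell, float("-inf")))
        path.append(pos)
    return path
```

Calling `walk(TERRAIN)` returns a 20-cell path that runs east, doubles back west, drops south and finally turns east again to reach the nest, yet `walk` contains a single rule. Redraw `TERRAIN` and the path changes completely while the rule stays the same: studying the path in ever finer detail tells you about the pebbles, not about the ant.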

DARPA envisions a system that can read a person's e-mail, correlate a message with a related calendar entry, combine the result with GPS data and verify it against surveillance video to produce a concise description of an episode such as "Smith took the 8:30 a.m. flight from Washington's Reagan National Airport to Boston's Logan Airport." But this only describes what Smith did, with no clue to the goal that it served. To paraphrase Theodore Levitt's classic comment on consumer behavior, it's like saying "I bought a 3/8-inch drill bit" instead of "I needed a 3/8-inch hole."

Declining costs of data collection, storage and analysis form a seductive force that encourages us to hope that we'll understand more if we collect more. But knowledge of your customers, and insight into their needs, has not suddenly become a proposition of quantity rather than quality.

DARPA's goals for LifeLog are only superficially similar to your goals in building a business intelligence system. Follow DARPA's example, and you'll be able to draw on the masses of data that tomorrow's technologies will allow you to collect. But unless that collection effort is guided by a creative vision of your business, the results will be either irrelevant or misleading. And it will be a futile exercise to try to apply your business vision after the fact to an indiscriminate archive.