In the hands of a government preoccupied with security, data mining technologies raise the specter of bureaucratic intrusiveness of Orwellian proportions. But data mining researchers in the private sector say that this technology, like any tool, can be used for good, and they have grown weary of seeing it held out as a bogeyman.
Researchers from academia and industry gathered here late Tuesday to discuss the future of the technology at the KDD (Knowledge Discovery and Data Mining) conference sponsored by the Association for Computing Machinery. With the recent spate of bad publicity regarding government data mining initiatives, KDD researchers are eager to champion the technology's potential as a strategic tool for businesses.
A panel of Ph.D.s Tuesday afternoon floated ideas for future commercial applications in the fields of security and fraud detection, e-commerce and bioinformatics, but they offered few details on when or how these applications would become available. Nonetheless, some see data mining as a growth industry in light of recent advances in storage technologies, which have created new ways to warehouse vast amounts of data cheaply.
Also a boon to the technology is Microsoft Corp.'s decision to include a data mining capability in SQL Server. Other companies are taking a different approach, focusing on honing the technology for business use.
General Motors Corp., for its part, is working on ways to turn data mining into a strategic enterprise tool, according to Ramasamy Uthurusamy, a researcher at GM. Citing the recently reported success of Harrah's Entertainment Inc. in examining customer data to increase return business through new customer incentives, Uthurusamy said some businesses are already availing themselves of the technology.
Despite the optimism, the researchers acknowledged numerous challenges—both technological and cultural—that must be addressed before data mining becomes a widespread and useful enterprise tool.
One of the main obstacles is managing mined data and keeping sight of the purpose for collecting it. Usama Fayyad, chairman and co-founder of Revenue Science Inc., said he makes a “mess” whenever he works on a data mining project, leaving “a trail of droppings that's of Biblical proportions.” Within two to three days of initiating a project, it becomes easy to lose sight of the purpose and goals, he said.
Another major obstacle, in Fayyad's view, is a disconnect between the way data is represented in data stores and the way mining technologies work. Revenue Science, which changed its name in June from digiMine Inc., is based in Bellevue, Wash.
The public's concern about protecting privacy remains a key obstacle, and researchers increasingly see it as a problem that they must address themselves. Calling privacy a “do-or-die” issue, Rakesh Agrawal, a researcher at IBM Almaden Research Center near San Francisco, said that developers must accept responsibility for the technology they are creating. Additionally, new methods must be developed to safeguard against false positives, or inaccurate patterns in data, and there remains much work to be done in that field, he said.
Despite their recognition of the hurdles that lie ahead for data mining, proponents largely remain optimistic about its long-term prospects.
“Headlines have given data mining a bad name,” quipped Gregory Piatetsky-Shapiro, president of KDnuggets, noting recent news reports of a U.S. Senate vote to block funding for the Pentagon's Terrorism Information Awareness project. “But even if the Senate bans data mining, remember we will still have knowledge discovery.”