IBM researchers are working on a new database design that takes consumer privacy into account in the way it stores and retrieves information.
IBM Fellow Rakesh Agrawal this week is presenting the idea, called a Hippocratic database, at the Very Large Data Bases (VLDB) 2002 conference in Hong Kong. The design is based on the Hippocratic oath that serves as the basis of doctor-patient relationships. The concept occurred to Agrawal when his brother, who is a doctor, challenged him about the inability of technology such as databases to take individuals’ privacy concerns into account.
“More and more databases are keeping personal and private information, and we are sort of relying on databases for our day-to-day existence,” said Agrawal, lead scientist on the project at the IBM Almaden Research Center, in San Jose, Calif. “If we don’t treat it with respect, people are going to get hurt.”
One tenet of the Hippocratic oath includes a statement on privacy that states, “… whatever I may see or hear … in the life of human beings … I will remain silent, holding such things to be unutterable.” The Hippocratic database concept hinges on this principle.
Hippocratic databases would negotiate the privacy of information exchanged between a consumer or individual and companies. The database owner would have a policy built into the database about storage and retrieval of personal information, and the database donor would be able to accept or deny it.
Each piece of data would have specifications of the database owner’s policies attached to it. The policy would specify the purpose for which information is collected, who can receive it, the length of time the data can be retained and those who are authorized to access it.
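As a rough illustration of what attaching a policy to each piece of data could look like, here is a minimal Python sketch. It is not IBM's design: the class names, field names and sample values are all assumptions made for this example, chosen to mirror the four items the policy is said to specify.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import FrozenSet


@dataclass(frozen=True)
class PrivacyPolicy:
    """Hypothetical policy label; the field names are assumptions."""
    purpose: str                      # why the information is collected
    recipients: FrozenSet[str]        # who can receive it
    retention_days: int               # how long it can be retained
    authorized_users: FrozenSet[str]  # who is authorized to access it


@dataclass(frozen=True)
class LabeledValue:
    """A stored value that carries its policy along with it."""
    value: str
    collected_on: date
    policy: PrivacyPolicy

    def expires_on(self) -> date:
        # Last day on which the value may still be retained.
        return self.collected_on + timedelta(days=self.policy.retention_days)

    def may_access(self, user: str, purpose: str) -> bool:
        # Access requires both an authorized user and the stated purpose.
        return (user in self.policy.authorized_users
                and purpose == self.policy.purpose)


# Sample data, invented for illustration.
shipping = PrivacyPolicy(
    purpose="shipping",
    recipients=frozenset({"delivery-company"}),
    retention_days=30,
    authorized_users=frozenset({"shipping-clerk"}),
)
address = LabeledValue("12 Example Street", date(2002, 8, 1), shipping)
```

In this sketch the policy travels with the value, so any later check (who is asking, for what purpose, and whether the retention period has passed) can be answered from the stored item alone.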
The increased ubiquity of the Internet and use of databases for data mining in marketing have led to the need for database systems that limit the type of data stored, how it is used and how long it is stored, researchers say. At the same time, regulations such as the Health Insurance Portability and Accountability Act of 1996 and the Gramm-Leach-Bliley Act of 1999, along with tough European Union privacy laws, are forcing companies to take privacy more seriously.
“Once companies start recognizing that this is going to be extremely important for the consumer and some companies start saying ‘We respect your privacy, and we use databases that are Hippocratic,’ that might become a movement in itself and that might become a competitive advantage,” Agrawal said. “At this stage, I’m sort of saying that we need to create technology, and I think market forces and legal forces will take care of it.”
IBM researchers have already prototyped the Hippocratic database concept in their lab to work with the Platform for Privacy Preferences (P3P) standard from the World Wide Web Consortium, which helps determine the information a Web site can collect. P3P allows a Web site to encode its data collection and use practices in XML in a way that can be compared to a user’s preferences.
The standard itself doesn’t include any way to enforce its policy, but the prototype allows for the database to programmatically check whether a site owner’s and a user’s preferences match, Agrawal said. Eventually this would also give the two parties the ability to negotiate the privacy policy terms, Agrawal said.
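The kind of programmatic match check described here can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the prototype's logic: it does not parse real P3P XML, and the `Practice` type, the `mismatches` function and the sample values are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Practice:
    """Simplified stand-in for a P3P-style statement (an assumption)."""
    purposes: FrozenSet[str]
    recipients: FrozenSet[str]
    retention_days: int


def mismatches(site: Practice, user: Practice) -> List[str]:
    """List every way the site's policy exceeds what the user accepts."""
    problems: List[str] = []
    extra_purposes = site.purposes - user.purposes
    if extra_purposes:
        problems.append(
            "purposes not accepted: " + ", ".join(sorted(extra_purposes)))
    extra_recipients = site.recipients - user.recipients
    if extra_recipients:
        problems.append(
            "recipients not accepted: " + ", ".join(sorted(extra_recipients)))
    if site.retention_days > user.retention_days:
        problems.append(
            f"retention of {site.retention_days} days exceeds "
            f"{user.retention_days}")
    return problems


# Sample preferences and policies, invented for illustration.
user_prefs = Practice(frozenset({"shipping"}), frozenset({"ours"}), 90)
site_ok = Practice(frozenset({"shipping"}), frozenset({"ours"}), 30)
site_marketing = Practice(
    frozenset({"shipping", "marketing"}), frozenset({"ours"}), 30)
```

An empty list means the policy fits within the user's preferences. Returning the specific mismatches, rather than a yes or no, is one way to leave room for the negotiation Agrawal describes.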
10 Guiding Principles
The architecture for the Hippocratic database concept is to be based on 10 guiding principles: purpose specification, consent, limited collection, limited use, limited disclosure, limited retention, accuracy, safety, openness and compliance.
The Hippocratic database and its components would work in the following way, according to IBM officials. First, metadata tables would be defined for each type of information collected. A Privacy Metadata Creator would generate the tables to determine who should have access to what data and how long that data should be stored. A Privacy Constraint Validator would check whether a site’s privacy policies match a user’s preferences, and once this is verified the data would be transmitted from the user to the database.
A Data Accuracy Analyzer would test the accuracy of the data being shared. Once queries are submitted along with their intended purpose, the Attribute Access Control would verify whether the query is accessing only those fields necessary for the query’s purpose. Only records that match the query’s purpose would be visible thanks to the Record Access Control component. The Query Intrusion Detector then would run compliance tests on the results to detect any queries whose access pattern varies from the normal access pattern.
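The two access checks can be illustrated with a small in-memory Python sketch. It is an assumption-laden toy rather than the IBM components: the `ALLOWED_FIELDS` table, the `RECORDS` list, the `run_query` function and all sample data are hypothetical, and the accuracy analysis and intrusion detection steps are not modeled.

```python
from typing import Any, Dict, List, Set

# Hypothetical privacy metadata: the fields each purpose may read.
ALLOWED_FIELDS: Dict[str, Set[str]] = {
    "shipping": {"name", "address"},
    "marketing": {"name", "email"},
}

# Sample records; each lists the purposes its owner agreed to.
RECORDS: List[Dict[str, Any]] = [
    {"name": "Alice", "address": "12 Example Street",
     "email": "alice@example.com", "purposes": {"shipping"}},
    {"name": "Bob", "address": "34 Sample Road",
     "email": "bob@example.com", "purposes": {"shipping", "marketing"}},
]


def run_query(fields: List[str], purpose: str) -> List[Dict[str, Any]]:
    """Answer a query that is submitted together with its purpose."""
    # Attribute-level check: every requested field must be permitted
    # for the stated purpose, otherwise the query is rejected outright.
    allowed = ALLOWED_FIELDS.get(purpose, set())
    denied = [f for f in fields if f not in allowed]
    if denied:
        raise PermissionError(
            f"fields {denied} are not permitted for purpose {purpose!r}")
    # Record-level check: only records whose owner agreed to this
    # purpose are visible to the query.
    return [{f: r[f] for f in fields}
            for r in RECORDS if purpose in r["purposes"]]
```

For example, `run_query(["name", "email"], "marketing")` returns only Bob's row, because Alice did not agree to marketing use, while `run_query(["email"], "shipping")` raises `PermissionError` because e-mail addresses are not needed for shipping.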
In the final step, a Data Retention Manager would delete any items stored beyond the length of their intended purpose. Audit trails of queries also would be kept to allow for privacy audits and to guard the database from suspicion that it has been misused.
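A retention pass of this kind might look like the following Python sketch. The function name, row layout and retention table are assumptions for illustration. The sketch logs each deletion so that the purge itself can be reviewed later; the audit trail of queries mentioned above is not modeled.

```python
from datetime import date, timedelta
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]
AuditEntry = Tuple[date, str, int]  # (when, action, row id)


def purge_expired(rows: List[Row],
                  retention_days: Dict[str, int],
                  today: date) -> Tuple[List[Row], List[AuditEntry]]:
    """Split rows into those still within retention and a deletion log."""
    kept: List[Row] = []
    audit: List[AuditEntry] = []
    for row in rows:
        # A purpose with no retention entry is treated as zero days,
        # so such rows are dropped unless they were collected today.
        days = retention_days.get(row["purpose"], 0)
        expires = row["collected_on"] + timedelta(days=days)
        if today > expires:
            audit.append((today, "delete", row["id"]))
        else:
            kept.append(row)
    return kept, audit


# Sample rows and retention periods, invented for illustration.
rows = [
    {"id": 1, "purpose": "shipping", "collected_on": date(2002, 1, 1)},
    {"id": 2, "purpose": "shipping", "collected_on": date(2002, 8, 1)},
]
kept, audit = purge_expired(rows, {"shipping": 30}, date(2002, 8, 20))
```

Here row 1 is past its 30-day period and is removed, while row 2 is kept until Aug. 31.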
While IBM researchers are interested in eventually incorporating the Hippocratic database concept into IBM’s DB2 database, they also want to expand interest in the concept. Agrawal hopes the presentation of the concept will lead other vendors and university researchers to embrace and evolve it.
“I wanted the database community to become cognizant of the issues,” Agrawal said. “I personally think it will help if others participate in it.”