For IBM fellow Rakesh Agrawal, modern database systems need to take a cue from the medical profession by adopting a trusted relationship between enterprises collecting data and the customers providing it that is similar to the one between physicians and patients.
Agrawal, who first stumbled on the idea for a new database design while discussing privacy with his brother, who is a physician, joined other IBM researchers recently in presenting a new concept called the Hippocratic database. The basis of the design is borrowed from a major tenet of the Hippocratic oath governing doctor-patient relationships that states, in part, “whatever I may see or hear … in the life of human beings … I will remain silent, holding such things to be unutterable.” In databases, that would translate to a design that better takes consumer privacy into account in the way it stores and retrieves information.
“More and more databases are keeping personal and private information, and we are sort of relying on databases for our day-to-day existence,” said Agrawal, lead scientist on the project at the IBM Almaden Research Center, in San Jose, Calif. “If we don't treat it with respect, people are going to get hurt.”
Hippocratic databases would negotiate the privacy of information that consumers provide to companies. The database owner would have a policy built into the database about storage and retrieval of personal information, and the database donor would be able to accept or deny it. Each piece of data would have specifications of the database owner's policies attached to it. The policy would specify the purpose for which information is collected, who can receive it, the length of time the data can be retained and the authorized users who can access it.
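As a rough illustration of what "policy attached to each piece of data" could look like, the sketch below stores the four policy elements named above next to a value and checks them before access. It is not IBM's design; the names `PrivacyPolicy`, `StoredItem` and `may_access` are hypothetical, and the rule that a request's purpose must equal the stated purpose is a simplifying assumption.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class PrivacyPolicy:
    """Hypothetical policy record; fields mirror the four elements in the article."""
    purpose: str                  # why the information is collected
    recipients: frozenset         # who can receive it
    retention_days: int           # how long it can be retained
    authorized_users: frozenset   # who can access it


@dataclass
class StoredItem:
    """A single piece of data with its policy attached."""
    value: str
    collected_on: date
    policy: PrivacyPolicy


def may_access(item: StoredItem, user: str, purpose: str, today: date) -> bool:
    """Allow access only for an authorized user, the stated purpose,
    and within the retention period (assumed rules, for illustration)."""
    expires_on = item.collected_on + timedelta(days=item.policy.retention_days)
    return (
        today <= expires_on
        and user in item.policy.authorized_users
        and purpose == item.policy.purpose
    )


# Example: an email address collected for order confirmation only.
policy = PrivacyPolicy(
    purpose="order-confirmation",
    recipients=frozenset({"shipping-partner"}),
    retention_days=30,
    authorized_users=frozenset({"orders-service"}),
)
email = StoredItem("ann@example.com", date(2024, 1, 1), policy)
```

With these assumptions, a request from `orders-service` for `order-confirmation` inside the 30-day window is allowed, while a marketing request, an unlisted user, or a request after the retention period is refused.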
The increased ubiquity of the Internet and the use of databases for data mining in marketing have led to the need for database systems that limit the type of data stored, how that data is used and how long it is stored, researchers say. At the same time, regulations such as the Health Insurance Portability and Accountability Act of 1996 and the Gramm-Leach-Bliley Act of 1999, along with tough European Union privacy laws, are forcing companies to take privacy more seriously.
The concept was well received by researchers attending the Very Large Data Bases conference in Hong Kong last month, where it was presented.
“The big change is understanding that developers of databases need to think of these issues and provide enough hooks and methods so that privacy rules can be built into databases,” said Jignesh Patel, an assistant professor in the Department of Electrical Engineering and Computer Science at the University of Michigan, in Ann Arbor.
Phil Bernstein, a senior researcher at Microsoft Research, in Redmond, Wash., agreed with the concept of a Hippocratic database but said privacy can't stop there: It needs to extend beyond databases to areas such as applications, system engineering and XML protocols.
To better incorporate privacy awareness, database vendors will not need to make architectural shifts, Bernstein said, but in coming years they will need to progressively add features in areas such as access control, encryption, managing and filtering queries, and logging access for later auditing. For instance, Microsoft Corp., also in Redmond, is planning more security features, such as row-level security, for the next release of its SQL Server database, code-named Yukon, to control user access to information at a finer granularity, he said.
IBM researchers have prototyped the Hippocratic database concept to work with the P3P (Platform for Privacy Preferences) standard from the World Wide Web Consortium, which helps determine the data a Web site can collect. P3P allows a Web site to encode its collection and use practices in XML in a way that can be compared with a user's preferences. The standard itself doesn't include a way to enforce that a site follows its policy, but the prototype allows for the database to check whether the site owner's and user's preferences match.
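The matching step can be pictured with a small sketch. It does not parse real P3P XML or reproduce the IBM prototype; the dictionaries and the function `policy_matches` are hypothetical, and the rule used (every purpose the site declares for a data type must be one the user allows) is an assumption made for illustration.

```python
def policy_matches(site_policy: dict, user_prefs: dict) -> bool:
    """Return True if every purpose the site declares for each data type
    is among the purposes the user allows for that data type.

    Both arguments map a data type (e.g. "email") to a set of purposes.
    A data type missing from user_prefs is treated as "nothing allowed".
    """
    for data_type, declared_purposes in site_policy.items():
        allowed = user_prefs.get(data_type, set())
        if not declared_purposes <= allowed:  # subset test
            return False
    return True


# Hypothetical site policy and user preferences.
site_policy = {
    "email": {"order-confirmation", "marketing"},
    "address": {"shipping"},
}
strict_user = {
    "email": {"order-confirmation"},
    "address": {"shipping"},
}
relaxed_user = {
    "email": {"order-confirmation", "marketing"},
    "address": {"shipping", "marketing"},
}
```

Here the strict user's preferences do not match, because the site wants to use email for marketing, while the relaxed user's preferences do.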
With the Hippocratic database and its components, metadata tables would be defined for each type of information collected, IBM officials said. A Privacy Metadata Creator would generate the tables to determine who should have access to what data and how long that data should be stored. A Privacy Constraint Validator would check whether a site's privacy policies match a user's preferences.
A Data Accuracy Analyzer would test the accuracy of the data being shared. Once queries are submitted along with their intended purpose, the Attribute Access Control would verify whether the query is accessing only those fields necessary for the query's purpose. Only records that match the query's purpose would be visible, thanks to the Record Access Control component. The Query Intrusion Detector then would run compliance tests on the results to detect any queries whose access pattern varies from the normal access pattern. In the final step, a Data Retention Manager would delete items stored beyond the length of their intended purpose.
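Three of those steps (attribute access control, record access control and data retention) can be sketched in a few lines. This is a toy model over in-memory records, not the IBM components; the tables `PURPOSE_FIELDS` and `RETENTION_DAYS`, the record layout and the function names are all hypothetical. For brevity, the retention step drops whole records once no purpose remains live rather than pruning individual fields.

```python
from datetime import date, timedelta

# Hypothetical privacy metadata: fields each purpose may read, and how long
# data collected for that purpose may be kept.
PURPOSE_FIELDS = {"shipping": {"name", "address"}, "marketing": {"name", "email"}}
RETENTION_DAYS = {"shipping": 30, "marketing": 365}

RECORDS = [
    {"name": "Ann", "address": "1 Elm St", "email": "ann@example.com",
     "purposes": {"shipping"}, "collected_on": date(2024, 1, 1)},
    {"name": "Bob", "address": "2 Oak Ave", "email": "bob@example.com",
     "purposes": {"shipping", "marketing"}, "collected_on": date(2024, 1, 20)},
]


def run_query(records, fields, purpose):
    """Attribute check: reject fields the purpose does not need.
    Record check: return only records whose donor agreed to the purpose."""
    disallowed = set(fields) - PURPOSE_FIELDS.get(purpose, set())
    if disallowed:
        raise PermissionError(f"{sorted(disallowed)} not permitted for {purpose!r}")
    return [{f: r[f] for f in fields} for r in records if purpose in r["purposes"]]


def purge_expired(records, today):
    """Retention step: drop purposes past their retention period and
    delete records left with no live purpose."""
    kept = []
    for r in records:
        live = {p for p in r["purposes"]
                if today <= r["collected_on"] + timedelta(days=RETENTION_DAYS[p])}
        if live:
            kept.append({**r, "purposes": live})
    return kept
```

Under these assumptions, a marketing query for `name` and `email` sees only Bob's record, a marketing query that asks for `address` is rejected, and a purge run on March 1, 2024 deletes Ann's record while keeping Bob's for marketing only.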