Open-Source Software to Aid Cancer Researchers
Records of cancer patients nationwide may soon be networked for researchers to access, and now a new study has found a way to de-identify individual records, making the project more feasible.
De-identifying electronic medical records so they can be used for research purposes is a must for health care institutions anxious to remain compliant with HIPAA (Health Insurance Portability and Accountability Act). But this is often a costly and labor-intensive process.
In the Harvard University study of an open-source software program designed to scrub identifying information from patient records, 19 identifiers were removed from every patient's record, including patient, institution and physician names as well as addresses, dates and medical record numbers.
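To make the idea of "scrubbing" concrete, here is a minimal Python sketch of pattern-based de-identification. It is not the Harvard tool: the patterns, the `scrub` function, the placeholder labels and the sample report are all hypothetical, chosen only to illustrate the general technique of replacing dates, record numbers and known names with placeholders.

```python
import re

# Hypothetical patterns for illustration only; the rules used by the
# software in the study are not described in this article.
PATTERNS = [
    ("DATE", re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b")),
    ("MRN", re.compile(r"\bMRN[:#]?\s*\d{6,10}\b", re.IGNORECASE)),
]


def scrub(text, known_names=()):
    """Replace matched identifiers with bracketed placeholders."""
    # Pattern-based identifiers first (dates, medical record numbers).
    for label, pattern in PATTERNS:
        text = pattern.sub(f"[{label}]", text)
    # Then exact, case-insensitive matches against a supplied name list.
    for name in known_names:
        text = re.sub(rf"\b{re.escape(name)}\b", "[NAME]", text,
                      flags=re.IGNORECASE)
    return text


# Made-up sample text, not taken from any real report.
report = "Patient Jane Doe, MRN: 1234567, biopsy received 03/14/2005 from Dr. Smith."
print(scrub(report, known_names=["Jane Doe", "Smith"]))
```

Because this sketch matches names exactly, a misspelled name would slip through, which is one of the kinds of misses the study reports.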
The study focused specifically on pathology reports, in response to a project by the National Cancer Institute. This project has successfully demonstrated a prototype of a Web-based, searchable, peer-to-peer network for identifying and locating pathologic tissue samples at various institutions by searching information contained within pathology reports.
The intent of the project is to create and demonstrate software that automates the de-identification of patient records likely to be available via the National Cancer Institute network.
Existing "scrubbing" software is either proprietary or offers only a partial solution, removing just one type of patient information. After the researchers created and refined their software, they used it to process 1,800 new pathology reports.
Each report in the Harvard study was reviewed manually before and after de-identification to catalog all identifiers and note those that were not removed.
About seven in 10 of these reports contained identifiers in the body of the report, for a total of 3,499 individual identifiers.
The program successfully removed more than 98 percent of them. Only 19 HIPAA-specified identifiers, mainly consult accession numbers and misspelled names, were missed.
Of the 41 non-HIPAA identifiers missed, the majority were partial institutional addresses and ages. Outside consultation case reports typically contained several identifiers and were reportedly the most challenging to de-identify comprehensively.
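The reported figures can be checked with a few lines of Python. This sketch assumes that both categories of missed identifiers (HIPAA-specified and non-HIPAA) count against the 3,499 total, which the article implies but does not state outright. The variable names are illustrative.

```python
# Figures as reported in the article.
total_identifiers = 3499
missed_hipaa = 19
missed_non_hipaa = 41

# Assumption: both missed categories are counted within the 3,499 total.
missed = missed_hipaa + missed_non_hipaa
removal_rate = (total_identifiers - missed) / total_identifiers

print(f"{missed} missed, {removal_rate:.1%} removed")  # 60 missed, 98.3% removed
```

Under that assumption, 60 missed identifiers out of 3,499 is consistent with the "more than 98 percent" removal rate cited above.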
Performance varied across the three institutions that participated in the study, and the researchers argued that this highlights the need for site-specific customization, which their tool supports.
A PDF version of the full report is available here for download.
Check out eWEEK.com for the latest news, views and analysis of technology's impact on health care.