STANFORD, Calif. - Developing standards for describing biological data may not be glamorous, but the lack of standards undermines researchers' productivity, said one researcher at the first meeting of the Bioinformatics Standards Committee here on Thursday.
"I want to know what the fields mean and I want to know that everybody else has the same concept of what the fields mean," said Eugene Myers, a computer science professor at the University of California, Berkeley. "My blood boils when I have to write yet another parser for yet another data set," he said, describing time wasted and projects not attempted because of incompatible and inconsistent data sets. These and other issues could be ameliorated by broad standards, said the former industry leader, who is famous for his work to speed genome sequencing at Celera Genomics and for developing tools widely used by molecular biologists, including BLAST and Anrep.
Excitement at the meeting was palpable, even though participants freely described the standards-making process as "painful" and "arduous." Many clearly felt that a standards project could foster efficiency and perhaps even reverse the rapid fragmentation of life sciences. The meeting—a panel discussion—came on the last day of the IEEE-sponsored Computational Systems Biology Conference at Stanford University.
Part of the purpose of the meeting was to gauge whether the relatively new field was ready for standards. Vicky Markstein, the bioinformatics technical chair of the IEEE Computer Society, explained that fields must reach a certain level of maturity for practitioners to be willing to give up individual control and work together.
"There's a big trade-off between having flexibility to design new representations and databases and allowing people to share information with standards," agreed Sylvia Spengler, program director of Science and Engineering Informatics at the National Science Foundation. Still, she and other participants at the meeting seemed ready to try.
The IEEE has agreed to shepherd the development with the goal of creating IEEE-approved standards. Cherry Tom, project initiation manager for the IEEE Standards Association, explained that the standards would be developed by volunteers and supported by staff. The development process must be open: Everybody affected must be able to learn about and comment on developments, and the process must be built on consensus.
Standards are already being developed, in an ad hoc fashion, within various pockets of the life sciences community. There is already considerable overlap in standards being codified by groups such as W3C, IUPAC, and I3C; part of the committee's job will be to figure out what all the other groups are doing.
"There's a lot of social skill in actually getting standards to work," said Spengler. "You need people capable of making sure that all the views come together and that all the views are expressed."
The issues are sometimes as fundamental as developing a common language, continued Spengler. For example, the same protein or gene could have different names in different communities, such as those that think about pathways, gene products, gene expression or gene sequences. This hampers researchers' attempts to work together or use each other's data. If a researcher is looking for information on a particular gene, a literature search may not pull up all of the relevant data, or could pull up misleading or irrelevant data.
Yet another challenge is a lack of information about the experimental conditions used to generate data that are shared with the scientific community. Though these conditions are usually recorded somewhere, they are often not available electronically to other researchers.
Part of the problem, explained John Westbrook, co-director of the Protein Data Bank, a massive repository of three-dimensional protein structures, is that instruments that store information about experimental conditions do so in ad hoc internal formats. Because these instruments are "built around a proprietary structure," he said, "importing and exporting information out of the robot in the way that you might want could be difficult." Bioinformatics standards would help, he said, if the instruments complied with them.
V. K. Holtzendorf, life science program manager at Hewlett-Packard Co., said that HP was waiting for standards to be adopted. "We don't invent standards in this business," she argued. "What we do is comply to the standards that people want." HP, along with the U.S. Department of Energy, provided funding for the event.
Ultimately, several advocates noted, the adoption of standards will be fueled by a return on investment. Federal agencies are pushing standards because they could mean that the research they fund can be widely used by other agencies, Spengler and Westbrook suggested. Guidelines could help researchers make the right kind of technical choices so that they don't waste resources reinventing the wheel.
Howard Asher, the head of the Life Sciences-Information Technology Global Institute, went so far as to predict that bioinformatics standards could drive down the cost of drug development by making human drug trials merely confirmatory. "We can do computational predictive models, and eliminate the need for [some kinds of] clinical trials."
But that is still far in the future. For now, IEEE's Tom told the participants, "Your charge is to look through the field of bioinformatics and select one single standards project." Creating the first IEEE-approved standard could take as long as five years.