Diverse Sciences Propel Bioinformatics

The obvious methods for understanding how molecular machinery works have already been tried, so new approaches must meld biology, computer science, chemistry, engineering and other disciplines.

At conferences in computational biology, speakers generally start with questions: "How many people in the room are biologists? Computer scientists? Other?" It can be hard to predict what kinds of experts will show up in the audience. This year's Computational Systems Bioinformatics Conference, the third of its kind, was no exception.

The CSB 2002 Web site described the conference's goal as bringing together "biology and computer science" experts. This year, the conference organizers hope to "promote a systems biology approach that links biology, computer science, mathematics, chemistry, physics, medicine and engineering." That's five new disciplines in two years. Even so, we've left out statistics.

And contributing disciplines seem poised to become more nuanced, not less. The obvious methods for understanding precisely how molecular machinery yields breathing, thinking beings have already been tried, UC Berkeley Professor Eugene Myers said in a keynote speech. Cross-disciplinary conferences, and scientists, are needed for future discovery. Myers disputed the common notion that the completion of the human genome project has supplied the "parts list" needed to understand the mechanisms underlying human disease.

The former vice president of Celera Genomics, who helped lead the company's human genome assembly effort, said Celera's and the public consortium's "completion" of the human genome project was a huge accomplishment. But he insisted that the parts list is still far from complete, reminding attendees that we are in fact still a long way from identifying the location of every gene, let alone identifying their functions.

One is struck both by how far the field has come in a relatively short period of time, and also by how far it has yet to go. In the past 10 years, the numbers of sequences stored in public databases such as GenBank, SwissProt and even the Protein Data Bank all have increased exponentially. But as scientists understand all too well, data does not equal knowledge. Also, as experiments get faster, there is more room for error.

Perhaps the most sobering presentation came from Patrik D'haeseleer, a member of George Church's lab at Harvard Medical School, who compared three high-throughput data sets of protein-protein interactions in yeast and found disturbingly low overlap between sets, with estimated false-positive rates as high as 90 percent. Such findings are discouraging: What scientist wants to risk pursuing an intriguing hypothesis only to find, perhaps years later, that the initial "evidence" was no more than a statistical fluke or an artifact of an experimental protocol?
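
To make "low overlap" concrete, here is a minimal Python sketch of one common way to measure agreement between interaction data sets: the Jaccard index, the number of shared interactions divided by the number seen in either set. This is not necessarily the measure used in the talk; the data set names and protein labels below are hypothetical placeholders, not real results.

```python
from itertools import combinations

def normalize(pairs):
    """Treat interactions as unordered pairs so (A, B) equals (B, A)."""
    return {frozenset(p) for p in pairs}

def jaccard(a, b):
    """Shared interactions divided by all interactions seen in either set."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Hypothetical toy data; the protein names are placeholders, not real results.
datasets = {
    "screen_1": normalize([("P1", "P2"), ("P2", "P3"), ("P4", "P5")]),
    "screen_2": normalize([("P2", "P1"), ("P3", "P6"), ("P7", "P8")]),
    "screen_3": normalize([("P1", "P2"), ("P5", "P4"), ("P9", "P10")]),
}

# Score every pair of data sets against each other.
overlaps = {
    (x, y): jaccard(datasets[x], datasets[y])
    for x, y in combinations(sorted(datasets), 2)
}
for (x, y), score in overlaps.items():
    print(f"{x} vs {y}: Jaccard overlap = {score:.2f}")
```

A score near 1 means two screens largely agree; a score near 0 means they report almost entirely different interactions, which is the situation the talk described.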

Help may come from unexpected quarters. The conference agenda itself highlighted how interdisciplinary this field is. Talks included a keynote speech by Benoit Mandelbrot, the founder of the division of mathematics known as fractal geometry. Other presentations included methods from high-throughput microscopy, text processing, data mining, artificial intelligence and more.

Fusions of fields are not just expected but required. Stephen Wong of Harvard University explained how to use robotic automation and digital microscopy to screen thousands of cells simultaneously for, among other tasks, high-throughput drug screening.

An example from Eran Segal of Stanford University showed what can happen when combining computer science, expression profiling, statistics and protein signaling. The projects start in a field of mathematics known as graph theory. Genes and proteins are modeled as nodes in a graph; edges between the nodes represent interactions in a biological system. Such systems may represent how cells in healthy and diabetic patients process glucose in the presence of insulin, or how cancerous and healthy cells respond to signals to stop dividing.
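
The graph model described above can be sketched in a few lines of Python. This is an illustrative data structure only, not the Stanford group's software; the class name, method names and the handful of edges (loosely based on the insulin example) are assumptions chosen for the sketch.

```python
from collections import defaultdict

class InteractionGraph:
    """Undirected graph: nodes are genes or proteins, edges are interactions."""

    def __init__(self):
        self._neighbors = defaultdict(set)

    def add_interaction(self, a, b):
        # Store the edge in both directions because interaction is symmetric here.
        self._neighbors[a].add(b)
        self._neighbors[b].add(a)

    def neighbors(self, node):
        # Return a copy so callers cannot modify the graph by accident.
        return set(self._neighbors.get(node, set()))

    def degree(self, node):
        return len(self._neighbors.get(node, set()))

# Hypothetical edges loosely inspired by insulin signaling; illustrative only.
graph = InteractionGraph()
graph.add_interaction("insulin", "insulin_receptor")
graph.add_interaction("insulin_receptor", "IRS1")
graph.add_interaction("IRS1", "PI3K")

print(sorted(graph.neighbors("insulin_receptor")))
print(graph.degree("IRS1"))
```

Real regulatory models usually need directed, weighted edges; the undirected version keeps the sketch short.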

Researchers plug in results from DNA microarray experiments as a data set for a form of probabilistic reasoning known as Bayesian networks. The experimental data let the researchers computationally refine the hypothetical interactions in an effort to figure out how genes and proteins regulate one another. And the results can be supported with additional microarray experiments.
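
As a rough illustration of how expression data can support or undercut a hypothesized interaction, the Python sketch below scores two tiny Bayesian network structures for a regulator gene A and a target gene B: one with an edge from A to B, one without. It uses the Bayesian information criterion (BIC), a standard structure score swapped in here for simplicity; it is not the method presented at the conference. The on/off (binarized) expression values and the function names are assumptions, and the data are invented.

```python
import math
from collections import Counter

def log_likelihood(counts_by_context):
    """Maximum-likelihood log-probability of B's values within each parent context."""
    total = 0.0
    for outcomes in counts_by_context.values():
        n = sum(outcomes.values())
        total += sum(c * math.log(c / n) for c in outcomes.values() if c)
    return total

def bic_score(samples, has_edge):
    """BIC score for binary target B, with or without binary regulator A as parent.

    The term for A itself is identical in both structures, so it is omitted.
    """
    contexts = {}
    for a, b in samples:
        key = a if has_edge else None  # a single shared context when there is no edge
        contexts.setdefault(key, Counter())[b] += 1
    n_params = 2 if has_edge else 1  # one free parameter per parent state of A
    return log_likelihood(contexts) - 0.5 * n_params * math.log(len(samples))

# Hypothetical binarized microarray readings as (A, B) pairs; not real data.
samples = [(1, 1)] * 8 + [(1, 0)] * 2 + [(0, 0)] * 8 + [(0, 1)] * 2

score_edge = bic_score(samples, has_edge=True)
score_no_edge = bic_score(samples, has_edge=False)
print(f"A -> B: {score_edge:.3f}   no edge: {score_no_edge:.3f}")
print("Data favor an edge" if score_edge > score_no_edge else "Data favor no edge")
```

The penalty term is what keeps the search honest: an extra edge always fits the data at least as well, so it has to improve the fit by more than its cost in parameters to be kept.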

Science famously builds on the work that precedes it, and some researchers make sure that others' work has as much leverage as possible. They focus on the body of literature that exists in free text form in public databases, such as MEDLINE. They use natural language processing and machine learning to ease efficient use of other scientists' research. The tools may sort out ambiguous gene and protein names in article abstracts or enable more accurate literature searches.

As experts from these diverse and previously unrelated fields continue to work together on new approaches and technologies, they will make discoveries never before possible. Despite all of the progress that has been made, or perhaps because of it, the potential to use one's own expertise to shape a scientific discipline has never been greater.

Jessica D. Tenenbaum is a graduate student in biomedical informatics at Stanford University.
