MetaMatrix Inc. and IBM are rolling out enhanced virtual database technologies to untie the knots created when IT tries to connect business users to disparate data sources.
MetaMatrix this week will unveil an update to its namesake information integration and data management system that uses virtual database technology to enable two-way transactions across a heterogeneous batch of databases and other information sources.
MetaMatrix 3.1 abstracts data sources into a virtual database of integrated information while acting as the data delivery system for applications that include Web services. It lets users create XML documents that integrate data sources such as relational databases, flat files, Web sites, e-mail and application data, said officials of the New York company. Version 3.1 features an extensible back-end framework for connecting to nearly any type of information source; access for querying and interfacing is provided via Simple Object Access Protocol, Java Database Connectivity, ODBC and Java.
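To make the idea of an XML document that integrates several sources more concrete, here is a minimal Python sketch. It is not MetaMatrix code and does not use any MetaMatrix API; the table layout, the pipe-delimited flat-file format and the function names (make_demo_db, build_customer_document) are all assumptions made up for illustration.

```python
import sqlite3
import xml.etree.ElementTree as ET


def make_demo_db():
    """Hypothetical relational source: an in-memory table of customers."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?)",
        [(1, "Ada"), (2, "Grace")],
    )
    return conn


def build_customer_document(conn, flat_file_lines):
    """Merge rows from the relational source with notes from a
    pipe-delimited flat file ("id|note") into one XML document."""
    # Index the flat-file records by customer id.
    notes = {}
    for line in flat_file_lines:
        customer_id, note = line.split("|", 1)
        notes[customer_id.strip()] = note.strip()

    root = ET.Element("customers")
    rows = conn.execute("SELECT id, name FROM customers ORDER BY id")
    for customer_id, name in rows:
        node = ET.SubElement(root, "customer", id=str(customer_id))
        ET.SubElement(node, "name").text = name
        # Attach flat-file data only where a matching record exists.
        if str(customer_id) in notes:
            ET.SubElement(node, "note").text = notes[str(customer_id)]
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_customer_document(make_demo_db(), ["1|prefers e-mail"]))
```

The application that consumes the document sees a single structure and does not need to know which element came from the database and which from the flat file, which is the kind of abstraction the product description refers to.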
Separately, IBM last week announced it is working with a Canadian bioresearch center to create an information system that uses a virtual database to integrate data from a variety of databases, flat-file formats and file types. The iQ Engine, being developed with iCapture Center, of Vancouver, British Columbia, uses IBM's DB2 database and DiscoveryLink integration technology.
The goal is to create a system that will assist researchers in correlating genetic susceptibility of patients with cardiovascular and respiratory diseases to environmental influences such as culture, socioeconomic status, educational background, inhaled cigarette smoke, pollutants, viruses, allergens, diet and obesity.
As part of the project, IBM, of Armonk, N.Y., and the iCapture Center are deploying IBM's DB2 database and DiscoveryLink integration software, which includes data wrappers for file formats that are common in biomedical use. One wrapper, for instance, converts SQL statements to handle the BLAST (Basic Local Alignment Search Tool) file format, created by scientists to store protein and DNA data.
With DiscoveryLink, researchers will be able to retrieve data from public databases such as GenBank, a genetic sequence database maintained by the National Institutes of Health. The retrieved data can be compared with relational data in DB2 and other databases and can be stored in a relational database.
Other data wrappers for common biomedical file formats are also in the works, IBM officials said. One emerging format IBM is working with is MAGE-ML (Microarray Gene Expression Markup Language), an XML-based format used for storing microarray and proteomic data.
Raimond Winslow, professor of biomedical engineering and computer science and director of the Center for Cardiovascular Bioinformatics and Modeling at Johns Hopkins University, in Baltimore, is implementing DiscoveryLink to store large microarrays of nucleotides. Previously, Winslow had to write individual SQL queries into those data sets. DiscoveryLink takes away the burden of parsing SQL queries: the appropriate query components are sent to their respective data sources, and the results that come back are then reintegrated.
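The split-and-reintegrate pattern described here can be sketched in a few lines of Python. This is not DiscoveryLink code and makes no claim about how IBM's software works internally; the two sources, their contents and every function name below are hypothetical, chosen only to show one query being divided between a relational table and a flat file and the partial results being joined again.

```python
import csv
import io
import sqlite3

# Hypothetical flat-file source: expression readings keyed by patient.
FLAT_FILE = (
    "patient_id,gene,expression\n"
    "1,GENE_A,2.5\n"
    "2,GENE_A,0.7\n"
    "3,GENE_B,1.9\n"
)


def make_relational_source():
    """Hypothetical relational source: patients and their conditions."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE patients (patient_id INTEGER, condition TEXT)")
    conn.executemany(
        "INSERT INTO patients VALUES (?, ?)",
        [(1, "cardiovascular"), (2, "respiratory"), (3, "cardiovascular")],
    )
    return conn


def query_relational(conn, condition):
    """Sub-query sent to the relational source, expressed as SQL."""
    rows = conn.execute(
        "SELECT patient_id FROM patients WHERE condition = ? "
        "ORDER BY patient_id",
        (condition,),
    )
    return [row[0] for row in rows]


def query_flat_file(text, min_expression):
    """Sub-query sent to the flat file, expressed as a filter over rows."""
    reader = csv.DictReader(io.StringIO(text))
    return {
        int(row["patient_id"]): (row["gene"], float(row["expression"]))
        for row in reader
        if float(row["expression"]) >= min_expression
    }


def federated_query(conn, text, condition, min_expression):
    """Send each predicate to the source that can answer it,
    then join the partial results on patient_id."""
    patient_ids = query_relational(conn, condition)
    readings = query_flat_file(text, min_expression)
    return [(pid, *readings[pid]) for pid in patient_ids if pid in readings]


if __name__ == "__main__":
    conn = make_relational_source()
    print(federated_query(conn, FLAT_FILE, "cardiovascular", 1.0))
```

The caller states one question (patients with a given condition whose expression reading exceeds a threshold) and never writes a separate query per data set, which is the burden Winslow says the integration layer removes.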
There are many biomedical file formats left to integrate into relational database technology, Winslow said. If scientists can get their hands around the oceans of data such file formats encompass, the results could be profound. "You can mine the data sets to discover new patterns that are markers for disease," Winslow said. "Information tools [such as DiscoveryLink provide] ways in which you can mine these huge data sets."
This story has been changed since its original posting to correct the status of IBM's BLAST wrapper and the use of the technology with GenBank.