Metadata Management a Key Part of ETL

Enterprises must consider metadata management capabilities when selecting an ETL tool.

Performance and connectivity to data sources are key issues when selecting extract, transform and load (ETL) tools. However, enterprises should also look closely at metadata management before they buy.

Metadata plays a significant role in data migration projects. Simply put, metadata is data about data: it provides context for the information stored in an organization's databases by answering questions about how data is formatted and how, when and by whom it was collected. ETL tools with metadata capabilities can improve the speed and quality of data integration and support good governance, several analysts said.

"Every ETL tool should maintain clean separation between runtime, code, metadata and models," said Burton Group analyst Joe Bugajski. "The metadata must include trace relationships for tracking the lineage of any data item from any source to any sink and whatever happened to the data between these two."
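To make the idea of "trace relationships" concrete, here is a minimal sketch, assuming lineage is stored as a directed graph in which every hop records the source item, the target item and the transformation applied in between. The `LineageGraph` class, its methods and the column names are hypothetical illustrations, not the metadata model of any particular ETL product.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    """One hop of lineage: data moved from `source` to `target` via `transform`."""
    source: str
    target: str
    transform: str


class LineageGraph:
    """Hypothetical metadata store holding trace relationships between data items."""

    def __init__(self) -> None:
        # For each target item, the edges that feed it (its upstream links).
        self._upstream: dict[str, list[Edge]] = defaultdict(list)

    def record(self, source: str, target: str, transform: str) -> None:
        """Register that `target` was produced from `source` by `transform`."""
        self._upstream[target].append(Edge(source, target, transform))

    def trace(self, sink: str) -> list[Edge]:
        """Return every edge on any path from any source into `sink`.

        Walks upstream depth-first; `seen` prevents revisiting an item
        that feeds the sink through more than one path.
        """
        path: list[Edge] = []
        seen: set[str] = set()
        stack = [sink]
        while stack:
            item = stack.pop()
            if item in seen:
                continue
            seen.add(item)
            for edge in self._upstream.get(item, []):
                path.append(edge)
                stack.append(edge.source)
        return path


if __name__ == "__main__":
    g = LineageGraph()
    g.record("crm.customers.email", "staging.customer.email", "trim + lowercase")
    g.record("staging.customer.email", "dw.dim_customer.email", "deduplicate")
    g.record("dw.dim_customer.email", "bi.churn_report.email", "direct copy")
    # Walk back from the report column to its original source.
    for edge in g.trace("bi.churn_report.email"):
        print(f"{edge.source} -> {edge.target} [{edge.transform}]")
```

Keeping the transformation on each edge is what lets the metadata answer the second half of Bugajski's requirement, "whatever happened to the data between these two," rather than only which systems were involved.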

The metadata environment should be searchable with standard query tools and, preferably, the metadata layer should connect to a master data management (MDM) metadata control system, Bugajski said.

Many ETL tools have become the traffic cops for enterprise metadata, said Forrester Research analyst Rob Karel. The tools can provide data lineage, impact analysis and root cause analysis, he said, along with the ability to audit how data is moved and manipulated from the points where it is captured at the source all the way through to its consumption in business intelligence (BI) reports.

IBM, one of the major players in the market for ETL tools, released Metadata Workbench last year to improve visibility and management during the data integration process. According to research by Forrester, the adoption of data integration tools is increasing, as businesses discover the benefits of using tools instead of hand coding.

"[ETL tools] provide a metadata-rich environment which allows development teams to understand the impact of changes before they are made to reduce the time required to maintain data integration logic," said Michael Curry, director of product marketing for information platform and solutions at IBM.
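Impact analysis is essentially lineage read in the other direction: starting from the item about to change, find everything downstream that depends on it. The sketch below assumes the tool's metadata can be exported as simple (source, target) mappings; the function names and column names are hypothetical and do not reflect IBM's or any other vendor's API.

```python
from collections import defaultdict, deque


def build_downstream(mappings: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Index hypothetical (source, target) mappings by source for forward traversal."""
    downstream: dict[str, list[str]] = defaultdict(list)
    for source, target in mappings:
        downstream[source].append(target)
    return downstream


def impact_of_change(changed: str, downstream: dict[str, list[str]]) -> list[str]:
    """Breadth-first walk listing every item that depends, directly or
    indirectly, on `changed`, nearest dependents first."""
    affected: list[str] = []
    seen = {changed}  # also guards against cycles in the mappings
    queue = deque([changed])
    while queue:
        item = queue.popleft()
        for target in downstream.get(item, []):
            if target not in seen:
                seen.add(target)
                affected.append(target)
                queue.append(target)
    return affected


if __name__ == "__main__":
    mappings = [
        ("erp.orders.amount", "staging.orders.amount"),
        ("staging.orders.amount", "dw.fact_sales.revenue"),
        ("dw.fact_sales.revenue", "bi.quarterly_revenue_report"),
        ("dw.fact_sales.revenue", "bi.sales_dashboard"),
        ("erp.orders.currency", "staging.orders.currency"),
    ]
    downstream = build_downstream(mappings)
    # List everything a change to the source column would touch.
    for item in impact_of_change("erp.orders.amount", downstream):
        print(item)
```

In this example, a proposed change to `erp.orders.amount` flags the staging column, the warehouse fact and both reports, while the currency column's path is left out. That is the kind of answer a team would want before touching the source rather than after a report breaks.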

Analysts and others overwhelmingly said that the main challenges in a typical ETL project have little to do with the tools themselves. An ETL tool cannot help an organization reach agreement on taxonomies and definitions, said Joy Mundy, an analyst with The Kimball Group.

Still, ETL tools help with source control, consistency and system documentation, which means three fewer headaches for those involved in the project, Mundy said.

"An ETL application is incredibly complex. We need a high-level view of the activities with easy ability to drill into the details," she said. "That way, it's not all in the head of a developer who might leave the company some day, not to mention, not all in the head of a developer who will forget what he's done in six months."