Vendors Struggle to Unite Data Integration with Metadata

By Lisa Vaas  |  Posted 2005-05-11

"Federate," IBM says as it pitches its latest data integration installment: the folding of the ETL (extraction, transformation and loading) technology it gained by acquiring Ascential into its WebSphere/Information Management family of data integration products.

IBM's message is, "Leave your data wherever it is," whether that's in disparate databases or what have you. Somewhat magically, IBM technology will know how all that data interrelates.

It's a nice story, and IBM's certainly not the only vendor telling it. Vendors such as Informatica Corp. and Pervasive Software Inc. are adding functionality like profiling to their ETL tools as prices for ETL-only tools fall. Why the price drop, why the added goodies, why the emphasis on data integration?

Customers are looking for products to unite what eWEEK's Renee Ferguson calls "the two sides of the integration equation": application integration to link business processes inside and outside an enterprise, and ETL for managing the quality of data in the applications to be integrated.

"The technologies have long been seen as complementary, but now they are moving ever closer," Ferguson wrote back in 2003 when she was looking at this trend of companies "seeking simpler ways to unify inward-facing business processes and outward-facing business-to-business IT processes."

The goal is lofty. Some two years later, analysts say no vendor has reached it. That's despite IBM's recent unveiling of the WebSphere Data Integration Suite, a repackaging of Ascential's next-generation "Hawk" data integration platform. Ascential is the latest in a blur of IBM Software acquisitions under its Information Management strategy as the company seeks the nirvana of uniting the two sides of data integration.

In essence, what data integration experts are begging for from IBM and other vendors is the metadata story, since metadata is what must tie those two sides of data integration together.

"I've sat through four or five briefings on Information Integrator, which was the precursor to all the acquisitions [IBM has] made," said Charlie Garry, a consultant and former Meta Group analyst. "Every time we have these briefings, I'd ask them, 'Where's the metadata story? How are you going to manage, in a federated environment, the metadata? I don't see how you're unifying that.' They admitted, 'Yeah, we don't have a good metadata story.'

"That's the crux of the whole thing. If you don't have a good metadata story, you can't integrate. You just can't. It's impossible."

How Far Has Ascential Gotten?

Tony Baer, an analyst with OnStrategies, said that Ascential was just starting in the "We've-got-a-good-metadata-story" direction when IBM bought the company, although it's questionable how far even Ascential got down that path.

"Ascential had a lot of products to unify; they also grew by acquisition," he said. "What it's done over the past 1.5 years or so is put together a metadata-driven strategy to coordinate all that stuff. IBM has not started on that path yet. Ascential was not 100 percent done on that path of putting stuff together when it was bought by IBM."

On top of the difficulty of data integration sits the difficulty of tying multiple products together into an integrated, easy-to-use suite. Nelson Mattos, IBM Distinguished Engineer and Vice President of Strategy for Information Integration, said Hawk will offer an integrated look and feel: one interface across different technologies, whether for profiling, data quality or something else.

Analysts aren't so sure. "What they are trying to [solve] is an enormously complex problem," said Garry. "I'm not even talking about the integration of the data; I'm talking about the integration of the products."

Garry speaks from experience. He headed the field support team at Platinum Technology International Inc. before Computer Associates International Inc. bought it, back when Platinum tried to introduce ProVision, which integrated a number of point products for managing databases and handling other systems management tasks.

"They were called frameworks back then," Garry said. "BMC tried to do it with Patrol, and CA tried to do it with UniCenter. The goal was much simpler than what IBM is trying to do. We only tried to wrap a set of common services [event management, common install, product metadata] so that install and maintenance of the tools was easier."

Talk to anyone who bought a "framework" even now, Garry said, and you'll grasp how difficult product integration can be. "Few ever actually got more than a few pieces working, sort of. Internally there was a great deal of lip service provided to the task of integration by the individual product development labs and divisions within Platinum. This was because they all had to weigh the benefit of actually participating in the integration effort against the distraction it would be for them to continue to sell enough of the point product to meet their individual sales goals and keep their jobs."

IBM is no different and, in fact, has a much larger problem due to its size and complexity, Garry said. "They even have issues selling many of their long-time homegrown products [such as DB2 UDB and warehousing] because the main channel [consulting] doesn't know the product."

Metadata Is Key to Product and Data Integration

At Platinum a decade ago, Garry said, there were strong individual point products: performance monitors, a job scheduler, a systems monitor. The vision was to pull it all together into one integrated suite.

That would mean every product would have to use a set of common services, Garry said. For Platinum, it was a communication layer that would enable every product to post events such as database tables being out of space or jobs failing. The communication layer would send the message to a centralized warehouse of alarms, and then the event management system would decide what to do with it.

What was wrong with that? It was hard. It was very, very hard. "The thinking back then was, a decade ago, that metadata management, to do that you need to have a dictionary," he said. "A data dictionary of all data items throughout the organization, and they would explain things like where this data comes from, how it came into being, how it operates, where it exists, who uses it, all this stuff."

All these are things that IBM would certainly have to do, Garry said. "They use special words like ontology, but the bottom line is it's really, really hard to do. There's a reason those companies got bought out and Platinum wasn't able to do anything. Companies tend to shy away from really, really hard stuff unless they absolutely need to do it."

Analysts said there are several questions to put to IBM and other vendors as they patch together both their data integration stories and their product suites. How, exactly, does the vendor present a unified view of metadata across applications? What about across other pieces of the stack? How deep does the metadata go? When a new application comes online, how does its metadata get reconciled with the existing metadata? Is that automatic, or will it take an army of people dedicated to metadata?

These questions aren't optional, analysts said, since data integration simply can't be done without the answers.

"We have not yet heard a unified metadata or master-data management strategy as part of their integration story, which is key," said Mark Smith, an analyst at Ventana Research. "This is not optional for any real IT organization, which has to begin the automation of the process to ensure quality and security of data across the enterprise. Focusing on addressing these issues is critical. Now that Ascential has MetaStage [an enterprise metadata directory that provides analysis, reporting and management capabilities for corporate-wide metadata integration] and a focus in this area, [it] needs to be part of a larger IBM strategy, which is a 'put it in the database' approach."

IBM Has Work to Do

Philip Howard, an analyst at Bloor Research in Northamptonshire, England, agreed in an e-mail interview that IBM "certainly has a lot of work to do" on integration, but he said the Ascential acquisition still makes sense. Besides, he said, these issues extend beyond IBM to the whole industry: "There is a real metadata issue, but that is an industry issue rather than anything that is IBM-specific."

"As I understand it, metadata integration is a major focus of the Hawk release," he said. "Historically, the Ascential software has essentially consisted of a group of separate products that connected via metabrokers. What the company is moving towards is a common repository that all products run from. As I haven't had a detailed briefing on Hawk yet, I do not know whether this applies to all parts of the Ascential Suite or whether some will have to wait until the next release [Rhapsody].

"As far as WebSphere Information Integrator is concerned, metadata is all about defining virtual views or schemas, so there isn't a back-end integration issue unless you want to integrate the WII and Ascential environments into a single platform. But for the time being I think they should really be seen as separate products, though this raises its own issues. Both products have their own transformation engines, for example (actually, Ascential has multiple transformation engines of its own; I don't know if Hawk will change this), so it would make sense to have a single platform with a single transformation engine, but this is clearly longer term."

Indeed, finishing the integration of DataStage TX, the technology Ascential acquired from Mercator, is crucial, according to a research note written by Gartner Inc.'s Mark Beyer.

"IBM's commitment to Ascential's road map is critical," he wrote. "Ascential previews of the future release of QualityStage boast a new interface and an improved architecture (driven by a unified, centrally managed, active metadata repository), which was intended to be the future architecture of the entire Ascential product line. Further work on completing the DataStage TX … integration must be completed."

QualityStage is a framework for developing and deploying data investigation, standardization, enrichment, probabilistic matching and survivorship operations. It is used in transactional, operational or analytic applications, in batch and in real time, to validate and cleanse data or to consolidate master data entities for customers, locations and products.

Without a shift in resources, the Ascential road map will unfortunately take longer to deliver "than IBM currently anticipates," Beyer said. The upshot? Don't look for it before 2006, and don't look for it without a blood transfusion, he said. "This could be rectified by applying more R&D funding."
