Yukon's Data Transformation Services a Hit at SQL Confab

By Lisa Vaas  |  Posted 2003-11-12
 
Aiming to unsnarl database administrators' jobs, Microsoft Corp. revealed one new database tool and major enhancements to another at its PASS (Professional Association for SQL Server) Community Summit in Seattle on Wednesday.

The tool that's getting a major overhaul is Microsoft's ETL (extract, transform and load) technology, known as DTS (Data Transformation Services). The revamped DTS will debut in the upcoming upgrade of SQL Server, code-named Yukon, due in the second half of 2004.

Microsoft, of Redmond, Wash., also announced availability of the BPA (Best Practices Analyzer) for SQL Server 2000, a tool designed to help database administrators avoid common errors when managing SQL Server installations.

DTS will undergo a complete rearchitecture designed to make it enterprise-ready, with better scalability, manageability and reliability. The tool will gain a richer graphical environment that features graphical debugging, aimed at giving DBAs an easier way to spot and squash bugs as they occur in a graphical flow. The new environment will be code-free, with features such as drag-and-drop split transformations, built-in joins, and built-in Web service transformations.

Another significant improvement appears in the process of moving large amounts of data through DTS. In the current iteration of the ETL tool, an error (for example, on the 500 millionth row of a billion-row data set that's being moved from a data mart to a data warehouse) can grind the process to a halt if it isn't handled correctly. Fixing and restarting the package means starting over from Row 1.

In the forthcoming DTS, however, a checkpoint restart feature will let the error be fixed and the data movement resume where it left off. In the example above, that would be Row 500,000,001.
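The general idea behind a checkpoint restart can be sketched in a few lines of Python. This is only an illustration of the concept, not Microsoft's implementation: the function name load_rows, the JSON checkpoint file and the next_row key are all hypothetical. In this sketch the checkpoint records the row that failed, so a rerun retries that row after it has been fixed rather than starting again from the first row.

```python
import json
import os
import tempfile


def load_rows(rows, sink, checkpoint_path, transform=int):
    """Move rows into sink, resuming from a saved checkpoint if one exists."""
    start = 0
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            start = json.load(f)["next_row"]

    for index in range(start, len(rows)):
        try:
            sink.append(transform(rows[index]))
        except Exception:
            # Record the failing row so the next run retries it
            # instead of starting over from the first row.
            with open(checkpoint_path, "w") as f:
                json.dump({"next_row": index}, f)
            raise

    # A clean finish clears the checkpoint.
    if os.path.exists(checkpoint_path):
        os.remove(checkpoint_path)
    return len(rows) - start


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "checkpoint.json")
    source = ["1", "2", "oops", "4"]
    target = []
    try:
        load_rows(source, target, path)
    except ValueError:
        source[2] = "3"  # fix the bad row, then rerun
    moved = load_rows(source, target, path)
    print(target, moved)
```

The second call moves only the rows from the failure point onward; the rows already in the target are not reloaded.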

In addition, a new fuzzy lookup capability will be geared to cleaning up dirty data. For example, if the term "black" is incorrectly entered in various ways in tables, such as "Blk," "B," or "Blck," built-in data-cleansing transformation rules will allow fuzzy lookups against a master table. Fuzzy logic delivers a confidence level when it picks up on a suspected piece of dirty data, and users can specify a level of confidence at which changes are allowed to happen automatically.
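A rough Python sketch of a lookup with a confidence threshold follows. It does not reproduce the Yukon transformation, whose matching rules the article does not describe; it substitutes the standard library's difflib similarity ratio as the confidence score, and the master table, the function name fuzzy_lookup and the 0.7 threshold are assumptions chosen for illustration.

```python
from difflib import SequenceMatcher

MASTER_COLORS = ["black", "blue", "brown"]  # hypothetical master table


def fuzzy_lookup(value, master, auto_threshold=0.7):
    """Return (best_match, confidence, auto_apply) for a possibly dirty value."""
    cleaned = value.strip().lower()
    best_match, confidence = None, 0.0
    for candidate in master:
        # difflib's ratio (0.0 to 1.0) stands in for the confidence level.
        score = SequenceMatcher(None, cleaned, candidate.lower()).ratio()
        if score > confidence:
            best_match, confidence = candidate, score
    # Only matches at or above the threshold are flagged for automatic change.
    return best_match, confidence, confidence >= auto_threshold


if __name__ == "__main__":
    for dirty in ["Blk", "B", "Blck"]:
        print(dirty, fuzzy_lookup(dirty, MASTER_COLORS))
```

With these inputs, "Blk" and "Blck" match "black" with enough confidence to be changed automatically, while "B" scores too low against every entry and would be left for a person to review.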

Brian Knight, formerly on the Board of Directors for PASS and currently a SQL Server database manager at Fidelity Information Systems, in Jacksonville, Fla., is a Yukon beta tester who considers DTS one of the hottest things coming in the new database management system. To the point, the new DTS will finally take care of the "big threading problem" of the current DTS, Knight said, wherein parallelism has always been clunky.

With the current version, Knight finds that DTS hits a threshold of about 30 million records, after which he has to quit and cook up his own, custom ETL. The Yukon DTS can handle many more records and is far more asynchronous. "In DTS, if you load a table asynchronously, it's a problem," Knight said. "It won't load well. With the new DTS, it can actually load tables in a parallel, asynchronous manner."

That will save Knight's organization from customizing its own solution in C++ to get extra speed. "It just saves weeks off my time to market, not having to develop that solution," he said.
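For readers unfamiliar with the pattern Knight describes, loading several tables in parallel can be sketched with Python's standard thread pool. This is a generic illustration, not how DTS schedules its work; load_table is a hypothetical stand-in that copies rows in memory where a real load would write to a database.

```python
from concurrent.futures import ThreadPoolExecutor


def load_table(name, rows):
    """Stand-in for one table load; returns the table name and rows loaded."""
    loaded = [dict(row) for row in rows]  # a real load would write to a database
    return name, len(loaded)


def load_tables_in_parallel(tables, max_workers=4):
    """Submit every table load at once and collect the row counts by table."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(load_table, name, rows)
                   for name, rows in tables.items()]
        return dict(future.result() for future in futures)


if __name__ == "__main__":
    tables = {
        "customers": [{"id": 1}, {"id": 2}],
        "orders": [{"id": 10}],
    }
    print(load_tables_in_parallel(tables))
```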

As for the new BPA tool, which is available for SQL Server 2000, it will handle the SQL Server "Doh!" moments: simple problems that crop up frequently in calls to Microsoft's help desk, according to Tom Rizzo, director of product management for SQL Server.

One such frequently occurring glitch is when log files are placed on compressed drives, which makes the files run far slower than usual. The compressing and decompressing of data requires extra CPU cycles. With BPA, managers can make sure log files are put on uncompressed drives automatically through a Best Practice designation.

The BPA was modeled on, and works with, Baseline Security Analyzer. The tool scans SQL Server instances to determine which processes are compliant, partially compliant or noncompliant with Best Practices. It's available now for download.