Shared Data, Understanding

By Timothy Dyck  |  Posted 2003-03-10

In-place processing, real-time analysis and data federation are key concepts for data architects.

Information is an organization's oxygen, and data is its memory. The not-so-easily-reached goals for an enterprise data architecture are managing information in ways that make it available to all those who can help the organization benefit from it (while keeping it private from others), maintaining accuracy in the data, and allowing data points to be seen in the context of their own history and in comparison with related data.

There's long been a tension between operational databases and warehouses. Centralized data warehouses have historically been too outdated and inaccessible to provide information about what's going on right now. Meanwhile, operational data repositories, storing records of what's going on right now, tend to be joined at the hip with specific applications and controlled by individual business units. Big can mean slow, and immediately relevant can mean isolated.

Advances in data management technology, however, now allow that trade-off to be re-examined.

Raw speed and capacity continue to march forward inexorably: the Transaction Processing Performance Council's TPC-C database benchmark results show each of the top five single-server results now processing more than 400,000 transactions per minute.

A new Microsoft Corp. and NEC Solutions America Inc. test result published Feb. 20—433,107 transactions per minute—puts the SQL Server/NEC server combo in second place overall and shows the best-ever throughput for a 32-way server running any database or operating system.

These kinds of throughput gains in database systems and servers make consolidation of operational servers much more feasible than in the past. Growing capacities are fundamentally changing the traditional design principles behind data warehousing—namely, the need to use a separate warehouse database loaded during an offline window from multiple operational sources.

Oracle Corp. has led the way here for years, steadily adding data warehousing features such as bitmapped indexes to its core transactional database. Oracle even killed its stand-alone OLAP database Express (traditionally used for data warehousing) in favor of a new online analytical processing extension to Oracle9i. IBM's DB2 and Microsoft SQL Server are also effective tools in this space, with their support for precomputed queries (materialized views) and OLAP-influenced SQL functions such as rollup and cube.
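To make the OLAP-influenced SQL functions mentioned above more concrete, here is a minimal sketch of what a rollup produces: detail totals, per-group subtotals and a grand total in one result set. It uses Python's built-in sqlite3 module purely for illustration. SQLite has no rollup operator, so the sketch spells out the equivalent UNION ALL of three GROUP BY queries. The sales table, its columns and the rollup_sales function are hypothetical names, not taken from any product discussed here.

```python
import sqlite3


def rollup_sales(rows):
    """Emulate a two-level rollup over (region, product, amount) rows.

    Returns detail totals per (region, product), a subtotal per region
    (product is None) and one grand total (region and product are None).
    """
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema, used only for this illustration.
    conn.execute(
        "CREATE TABLE sales (region TEXT, product TEXT, amount INTEGER)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)

    query = """
        -- Level 1: totals for each (region, product) pair
        SELECT region, product, SUM(amount) FROM sales
        GROUP BY region, product
        UNION ALL
        -- Level 2: subtotal for each region
        SELECT region, NULL, SUM(amount) FROM sales
        GROUP BY region
        UNION ALL
        -- Level 3: grand total
        SELECT NULL, NULL, SUM(amount) FROM sales
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    sample = [
        ("East", "widgets", 100),
        ("East", "gadgets", 50),
        ("West", "widgets", 70),
    ]
    for row in rollup_sales(sample):
        print(row)
```

On databases that provide a rollup operator, a single GROUP BY with rollup would typically replace the three hand-written queries, though the exact syntax varies by product and version. Storing such a result as a materialized view is what lets repeated analytical queries avoid recomputing the aggregates against the operational tables.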

Doing more analysis directly on the operational database using in-place processing is the way of the future. Transactional databases continue to gain functionality in this area, and closely coupling operational and warehouse processing prevents the schema and semantic mismatch that can happen when the two systems are maintained separately.

Timothy Dyck is a Senior Analyst with eWEEK Labs. He has been testing and reviewing application server, database and middleware products and technologies for eWEEK since 1996. Prior to joining eWEEK, he worked at the LAN and WAN network operations center for a large telecommunications firm, in operating systems and development tools technical marketing for a large software company and in the IT department at a government agency. He has an honors bachelor of mathematics degree in computer science from the University of Waterloo in Waterloo, Ontario, Canada, and a master of arts degree in journalism from the University of Western Ontario in London, Ontario, Canada.
