Shared Data, Understanding

In-place processing, real-time analysis and data federation are key concepts for data architects.

Information is an organization's oxygen, and data is its memory.

The hard-to-reach goals of an enterprise data architecture are threefold: making information available to everyone who can help the organization benefit from it (while keeping it private from everyone else), keeping the data accurate, and allowing data points to be seen in the context of their own history and in comparison with related data.

There's long been a tension between operational databases and warehouses. Centralized data warehouses have historically been too outdated and inaccessible to provide information about what's going on right now. Meanwhile, operational data repositories, which store records of what's going on right now, tend to be joined at the hip with specific applications and controlled by individual business units. Big can mean slow, and immediately relevant can mean isolated.

Advances in data management technology, however, now allow that trade-off to be re-examined.

Raw speed and capacity continue to march forward inexorably: the Transaction Processing Performance Council's TPC-C database benchmark results show each of the top five single-server results now processing more than 400,000 transactions per minute.

A new Microsoft Corp. and NEC Solutions America Inc. test result published Feb. 20—433,107 transactions per minute—puts the SQL Server/NEC server combo in second place overall and shows the best-ever throughput for a 32-way server running any database or operating system.

These kinds of throughput gains in database systems and servers make consolidation of operational servers much more feasible than in the past. Growing capacities are fundamentally changing the traditional design principles behind data warehousing—namely, the need to use a separate warehouse database loaded during an offline window from multiple operational sources.

Oracle Corp. for years has led the way here, with its ongoing addition of data warehousing features such as bit-mapped indices to its core transactional database. Oracle even killed its stand-alone OLAP database Express (traditionally used for data warehousing) in favor of a new online analytical processing extension to Oracle9i. IBM's DB2 and Microsoft SQL Server are also effective tools in this space, with their support for precomputed queries (materialized views) and OLAP-influenced SQL functions such as rollup and cube.
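
For readers unfamiliar with rollup, the short Python sketch below shows the kind of result a rollup over two columns produces: detail totals, a subtotal for each value of the first column, and a grand total. It is an illustration only, not any vendor's implementation; the sales rows and the `rollup` helper are hypothetical, and `None` stands in for the SQL NULL that marks subtotal rows.

```python
from collections import defaultdict

# Hypothetical sales rows: (region, product, amount).
SALES = [
    ("East", "widgets", 100),
    ("East", "gadgets", 50),
    ("West", "widgets", 70),
    ("West", "gadgets", 30),
]

def rollup(rows):
    """Return the totals a rollup over (region, product) would report.

    None plays the role of SQL NULL in the subtotal and grand-total rows.
    """
    totals = defaultdict(int)
    for region, product, amount in rows:
        totals[(region, product)] += amount  # detail level
        totals[(region, None)] += amount     # subtotal per region
        totals[(None, None)] += amount       # grand total
    return dict(totals)

result = rollup(SALES)
```

A cube over the same two columns would add one more level, a subtotal per product across all regions, which is why cube output grows faster than rollup output as columns are added.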

Doing more analysis directly on the operational database using in-place processing is the way of the future. Transactional databases continue to gain functionality in this area, and closely coupling operational and warehouse processing prevents the schema and semantic mismatch that can happen when the two systems are maintained separately.
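
As a rough sketch of what in-place processing can look like, the Python example below keeps a precomputed summary inside the same database that takes the transactions, so there is no separate warehouse load and no second schema to drift out of step. It uses SQLite purely for convenience. The table names, the trigger and the helper functions are assumptions made for illustration, and the trigger-maintained table is only a stand-in for the engine-managed materialized views the commercial databases provide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Operational table written by the application.
    CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount INTEGER);

    -- Precomputed summary, standing in for a materialized view.
    CREATE TABLE sales_by_region (region TEXT PRIMARY KEY, total INTEGER NOT NULL);

    -- Keep the summary current as each order is inserted.
    CREATE TRIGGER orders_summary AFTER INSERT ON orders
    BEGIN
        INSERT OR IGNORE INTO sales_by_region (region, total) VALUES (NEW.region, 0);
        UPDATE sales_by_region SET total = total + NEW.amount WHERE region = NEW.region;
    END;
""")

def record_order(region, amount):
    """Operational path: insert one order inside a transaction."""
    with conn:
        conn.execute(
            "INSERT INTO orders (region, amount) VALUES (?, ?)", (region, amount)
        )

def totals_by_region():
    """Analytic path: read the precomputed totals from the same database."""
    rows = conn.execute("SELECT region, total FROM sales_by_region ORDER BY region")
    return dict(rows)

record_order("East", 100)
record_order("East", 50)
record_order("West", 70)
```

Because the summary is updated in the same transaction as the order itself, an analytic query sees current totals immediately, which is the property a warehouse loaded during an offline window cannot offer.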