Information is an organization's oxygen, and data is its memory.
The hard-to-reach goals of an enterprise data architecture are to make information available to everyone who can help the organization benefit from it (while keeping it private from others), to keep the data accurate, and to let data points be seen in the context of their own history and in comparison with related data.
There's long been a tension between operational databases and warehouses. Centralized data warehouses have historically been too outdated and inaccessible to provide information about what's going on right now. Meanwhile, operational data repositories, which store records of what's going on right now, tend to be joined at the hip with specific applications and controlled by individual business units. Big can mean slow, and immediately relevant can mean isolated.
Advances in data management technology, however, now allow that trade-off to be re-examined.
Raw speed and capacity continue to march forward inexorably: the Transaction Processing Performance Council's TPC-C database benchmark results show each of the top five single-server benchmarks now processing more than 400,000 transactions per minute.
A new Microsoft Corp. and NEC Solutions America Inc. test result published Feb. 20 (433,107 transactions per minute) puts the SQL Server/NEC server combo in second place overall and shows the best-ever throughput for a 32-way server running any database or operating system.
These kinds of throughput gains in database systems and servers make consolidation of operational servers much more feasible than in the past. Growing capacities are fundamentally changing the traditional design principles behind data warehousing, namely the need to use a separate warehouse database loaded during an offline window from multiple operational sources.
Oracle Corp. has led the way here for years, steadily adding data warehousing features such as bitmap indexes to its core transactional database. Oracle even killed its stand-alone OLAP database Express (traditionally used for data warehousing) in favor of a new online analytical processing extension to Oracle9i. IBM's DB2 and Microsoft SQL Server are also effective tools in this space, with their support for precomputed queries (materialized views) and OLAP-influenced SQL functions such as rollup and cube.
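To make the rollup idea concrete, here is a minimal Python sketch of what a SQL rollup computes: subtotals for each leading prefix of the grouping columns, plus a grand total. The `rollup` function, the sample `sales` rows and the use of `None` to mark a rolled-up column are assumptions made for illustration, not any vendor's implementation.

```python
from collections import defaultdict

def rollup(rows, dims, measure):
    """Sum `measure` for every leading prefix of `dims`.

    Mimics the result shape of SQL GROUP BY ROLLUP: a rolled-up
    column is reported as None in the result key.
    """
    totals = defaultdict(int)
    for row in rows:
        # depth == len(dims) is the finest grouping; depth == 0 is the grand total.
        for depth in range(len(dims), -1, -1):
            kept = tuple(row[d] for d in dims[:depth])
            key = kept + (None,) * (len(dims) - depth)
            totals[key] += row[measure]
    return dict(totals)

# Hypothetical sample data.
sales = [
    {"region": "East", "product": "A", "amount": 100},
    {"region": "East", "product": "B", "amount": 50},
    {"region": "West", "product": "A", "amount": 70},
]

result = rollup(sales, ["region", "product"], "amount")
for key in sorted(result, key=lambda k: tuple(str(part) for part in k)):
    print(key, result[key])
```

A cube differs only in that it totals every combination of the grouping columns rather than just the leading prefixes.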
Doing more analysis directly on the operational database using in-place processing is the way of the future. Transactional databases continue to gain functionality in this area, and closely coupling operational and warehouse processing prevents the schema and semantic mismatch that can happen when the two systems are maintained separately.
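As a rough illustration of in-place analysis, the sketch below uses Python's built-in `sqlite3` module to define an analytical view directly over an operational table, so a new transaction shows up in the summary without a separate load step. The `orders` table, the `region_totals` view and the sample rows are hypothetical, and SQLite's plain view stands in for the materialized views of the larger databases, which precompute and store their results.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Operational table: one row per transaction.
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
)

# Analytical view defined over the same table, in the same database.
conn.execute(
    """CREATE VIEW region_totals AS
       SELECT region, COUNT(*) AS n_orders, SUM(amount) AS total
       FROM orders
       GROUP BY region"""
)

def totals(connection):
    """Return {region: (order count, total amount)} from the view."""
    query = "SELECT region, n_orders, total FROM region_totals"
    return {region: (n, total) for region, n, total in connection.execute(query)}

conn.executemany(
    "INSERT INTO orders (region, amount) VALUES (?, ?)",
    [("East", 100.0), ("West", 70.0)],
)
before = totals(conn)

# A new transaction is visible to the analysis immediately.
conn.execute("INSERT INTO orders (region, amount) VALUES (?, ?)", ("East", 50.0))
after = totals(conn)

print(before)
print(after)
```

Because the view and the table share one schema, there is no second copy of the column definitions to drift out of step.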
Along with greater volumes of data, organizations are working to respond more quickly to information. This dynamic is partly driven by portal deployments because portals make data staleness painfully obvious to all.
Processing and analyzing data in near real time is viewed both as the Holy Grail of enterprise data management and, given its difficulties, as a pipe dream.
Real-time data analysis requires a more fluid approach to data storage than has been traditionally taken. Data in transit, data stored on a client's system or data stored in an internal federated database should all be viewed as part of a broader understanding of “database.”
Dynamic data gateways are key strategic tools in this process, as are more robust data collection tools at the start of the customer interaction chain. With today's demand for real-time information, data cleansing and aggregation can no longer be left until weeks after data collection occurs.
Web services are proving to be an effective way to link disparate data systems to loosely coupled federated databases. In the Web services model, data is available in real time, mapped to a common, interoperable data format (XML) and supported through the entire client/midtier/database application stack.
XML technologies will profoundly change the data landscape during the next 18 months. Every major transactional database is undergoing major internal surgery to natively support XML Schema data types and XML Query as a query language. Microsoft's forthcoming Office 2003, for example, provides data creators with more powerful ways to generate structured, highly reusable data than ever before (see review, Office Embraces XML).
While databases will continue to store data in a variety of ways, XML-influenced trends of data fluidity, in-flight transformation and self-describing data (the co-location of data with its structural description) are profound changes to how organizations manipulate data.
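A small sketch can show what self-describing data and in-flight transformation mean in practice. Using Python's standard `xml.etree.ElementTree` module, the code below turns an XML fragment into a flat record by reading field names from the document itself rather than from a schema agreed on in advance. The `<order>` document, the `to_record` helper and the dotted naming of attributes are assumptions chosen for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the tags and attributes name their own fields.
doc = """<order id="1001">
  <customer>Acme</customer>
  <amount currency="USD">250.00</amount>
</order>"""

def to_record(xml_text):
    """Flatten a one-level XML document into a dict of strings."""
    root = ET.fromstring(xml_text)
    record = {"_type": root.tag}
    record.update(root.attrib)
    for child in root:
        record[child.tag] = child.text
        # Attributes of a child become "tag.attribute" fields.
        for name, value in child.attrib.items():
            record[f"{child.tag}.{name}"] = value
    return record

record = to_record(doc)
print(record)
```

The receiving code never hard-codes "customer" or "amount"; a new element added by the sender would flow through unchanged.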
West Coast Technical Director Timothy Dyck is at [email protected]
Building a Strong Data Architecture
Keep data available
- Portal, Web-based and mobile application deployments keep employees in close contact with data.
- Database replication techniques allow for broad data distribution while enforcing data quality rules.
- Security should be implemented as close to the data as possible using row- and column-level permissions.
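The row- and column-level idea in the last point can be sketched in a few lines of Python. A real database would enforce this inside the engine; the `Policy` class, the `secure_read` function and the sample employee rows here are hypothetical and only show the shape of the rule: filter rows first, then expose only the permitted columns.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

@dataclass(frozen=True)
class Policy:
    columns: Tuple[str, ...]            # columns the role may read
    row_filter: Callable[[dict], bool]  # rows the role may read

def secure_read(rows, policy):
    """Apply the row filter, then project onto the allowed columns."""
    visible = []
    for row in rows:
        if policy.row_filter(row):
            visible.append({col: row[col] for col in policy.columns})
    return visible

# Hypothetical data and role.
employees = [
    {"name": "Ana", "dept": "sales", "salary": 90000},
    {"name": "Raj", "dept": "hr", "salary": 80000},
]
sales_manager = Policy(
    columns=("name", "dept"),
    row_filter=lambda row: row["dept"] == "sales",
)

visible = secure_read(employees, sales_manager)
print(visible)
```

Putting the check in the read path itself, rather than in each application, means every portal or report that reads the data inherits the same restrictions.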
Keep data accurate
- Maintain a centralized, consistent data dictionary.
- Try to initially create departmental databases by building subsets of larger operational databases.
- Enforce business rules at all points of data entry using shared business logic.
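One way to read the shared-business-logic point is that every entry path calls the same validation routine. The Python sketch below assumes two hypothetical entry points, a Web form and a batch import, and a made-up pair of rules; the names `validate_order`, `submit_web_form` and `import_batch` are illustrative only.

```python
def validate_order(order):
    """Shared business rules; returns a list of error messages."""
    errors = []
    if not order.get("customer"):
        errors.append("customer is required")
    amount = order.get("amount")
    if not isinstance(amount, (int, float)) or amount <= 0:
        errors.append("amount must be a positive number")
    return errors

def save_if_valid(order, store):
    """Append the order to the store only when it passes every rule."""
    errors = validate_order(order)
    if not errors:
        store.append(order)
    return errors

def submit_web_form(form, store):
    """Entry point 1: a single order typed into a form."""
    order = {
        "customer": form.get("customer", "").strip(),
        "amount": form.get("amount"),
    }
    return save_if_valid(order, store)

def import_batch(rows, store):
    """Entry point 2: many orders loaded from a file or feed."""
    return [save_if_valid(dict(row), store) for row in rows]

store = []
form_errors = submit_web_form({"customer": " Acme ", "amount": 25.0}, store)
batch_errors = import_batch(
    [{"customer": "", "amount": 10}, {"customer": "Beta", "amount": -5}],
    store,
)
print(form_errors, batch_errors, store)
```

Because both paths go through `save_if_valid`, a rule change made once applies to every route by which data enters.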
Keep data in context
- When possible, fold warehouse and analysis tasks into operational databases to allow all data to be manipulated in one place.
- OLAP, data mining, data visualization and statistical analysis are all highly effective (although underused) techniques for understanding data.
- Web services allow data from enterprise resource planning and other vertical applications to be integrated into line-of-business applications far more easily than ever before.