IT terms are frequently used in a variety of ways by an array of vendors, co-workers and marketing types, leaving many of us wondering if we even understand what a particular term actually means.
Of course, this problem is exacerbated when a niche market explodes into mainstream acceptance, forcing users and vendors alike to poach the term as a means of positioning themselves within the new, growing market.
Such is the case today with the term EDW (enterprise data warehouse). Over the past four years, opportunity (the time to address the problem), capability (the infrastructure to deliver acceptable performance at an acceptable cost) and need (the demand for business analytics) have converged, and the EDW has emerged.
To some organizations, especially large organizations in very information-centered industries such as retail and telecommunications, this concept seems old hat, but for the vast majority of organizations, enterprise data warehousing has only been a concept. Needless to say, all things data warehouse-related are hot topics these days.
To that end we have seen a huge upswing in the use of the term enterprise data warehouse. Indeed, a search on the phrase in Google returns over 63,000 hits. With a term becoming so ubiquitous, we must be careful not to lose sight of its original meaning.
In an attempt to provide clarity, we offer a list of five attributes most often associated with a true enterprise data warehouse. These attributes apply to overall design philosophy as well as to the underlying infrastructure.
Any analytic infrastructure may embody some of these attributes, but to truly be called an enterprise data warehouse they should exhibit all of the attributes in our list.
1. Single Version of the Truth
The overall design goal of an enterprise data warehouse is to create a definitive version of the organization's business data. This is no easy task when you consider the number and variety of systems and silos of company data that exist within any business organization.
The use of the word enterprise is an important distinction. In some dictionaries the meaning is given as, “An undertaking, especially one of some scope, complication and risk.” In others, an enterprise is defined as “a purposeful or industrious undertaking (especially one that requires effort or boldness).”
So unless your warehouse environment has an overriding design goal of rationalizing data entities (think customer) and their corresponding data elements into a single definitive view, it is not an enterprise data warehouse.
2. Multiple Subject Areas
To create a single version of the truth for an organization, it logically follows that an enterprise data warehouse must consist of multiple subject areas (such as finance, marketing and sales) representing areas of interest both for individual groups and for individuals who must view data across several subject areas. It is important to note that multiple subject areas are a design goal.
There is no minimum number of subject areas required before an organization can assign the term enterprise data warehouse to its environment, as long as the design goal is to add new subject areas in the future. Indeed, it should be understood that the EDW is built one subject area at a time and not all at once.
To keep momentum going for the enterprise data warehouse, we suggest that organizations try to deliver a new subject area every quarter. It is important that as each subject area is added, any overlapping data entities are rationalized within the overall design. This is in keeping with attribute No. 1 and ensures that there is always one uniform view of an entity such as customer.
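As a rough illustration of what rationalizing an overlapping entity can look like, the sketch below merges customer records from two subject areas into one entity per customer. The names (`Customer`, `match_key`, `rationalize`) and the choice of a normalized e-mail address as the matching key are assumptions made for the example; real matching rules are usually far more involved.

```python
from dataclasses import dataclass, field


@dataclass
class Customer:
    """One rationalized customer entity shared by every subject area."""
    key: str    # hypothetical matching key (normalized e-mail address)
    name: str   # name from the first system that supplied the customer
    source_ids: dict = field(default_factory=dict)  # source system -> local id


def match_key(record: dict) -> str:
    # Assumption: an e-mail address identifies the same customer across systems.
    return record["email"].strip().lower()


def rationalize(sources: dict) -> dict:
    """Merge per-system customer records into one entity per matching key."""
    customers = {}
    for system, records in sources.items():
        for record in records:
            key = match_key(record)
            customer = customers.setdefault(
                key, Customer(key=key, name=record["name"])
            )
            customer.source_ids[system] = record["id"]
    return customers


# Illustrative data only: the same customer appears in finance and in sales.
sources = {
    "finance": [{"id": "F-17", "name": "Acme Corp", "email": "AP@acme.example"}],
    "sales": [
        {"id": "S-904", "name": "ACME Corporation", "email": "ap@acme.example "},
        {"id": "S-905", "name": "Globex", "email": "buyer@globex.example"},
    ],
}

customers = rationalize(sources)
for customer in customers.values():
    print(customer.key, customer.name, customer.source_ids)
```

When a new subject area is added, its records pass through the same matching step, so the warehouse keeps one view of each customer instead of one per source system.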
3. Normalized Design
While in the past, many designers have used denormalized models (such as star or snowflake schemas) to build single-subject data marts, an enterprise data warehouse is typically designed with a more normalized model. The design goal should be flexibility first and performance second.
As the EDW evolves along with the business, the only constant will be change. Since the EDW must reflect the relationship between business entities, a normalized model is more suited to that end. The normalized model provides flexibility to the physical design of the database that will reduce the amount of maintenance required over time. This has the added benefit of reducing the overall TCO (total cost of ownership) of the EDW.
To those who believe it sacrilege to ignore performance, we agree; however, performance can be addressed by the price/performance improvements of modern infrastructure hardware in conjunction with some of the advanced features offered in modern database software.
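The trade-off can be sketched with Python's built-in sqlite3 module. The table and view names below are hypothetical, and the tiny schema is only meant to show the idea: each business entity is stored once in a normalized table, while a flattened, mart-style shape is derived from those tables as a view.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL
    );
    CREATE TABLE product (
        product_id INTEGER PRIMARY KEY,
        name       TEXT NOT NULL
    );
    CREATE TABLE sale (
        sale_id     INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
        product_id  INTEGER NOT NULL REFERENCES product(product_id),
        amount      REAL NOT NULL
    );
    -- Flattened, mart-style presentation derived from the normalized tables.
    CREATE VIEW sales_flat AS
        SELECT c.name AS customer, p.name AS product, s.amount
        FROM sale s
        JOIN customer c ON c.customer_id = s.customer_id
        JOIN product  p ON p.product_id  = s.product_id;
""")
conn.executemany("INSERT INTO customer VALUES (?, ?)",
                 [(1, "Acme Corp"), (2, "Globex")])
conn.executemany("INSERT INTO product VALUES (?, ?)",
                 [(10, "Widget"), (11, "Gadget")])
conn.executemany("INSERT INTO sale VALUES (?, ?, ?, ?)",
                 [(100, 1, 10, 250.0), (101, 1, 11, 75.0), (102, 2, 10, 40.0)])

# A business change (a customer rename) touches exactly one row.
conn.execute("UPDATE customer SET name = 'Acme Holdings' WHERE customer_id = 1")

totals = conn.execute(
    "SELECT customer, SUM(amount) FROM sales_flat "
    "GROUP BY customer ORDER BY customer"
).fetchall()
print(totals)
```

Because the customer name lives in a single row, the rename does not have to be repeated across every sales record, which is the kind of maintenance saving the normalized model is meant to provide. The joins behind the view are where the performance cost shows up.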
4. Mission-Critical Environment
As an EDW matures to include several important subject areas, it will by default become a mission-critical environment. This means that the underlying infrastructure must have all the capabilities expected of any critical system.
These include high availability features (such as online parameter or database structural changes), business continuance (such as failover and disaster recovery features) and security features. The EDW is not a throw-away system or one that can easily be rebuilt. It is intrinsic to the success of other business systems and decision-making processes.
5. Scalability Across Several Dimensions
Over time, the EDW will likely become the most shared system within the enterprise. When people think of scalability, they typically think of the amount of data stored. A true EDW is not defined simply by the amount of data; it is also distinguished by the amount of query freedom it offers its users.
This means that, unlike a simple data mart, the EDW and its corresponding infrastructure platform must be able to accommodate any query. Not just the questions we know will be asked, but even the questions that users have not thought of yet.
As the EDW evolves over time, it must scale to support the typical handful (10-100) of power query users, as well as perhaps thousands of concurrent tactical users. This means the infrastructure must support strong features such as workload distribution and also be capable of massively parallel processing to deliver predictable query performance against increasingly large sets of data.
The preceding attributes should be used to clarify exactly what it is that your organization is building or what capabilities a particular vendor's solution may offer. There are initiatives that sound like EDWs but are in fact something different.
Take the consolidation of data marts onto a single infrastructure. Is the end result an EDW? How about that ERP (enterprise resource planning) vendor's business intelligence add-on? Is that an EDW? To answer these questions, you have to compare them against our list of attributes.
That ERP warehouse solution may be multiple terabytes in size with multiple subject areas, and it may even be considered critical to the business.
However, is the design flexible enough to adapt to changing business realities, or does the business have to conform to the ERP vendor's view of the world? Is the underlying design truly rationalized across all subject areas? Does it support an infrastructure that is scalable across all the dimensions (such as size, query freedom and concurrent users) that we mentioned previously?
The emergence of data warehouse appliance vendors has also confused the issue. These infrastructure offerings clearly deliver analytic capabilities against very large amounts of data. They can support normalized models and multiple subject areas. Can you build a true EDW on a data warehouse appliance? Well, currently they lack features—such as workload management and business continuance features—that are required to meet our EDW attributes four and five.
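One way to make this comparison concrete is a simple checklist, sketched below. The attribute keys and the function names are hypothetical, and the sample values only restate the appliance example above (with the single-version-of-truth design goal assumed to be met); they are not an assessment of any particular product.

```python
# The five attributes from this article, as hypothetical checklist keys.
ATTRIBUTES = (
    "single_version_of_truth",
    "multiple_subject_areas",
    "normalized_design",
    "mission_critical",
    "scalable_across_dimensions",
)


def missing_attributes(candidate: dict) -> list:
    """Return the attributes a candidate environment does not exhibit."""
    return [name for name in ATTRIBUTES if not candidate.get(name, False)]


def is_edw(candidate: dict) -> bool:
    """A candidate qualifies only if it exhibits all five attributes."""
    return not missing_attributes(candidate)


# Illustrative values only, mirroring the appliance discussion above.
appliance = {
    "single_version_of_truth": True,      # assumed for the example
    "multiple_subject_areas": True,
    "normalized_design": True,
    "mission_critical": False,            # business continuance gap
    "scalable_across_dimensions": False,  # workload management gap
}

print(is_edw(appliance))
print(missing_attributes(appliance))
```

The all-or-nothing test reflects the point made earlier: an analytic environment may embody some of the attributes, but it needs all five to be called an enterprise data warehouse.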
To be sure, an EDW is not the be-all and end-all solution for every business analytic need. There remain legitimate needs for large-scale tactical analytic processing (such as nightly billing) or for single-subject data marts that do not require the complex extraction and transformation of data required by an EDW.
Still, the need for organizations to create an environment that contains a single corporation-wide view of the data is becoming critical for business decision-makers. Organizations must clearly evaluate what their needs and objectives are before making the decision to proceed with a solution that may look like an enterprise data warehouse but is in fact something completely different.
Charles Garry is an independent industry analyst based in Simsbury, Conn. He is a former vice president with META Group's Technology Research Services.