What Makes an Enterprise Data Warehouse?

Opinion: It's time to define terms. If an EDW is what you need, here's what it should offer.

IT terms are frequently used in a variety of ways by an array of vendors, co-workers and marketing types, leaving many of us wondering if we even understand what a particular term actually means.

Of course, this problem is exacerbated when a niche market explodes into mainstream acceptance, forcing users and vendors alike to poach the term as a means of positioning themselves within the new, growing market.

Such is the case today with the term EDW (enterprise data warehouse.) With the convergence of opportunity (the time to address the problem), capability (the infrastructure to deliver acceptable performance at an acceptable cost) and need (the demand for business analytics) that has occurred over the past four years, EDW has emerged.

To some organizations, especially large organizations in very information-centered industries such as retail and telecommunications, this concept seems old hat, but for the vast majority of organizations, enterprise data warehousing has only been a concept. Needless to say, all things data warehouse-related are hot topics these days.

/zimages/5/28571.gifRead details here about DataMirrors data warehouse management application.

To that end we have seen a huge upswing in the use of the term enterprise data warehouse. Indeed, a search on the phrase in Google returns over 63,000 hits. With a term becoming so ubiquitous, we must be careful not to lose sight of its original meaning.

In an attempt to provide clarity, we offer a list of five attributes most often associated with a true enterprise data warehouse. These attributes apply to overall design philosophy as well as to the underlying infrastructure.

Any analytic infrastructure may embody some of these attributes, but to truly be called an enterprise data warehouse they should exhibit all of the attributes in our list.

1. Single Version of the Truth

The overall design goal of an enterprise data warehouse is to create a definitive version of the organizations business data. This is no easy task, when you consider the number and variety of systems and silos of company data that exist within any business organization.

The use of the word enterprise is an important distinction. In some dictionaries the meaning is given as, "An undertaking, especially one of some scope, complication and risk." In others, an enterprise is defined as "a purposeful or industrious undertaking (especially one that requires effort or boldness)."

So unless your warehouse environment has as an overriding design goal of rationalizing data entities—think customer—and corresponding data elements into a single definitive view, it is not an enterprise data warehouse.

2. Multiple Subject Areas

To create a single version of the truth for an organization, it logically follows that an enterprise data warehouse must consist of multiple subject areas (such as finance, marketing and sales) representing areas of interest both for individual groups and for individuals who must view data across several subject areas. It is important to note that multiple subject areas are a design goal.

There is no minimum number of subject areas required before an organization can assign the term enterprise data warehouse to its environment, as long as the design goal is to add new subject areas in the future. Indeed, it should be understood that the EDW is built a subject area at a time and not all at once.

To keep momentum going for the enterprise data warehouse, we suggest that organizations try to deliver a new subject area every quarter. It is important that as each subject area is added, any overlapping data entities are rationalized within the overall design. This is in keeping with attribute No. 1 and ensures that there is always one uniform view of an entity such as customer.

3. Normalized Design

While in the past, many designers have used denormalized models (such as star or snowflake schemas) to build single-subject data marts, an enterprise data warehouse is typically designed with a more normalized model. The design goal should be flexibility first and performance second.

As the EDW evolves along with the business, the only constant will be change. Since the EDW must reflect the relationship between business entities, a normalized model is more suited to that end. The normalized model provides flexibility to the physical design of the database that will reduce the amount of maintenance required over time. This has the added benefit of reducing the overall TCO (total cost of ownership) of the EDW.

To those who believe it sacrilege to ignore performance, we agree; however that can be handled by the price/performance improvement of modern infrastructure hardware in conjunction with some advanced features offered in modern database software.

Next Page: Importance to mission and scalability are also key.