How to Achieve Greener Data Storage and Analysis

Today, data volumes are doubling every 18 months, and enterprises want to keep more data online and give more users access to it. The result is a sharp increase in the amount of hardware infrastructure needed, with corresponding increases in power, cooling and data center space requirements. Here, Knowledge Center contributors Rick Abbott and Bob Zurek explain how enterprises can take greener approaches to data storage and analysis by using open-source solutions that reduce operational effort and cost.


In a world where business is transacted 24/7 across every available channel, companies need to collect, store, track and analyze enormous volumes of data, everything from clickstream data and event logs to mobile call records. But all of this comes at a cost to both businesses and the environment. Data warehouses and the sprawling data centers that house them consume a huge amount of power, both to run legions of servers and to cool them. Just how much? A whopping 61 billion kilowatt-hours of electricity, at an estimated cost of $4.5 billion annually.

The IT industry has begun to address energy consumption in the data center through a variety of approaches, including more efficient cooling systems, virtualization, blade servers and storage area networks (SANs). But a fundamental challenge remains. As data volumes explode, traditional, appliance-centric data warehousing approaches can respond only by throwing more hardware at the problem. This can quickly negate any green gains achieved through better cooling or more tightly packed servers.

To minimize their hardware footprint, organizations also need to shrink their "data footprint" by addressing how much server space and resources their information analysis requires in the first place. A combination of new database technologies expressly designed for analysis of massive quantities of data and affordable, resource-efficient, open-source software can help organizations save money and become greener.

Organizations can do so in three key areas: reduced data footprint, reduced deployment resources, and reduced ongoing management and maintenance. Let's take a closer look at each: