Reduced Data Footprint
1. Reduced data footprint
In recent years, column-oriented databases have come to be widely regarded as the preferred architecture for high-volume analytics. A column-oriented database stores data column by column instead of row by row, which has several advantages. Most analytic queries involve only a subset of the columns in a table, so a column-oriented database reads only the columns a query needs rather than entire rows. This speeds up queries and reduces disk I/O and the computing resources consumed.
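To make the row-versus-column distinction concrete, here is a minimal in-memory sketch. The table, its column names, and its values are hypothetical and chosen only for illustration; a real column store works against on-disk structures, not Python lists.

```python
# Hypothetical sample table; names and values are illustrative only.
rows = [
    {"order_id": 1, "region": "east", "amount": 120.0},
    {"order_id": 2, "region": "west", "amount": 75.5},
    {"order_id": 3, "region": "east", "amount": 210.0},
]

# Row-oriented layout: each record is kept together, so totalling one
# field still means visiting every complete record.
row_total = sum(record["amount"] for record in rows)

# Column-oriented layout: one list per column, built from the same data.
columns = {name: [record[name] for record in rows] for name in rows[0]}

# The same query now reads only the "amount" column and never touches
# "order_id" or "region".
col_total = sum(columns["amount"])

print(row_total, col_total)
```

Both layouts return the same total; the difference is how much data has to be read to produce it, which is where the I/O savings described above come from.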
Furthermore, these databases enable efficient data compression because each column stores a single data type, whereas a row typically contains several. Compression can be optimized for each particular data type, reducing the amount of storage the database needs. Column orientation also greatly accelerates query processing, which significantly increases the number of concurrent queries a server can handle.
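One way to see why a single-type column compresses well is run-length encoding, sketched below. This is only one common technique, picked here as an assumption for illustration; the article does not say which compression schemes any particular product uses, and the sample column is made up.

```python
from itertools import groupby


def rle_encode(column):
    """Collapse runs of equal adjacent values into (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(column)]


def rle_decode(pairs):
    """Expand (value, run_length) pairs back into the original column."""
    return [value for value, count in pairs for _ in range(count)]


# Hypothetical "region" column: 9 stored values, but only 3 runs.
region = ["east"] * 4 + ["west"] * 3 + ["east"] * 2
encoded = rle_encode(region)

print(len(region), "values stored as", len(encoded), "pairs")
```

The encoding is lossless, since decoding the pairs reproduces the column exactly. It pays off when a column contains long runs of repeated values, which is more likely when values of one type are stored together than when they are interleaved with other fields in a row.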
There is a variety of column-oriented solutions on the market. Some duplicate data and require as large a hardware footprint as traditional row-based systems. Others combine column orientation with other technologies, which eliminates the need for data duplication. This means that users don't need as many servers or as much storage to analyze the same volume of data.
For example, some column-oriented databases can achieve compression ratios ranging from 10:1 (a 10TB database becomes a 1TB database) to more than 40:1, depending on the data. With this level of compression, a distributed server environment can be reduced by a factor of 20 to 50 and brought down to a single box, slashing heat, power consumption and carbon emissions.
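The arithmetic behind those ratios is simple division, as in the sketch below. The helper name `compressed_size_tb` is hypothetical, and the 10TB input and the 10:1 and 40:1 ratios are the figures quoted above, not measurements.

```python
def compressed_size_tb(raw_tb, ratio):
    """Return the stored size in TB for a given compression ratio (ratio:1)."""
    if ratio <= 0:
        raise ValueError("compression ratio must be positive")
    return raw_tb / ratio


raw_tb = 10  # the 10TB database from the example above
for ratio in (10, 40):
    size = compressed_size_tb(raw_tb, ratio)
    print(f"{ratio}:1 -> {size} TB")
```

At 40:1 the same 10TB database occupies 0.25TB, which is the kind of reduction that makes a single-server deployment plausible.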
Virtual data marts are also coming on the scene, leveraging Enterprise Information Integration (EII) technologies to create specialized views of data sets without the need for physical storage. The downside to this approach is that complex queries can be sluggish, which can be a problem when analytic needs call for close to real-time insight.
Open-source software takes efficient resource utilization a step further as it typically does not require proprietary hardware or specialized appliances.