Vertica Systems has enhanced its column-oriented database with a new data storage and processing architecture designed to improve performance.
The company has dubbed the new architecture FlexStore. With it, customers can organize different parts of the database in different ways to achieve maximum performance and compression, Dave Menninger, vice president of marketing and product management at Vertica, told eWEEK.
“The whole objective of this is to reduce the amount of I/O [input/output] that’s necessary to satisfy a query,” he said. “Reducing I/O gives you better performance.”
Vertica 3.5 automatically applies a variety of physical design, database storage and query execution techniques that keep the database optimized for whatever analytic workload it is supporting at the time. For example, users can group multiple columns into a single disk file to minimize file input/output for workloads that read a large percentage of a table's columns, perform single-row lookups, query many small columns or frequently update data in those columns.
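The trade-off behind column grouping can be illustrated with a toy cost model. The sketch below is a simplified illustration, not Vertica's storage engine: it assumes each file a query touches must be opened and scanned in full, and the file names, column names and byte widths are hypothetical. Grouping cuts the number of files a wide query or single-row lookup must open, while a query that needs only one column pays for reading the rest of its group.

```python
from typing import Dict, Iterable, List

# Hypothetical layout: file name -> columns physically stored in that file.
Layout = Dict[str, List[str]]

def files_touched(query_cols: Iterable[str], layout: Layout) -> int:
    """Number of files a query must open to read the requested columns."""
    needed = set(query_cols)
    return sum(1 for cols in layout.values() if needed.intersection(cols))

def bytes_scanned(query_cols: Iterable[str], layout: Layout,
                  widths: Dict[str, int], rows: int) -> int:
    """Bytes read, assuming (simplistically) every touched file is scanned in full."""
    needed = set(query_cols)
    return sum(rows * sum(widths[c] for c in cols)
               for cols in layout.values() if needed.intersection(cols))

# Hypothetical table: column names and per-value widths in bytes.
columns = ["id", "price", "qty", "region", "comment"]
widths = {"id": 8, "price": 8, "qty": 4, "region": 2, "comment": 200}

# One file per column versus the small columns grouped into one file.
one_file_per_column: Layout = {f"{c}.dat": [c] for c in columns}
grouped: Layout = {"narrow.dat": ["id", "price", "qty", "region"],
                   "comment.dat": ["comment"]}

# A single-row lookup or wide query that reads every column.
print(files_touched(columns, one_file_per_column))  # 5
print(files_touched(columns, grouped))              # 2

# A query that needs only one column pays for the rest of its group.
print(bytes_scanned(["price"], one_file_per_column, widths, 1000))  # 8000
print(bytes_scanned(["price"], grouped, widths, 1000))              # 22000
```

Under this model, grouping pays off for workloads that touch most of a table's columns and costs extra for narrow scans, which is why a design tool would choose layouts per workload rather than apply one everywhere.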
The new release also lets users automate the creation of tiered storage to improve information lifecycle management.
“The reason you would care about this is if you have different parts of your architecture that have different I/O characteristics,” Menninger said. “Once you recognize that you can have different performance characteristics in different parts of your system, you want to be able to take advantage of that and put the data that is accessed most frequently on the portions of the disk that perform the best, or on the disks that perform the best.”
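The placement idea Menninger describes can be sketched as a simple greedy policy: sort data by how often it is read and put the hottest pieces on the fastest storage that still has room. This is an illustrative assumption, not Vertica's actual tiering mechanism; the tier names, capacities, segment names and access counts are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Tier:
    name: str
    capacity_gb: int  # space available on this tier

@dataclass
class Segment:
    name: str
    size_gb: int
    reads_per_day: int  # observed access frequency

def place(segments: List[Segment], tiers: List[Tier]) -> Dict[str, str]:
    """Greedy placement: hottest segments go to the fastest tier with room.

    `tiers` must be ordered fastest first.
    """
    free = {t.name: t.capacity_gb for t in tiers}
    placement: Dict[str, str] = {}
    # Visit segments from most to least frequently read.
    for seg in sorted(segments, key=lambda s: s.reads_per_day, reverse=True):
        for tier in tiers:
            if free[tier.name] >= seg.size_gb:
                free[tier.name] -= seg.size_gb
                placement[seg.name] = tier.name
                break
        else:
            raise ValueError(f"no tier has room for {seg.name}")
    return placement

# Hypothetical two-tier system and data segments.
tiers = [Tier("fast_disks", 150), Tier("slow_disks", 1000)]
segments = [
    Segment("sales_2008", 400, 40),
    Segment("sales_2009_q2", 60, 1200),
    Segment("sales_2009_q3", 60, 5000),
]
print(place(segments, tiers))
# {'sales_2009_q3': 'fast_disks', 'sales_2009_q2': 'fast_disks', 'sales_2008': 'slow_disks'}
```

A production system would also need to re-evaluate placement as access patterns change and move data between tiers; the greedy pass here only shows the basic "hot data on fast disks" rule.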
FlexStore is one of two main enhancements the company is touting; the second is support for Apache Hadoop, an open-source implementation of the MapReduce programming model. The move follows similar announcements from companies such as Greenplum and Aster Data Systems. Vertica officials argue, however, that they have taken a slightly different approach to MapReduce than other vendors.
“Vertica’s introduction of support for MapReduce differs from the approach of other vendors in that it supports it as a parallel capability rather than as something integrated with SQL,” noted Philip Howard, an analyst with Bloor Research, in a statement. “This makes sense because most people using MapReduce are not SQL programmers, and vice versa.”
A MapReduce job is typically a big one, Menninger said, and because a Vertica cluster is usually fully loaded, it does not make sense to add another big job to it.
“We believe, and our customers have told us, that they deliberately want those two environments, the Vertica and MapReduce environment, to be separate but equal,” he said.
The new version of the database will be generally available in October. It can run on Linux servers, on hardware supported by VMware vSphere or VMware Server, or on demand in enterprise and public clouds such as Amazon Elastic Compute Cloud (Amazon EC2).