PostgreSQL 8.3, the new version of the open-source database, hit the streets Feb. 4 with a host of new features designed to improve performance and meet the data management needs of enterprises and large organizations.
While still not as widely used as MySQL in the open-source database world, PostgreSQL has sought to make a name for itself as a viable alternative to proprietary, enterprise-level databases. Officials connected to the PostgreSQL Global Development Group hope the new release will speed adoption among organizations looking for cost-effective alternatives to those proprietary products.
Version 8.3 includes a number of enhancements around performance. Among them are an asynchronous commit option for faster response time on some transactions and HOT (Heap Only Tuples), which cuts the maintenance overhead associated with frequently updated tables by some 75 percent, according to those involved in the project.
“Transactional database systems create garbage as they process updates – invalid index nodes, former row versions, empty data page space – from data which has changed, just as programming languages create garbage in memory as they operate,” said Josh Berkus, a PostgreSQL Core Team member. “In PostgreSQL’s case, [this] garbage collection was dealt with offline through a utility called VACUUM. The advantage of HOT is that it allows a lot of these abandoned row versions to be recycled on the fly, while the data is in memory for a query, greatly reducing the need for VACUUM.
“HOT also allows avoiding index updates for many data changes [that] used to require them,” he said. “This, in turn, significantly improves performance on high-update workloads.”
PostgreSQL is the first open-source database to implement Synchronized Scan, which is aimed at reducing input/output for data mining.
“The idea of Synchronized Scan is that, if you have a single, very large table from which you need to query a lot of data, it can take seconds or minutes to cycle through memory,” Berkus said. “In the past, and in most other [relational database management systems], a second query against the same large table would request a second copy of the table in order to start over at the beginning. Synchronized scan allows the second and succeeding scans to piggyback on the first scan and only cycle back to revisit the beginning of the table after the first scan has finished. This means the table is read from disk less times to satisfy more queries.”
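The piggybacking Berkus describes can be illustrated with a toy model. The sketch below is not PostgreSQL's implementation: it assumes one page is read per time step, ignores caching entirely, and uses hypothetical function names (`reads_without_sync`, `reads_with_sync`) invented for this illustration. It only counts how many physical page reads are needed when a later scan joins a running scan at its current position and wraps around afterward.

```python
def reads_without_sync(num_pages, start_ticks):
    """Each scan starts at page 0 and reads every page on its own."""
    return num_pages * len(start_ticks)


def reads_with_sync(num_pages, start_ticks):
    """Count page reads when scans join the running scan mid-table.

    Toy assumptions: one page is read per tick, and every active scan
    consumes that same page, so a single read serves all of them.
    """
    pending = sorted(start_ticks)   # arrival time of each scan
    remaining = []                  # pages still needed by each active scan
    disk_reads = 0
    tick = 0
    while pending or remaining:
        # A newly arrived scan starts wherever the shared scan currently is
        # and still needs to see every page exactly once.
        while pending and pending[0] <= tick:
            pending.pop(0)
            remaining.append(num_pages)
        if remaining:
            disk_reads += 1  # one physical read, shared by all active scans
            remaining = [r - 1 for r in remaining if r - 1 > 0]
        tick += 1
    return disk_reads


if __name__ == "__main__":
    # A 10-page table; the second scan arrives when the first is at page 4.
    print(reads_without_sync(10, [0, 4]))  # 20
    print(reads_with_sync(10, [0, 4]))     # 14
```

In this model the second scan shares the last six page reads with the first and then reads only the four pages it missed, so the table is read 14 times instead of 20.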
Other new features are aimed at application developers, including ANSI-standard SQL/XML support, SSPI authentication support and new data types. Visual C++ compilation of PostgreSQL has been enabled to improve stability and performance on Windows. To make servers easier to monitor, new logging options have been added and the overhead of the statistics collector has been reduced, officials said.