Open-Source BI Stack Could Save Data Centers Millions
The sample application, called Bizgres Clickstream, is built on top of Bizgres, a community-supported project launched in April that aims to make the open-source PostgreSQL database the industry's platform of choice for business intelligence.
The application combines open-source ETL (extraction, transformation and loading) technologies from Kinetic Networks Inc.; JasperReports, an open-source reporting engine from JasperSoft Corp.; and Bizgres, a data-warehousing database based on PostgreSQL.
Luke Lonergan, chief technology officer and co-founder at Greenplum, which sponsors The Bizgres Project, said that the impetus behind the stack was a large number of open-source developers on staff at data centers, all of them looking for open-source alternatives for data analytics and none of them finding anything that worked.
"What they were doing is turning to Oracle [Corp.] or other commercial options for data warehousing and analytical tools," he said. "There's a big gap in the open-source community that we wanted to fill with something that's complete, [so that] somebody could get started quickly and easily and get large-scale reporting applications done that are common now" with the use of pricey proprietary applications, he said.
Does it stand a chance of unseating Oracle or DB2 in the data center? Stacey Quandt, an analyst for Quandt Analytics, said that we'll see this only to a limited degree.
"PostgreSQL and MySQL are capable of eroding the market share of Oracle or DB2 at the low end of the market," she said. "However, enterprise customers will continue to pay for the value-add of DB2 and Oracle for years to come."
Still, it's another indication that Linux and open source are becoming ever more deeply entrenched in the data center, she said. "Years ago, people scoffed at running Linux in the data center, and today it is more the norm than an anomaly. The economics of the data center continue to evolve with the continued focus on faster, better, cheaper solutions."
The application's reporting and ETL components are incorporated into the newly available Bizgres Version 0.7, which can be downloaded as either a source distribution or a binary distribution. The Clickstream application is also available for download.
A key feature in the Bizgres 0.7 release is table partitioning support, which is critical for large data-warehousing applications. Greenplum plans to release a massively parallel version of Bizgres later this year. It will support multi-terabyte data volumes.
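To see why partitioning matters at warehouse scale, consider a minimal Python sketch of the general idea. It is not Bizgres code and does not reflect how Bizgres or PostgreSQL implement the feature; the class name `MonthlyPartitionedTable` and its methods are hypothetical. Rows are filed into one bucket per month, so a query over a date range reads only the buckets that can contain matching rows instead of the whole table.

```python
from collections import defaultdict
from datetime import date


class MonthlyPartitionedTable:
    """Toy table split into one partition per (year, month). Hypothetical."""

    def __init__(self):
        # Maps (year, month) -> list of (day, row) tuples.
        self.partitions = defaultdict(list)

    def insert(self, day, row):
        # Route the row to the partition that covers its date.
        self.partitions[(day.year, day.month)].append((day, row))

    def scan(self, start, end):
        """Return (rows, partitions_read) for start <= day <= end."""
        low = (start.year, start.month)
        high = (end.year, end.month)
        rows = []
        partitions_read = 0
        for key in sorted(self.partitions):
            # Skip partitions that cannot hold rows in the requested range.
            if low <= key <= high:
                partitions_read += 1
                rows.extend(
                    row for day, row in self.partitions[key]
                    if start <= day <= end
                )
        return rows, partitions_read
```

With a year of data loaded, a report on a single month would touch one partition rather than twelve, which is the saving that grows with table size.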
Bizgres Clickstream showcases the capabilities of the integrated components in this BI stack, including automated collection and processing of Web server logs, population of a multidimensional warehouse schema, and out-of-the-box reports on Web site activity.
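The log-to-warehouse flow described above can be sketched roughly in Python. This is an illustration only, not the Clickstream implementation: it assumes the Web server writes Common Log Format, and the function names, the two dimensions (page and date), and the fact-row layout are all hypothetical. Each log line is parsed, its page and date are mapped to surrogate keys in small dimension tables, and one fact row is recorded per hit.

```python
import re
from datetime import datetime

# Assumed input: Common Log Format, for example
# 127.0.0.1 - - [10/Oct/2005:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_line(line):
    """Extract the fields of one log line, or return None if it is malformed."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    ts = datetime.strptime(match.group("ts"), "%d/%b/%Y:%H:%M:%S %z")
    size = match.group("size")
    return {
        "date": ts.date().isoformat(),
        "path": match.group("path"),
        "status": int(match.group("status")),
        "bytes": 0 if size == "-" else int(size),
    }


def load_star_schema(lines):
    """Build page and date dimensions plus a fact list of hits."""
    page_dim = {}  # path -> surrogate key
    date_dim = {}  # ISO date -> surrogate key
    facts = []     # (date_key, page_key, status, bytes)
    for line in lines:
        hit = parse_line(line)
        if hit is None:
            continue  # skip lines that do not match the assumed format
        page_key = page_dim.setdefault(hit["path"], len(page_dim) + 1)
        date_key = date_dim.setdefault(hit["date"], len(date_dim) + 1)
        facts.append((date_key, page_key, hit["status"], hit["bytes"]))
    return page_dim, date_dim, facts


def hits_per_page(page_dim, facts):
    """A simple report: number of hits for each page."""
    paths = {key: path for path, key in page_dim.items()}
    counts = {}
    for _, page_key, _, _ in facts:
        path = paths[page_key]
        counts[path] = counts.get(path, 0) + 1
    return counts
```

In a real deployment the dimension and fact rows would be written to database tables by the ETL layer, and the report would be a query run by the reporting engine rather than a Python loop.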
Quandt said that the release of the BI stack would help to address a long pent-up demand for simpler and less-expensive analytical tools.
"Many enterprise customers have a requirement for a business intelligence solution. However, most proprietary reporting solutions are expensive and complex," she said. "Early adopters of Linux for Internet infrastructure have expanded the deployment of open-source solutions to the application server and database tiers and business intelligence (large-scale reporting is merely an extension of this)."
As far as which open-source database should sit at the base of such a stack, Quandt said that PostgreSQL has been around longer than MySQL and in some ways is more mature than MySQL. "This makes it a good database to build an open-source stack on," she said. "In contrast, MySQL partnered with Business Objects in order to expand into the business intelligence segment."
Regarding JasperReports and Kinetic's ETL tools, Quandt said that they're a good start for users who don't require the functionality of tools from, for example, SAS or Cognos.
JasperReports has been around about four years. Hosted on SourceForge.net, it's been downloaded about 350,000 times and, as of the publishing of this article, was the 10th most active project. It's deployed at about 10,000 sites across the world.