iLuminate 4.0 Overcomes Data Warehouse Hurdles - Page 2

I tested iLuminate on a Sun Microsystems Sun Fire x4200 server with a pair of dual-core 2.39GHz AMD processors and 8GB of RAM, running the 64-bit version of Windows Server 2008 Enterprise. (iLuminate has a Linux version of its engine in the works for later in 2009.)

For my test data, I used campaign finance data from the Center for Responsive Politics' site. The data, which was stored in a series of CSV (comma-separated values)-formatted text files, detailed campaign contributions, candidates, political action committees and expenditures spanning the 1990 to 2008 election cycles.

The data set, which added up to about 30 million records, encompassed more than 545 million values, about 49 million of which were unique. Because of iLuminate's data pool plus index approach, in which each value is stored only once, there's a sort of deduplication at work here, which saved on database size. My test database occupied around 10GB of disk space.

iLuminate stresses the speedy "time to analytics" that its product enables, as well as its support for ad hoc queries-the freedom to begin digging through data before you necessarily know exactly what you're looking for. I could appreciate this early along in the testing process, as I began digging arbitrarily through the campaign data.

For instance, one of the first tables I imported contained individual campaign contributions for the 1990 to 2008 election cycles. Once I'd loaded the data, I opened the Metadata Explorer within iCorrelate (the primary front-end tool for iLuminate), expanded the list of fields in the table and looked at the statistics item under the Amount field. There, I could see that I had 15.6 million contribution entries with an average amount of $943.62.

While iLuminate stores data in indexed pools, rather than in table structures, the table and column organization from my raw data persisted as logical structures, and I could use the Metadata Explorer to drill down into a particular column and view the data within, as well as check out auto-generated statistics for that column. For example, I found in a few clicks that 12 percent of the individual contribution records in my data set were from California contributors.

To dig a bit deeper into my initial table of data, I turned to iCorrelate's query tool, where I could perform text searches for particular names or words, either in individual columns or across the whole table. For example, I searched for instances of the word "songwriter" in the occupation field of the individual contributions table and turned up 410 records out of about 15 million in 0.8 seconds.

I could drill down into this set of records in the same way that I could for whole columns and tables, quickly pulling out statistical information from iCorrelate. I found that while my data set covered 1990 to 2008, my search turned up results only from 2002 to 2008, which led me to wonder whether the field I'd searched on had been used in the same way over the entire span of my data set.

To find out whether this was the case, I searched for "songwriter" across the whole table, and found 2,229 instances. This broader search took 4.5 minutes to complete. I then narrowed my search to records in the cycles between 1990 and 2000 and, by glancing at the 715 records my follow-up search revealed, learned that the occupational information field I initially chose was less consistently used than another field available in the table, a fact about the data that I noted for my future analysis.

By loading more tables into the repository, and by doing a bit of setup work, I was able to make my campaign data exploration easier and more fruitful. Specifically, I returned to the tool's Metadata Explorer and set out to create some relationships among the individual contributions, candidates and committees tables. The individual contribution table includes a field that identifies the committee or candidate that received each contribution.

I used iCorrelate's expressions feature to create two copies of the recipient ID columns-one to link to the candidates table and the other to link to the committees table. With these links in place, I returned to my songwriter query and used the tool's Get Relations option, which returned all the committee and candidate records that my campaign-contributing songwriter entries pointed to.

Again, I could drill down into the results and view the list of candidates and committees that received contributions from songwriters, as well as view statistics on those results. For example, 86 percent of the candidates that received the songwriter contributions were Democrats.

The Get Relations option has a sibling, Find Relations, which I tacked onto my list of songwriter-supported candidates to find all the records that pointed to those candidate entries. This leg of my search returned 3.9 million records in 5.3 seconds, and, as before, I could pull up assorted statistics about my new set of records.

For instance, the gender breakdown among all contributions to the candidates who were popular among contributing songwriters was 59 percent male to 33 percent female, compared with a 67 percent male to 26 percent female breakdown across the entire set of contribution records.

At each step at which I could view data and statistical information, I could generate typical chart types within iCorrelate, as well as export my data sets to CSV files, create reports or make the data available over ODBC as queries. I could also interact with the data from external applications using SQL, although I did not test this option.