iLuminate in the Lab
I tested iLuminate on a Sun Microsystems Sun Fire x4200 server with a pair
of dual-core 2.39GHz AMD processors and 8GB
of RAM, running the 64-bit version of
Windows Server 2008 Enterprise.
(iLuminate has a Linux version of its engine in the works for later in 2009.)
For my test data, I used campaign finance data from the Center for
Responsive Politics' OpenSecrets.org
site. The data, which was stored in a series of CSV
(comma-separated values)-formatted text files, detailed campaign contributions,
candidates, political action committees and expenditures spanning the 1990 to
2008 election cycles.
The data set, which added up to about 30 million records, encompassed more
than 545 million values, about 49 million of which were unique. Because of
iLuminate's data pool plus index approach, in which each value is stored only
once, there's a sort of deduplication at work here, which saved on database
size. My test database occupied around 10GB of disk space.
iLuminate stresses the speedy "time to analytics" that its product
enables, as well as its support for ad hoc queries-the freedom to begin digging
through data before you necessarily know exactly what you're looking for. I
could appreciate this early along in the testing process, as I began digging
arbitrarily through the campaign data.
For instance, one of the first tables I imported contained individual
campaign contributions for the 1990 to 2008 election cycles. Once I'd loaded
the data, I opened the Metadata Explorer within iCorrelate (the primary
front-end tool for iLuminate), expanded the list of fields in the table and
looked at the statistics item under the Amount field. There, I could see that I
had 15.6 million contribution entries with an average amount of $943.62.
While iLuminate stores data in indexed pools, rather than in table
structures, the table and column organization from my raw data persisted as
logical structures, and I could use the Metadata Explorer to drill down into a
particular column and view the data within, as well as check out auto-generated
statistics for that column. For example, I found in a few clicks that 12
percent of the individual contribution records in my data set were from California
contributors.
To dig a bit deeper into my initial table of data, I turned to iCorrelate's
query tool, where I could perform text searches for particular names or words,
either in individual columns or across the whole table. For example, I searched
for instances of the word "songwriter" in the occupation field of the
individual contributions table and turned up 410 records out of about 15
million in 0.8 seconds.
I could drill down into this set of records in the same way that I could for
whole columns and tables, quickly pulling out statistical information from
iCorrelate. I found that while my data set covered 1990 to 2008, my search turned
up results only from 2002 to 2008, which led me to wonder whether the field I'd
searched on had been used in the same way over the entire span of my data set.
To find out whether this was the case, I searched for "songwriter"
across the whole table, and found 2,229 instances. This broader search took 4.5
minutes to complete. I then narrowed my search to records in the cycles between
1990 and 2000 and, by glancing at the 715 records my follow-up search revealed,
learned that the occupational information field I initially chose was less
consistently used than another field available in the table, a fact about the
data that I noted for my future analysis.
By loading more tables into the repository, and by doing a bit of setup
work, I was able to make my campaign data exploration easier and more fruitful.
Specifically, I returned to the tool's Metadata Explorer and set out to create
some relationships among the individual contributions, candidates and
committees tables. The individual contribution table includes a field that
identifies the committee or candidate that received each contribution.
I used iCorrelate's expressions feature to create two copies of the
recipient ID columns-one to link to the candidates table and the other to link
to the committees table. With these links in place, I returned to my songwriter
query and used the tool's Get Relations option, which returned all the
committee and candidate records that my campaign-contributing songwriter
entries pointed to.
Again, I could drill down into the results and view the list of candidates
and committees that received contributions from songwriters, as well as view
statistics on those results. For example, 86 percent of the candidates that
received the songwriter contributions were Democrats.
The Get Relations option has a sibling, Find Relations, which I tacked onto
my list of songwriter-supported candidates to find all the records that pointed
to those candidate entries. This leg of my search returned 3.9 million records
in 5.3 seconds, and, as before, I could pull up assorted statistics about my
new set of records.
For instance, the gender breakdown among all contributions to the candidates
who were popular among contributing songwriters was 59 percent male to 33
percent female, compared with a 67 percent male to 26 percent female breakdown
across the entire set of contribution records.
At each step at which I could view data and statistical information, I could
generate typical chart types within iCorrelate, as well as export my data sets
to CSV files, create reports or make the
data available over ODBC as queries. I could also interact with the data from
external applications using SQL, although I did not test this option.









