How to Discover the Hidden Meaning in Unstructured Data

BlueCross BlueShield has a lot of data, and not all of it is designed for easy access. But according to Frank Brooks of BlueCross BlueShield of Tennessee, a few good tools and a lot of creativity can unlock it. He explains how.


Like many large organizations, BlueCross BlueShield of Tennessee (BCBST) stores several different types of data in various formats across a diverse group of systems. These systems include relational databases, mainframes, content management systems, e-mail servers, call center contact notes, CRM applications and business intelligence reports. Only part of this data is available in conventional formats that are easily managed by a relational database.

BCBST is not alone in confronting the challenge of accessing heterogeneous, diverse data. According to a May 2007 Data Warehouse Institute survey of 370 respondents, just 47 percent of a typical organization's data is structured. Building a consolidated view of data from many diverse sources is not a trivial undertaking, and understanding the meaning of the other 53 percent is a significant challenge that many organizations have tried to address with mixed results - until now.

New Tools and Technologies Unlock Data

As part of an ECM (Enterprise Content Management) strategy, BCBST began researching ways to unlock the meaning in unstructured data. We also looked for ways to marry that data with related structured data so that users could easily access pertinent information from many different data sources. One of the goals of the ECM strategy was to provide a single, up-to-date view of information related to a specific subject, using new data integration tools and technologies such as enterprise search and text analytics.

BCBST implemented a POC (proof-of-concept) project in 2007 that focused on two strategies for combining related structured and unstructured data. The first strategy transforms data into content. This is done by incorporating structured data and reports into a searchable index, which allows employees to access the content through a common search interface.
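To make the first strategy concrete, here is a minimal sketch of a shared search index built in plain Python. It is an illustration only, not the IBM OmniFind implementation used in the POC: the record names, the sample text and the simple "all words must match" search rule are assumptions made for the example.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def row_to_text(row):
    """Flatten a structured record into text so it can be indexed like a document."""
    return " ".join(f"{key} {value}" for key, value in row.items())


def build_index(documents):
    """Map each token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every token in the query."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


# Hypothetical sample data: one structured row, one free-form note, one report title.
claim_row = {"claim_id": "C100", "status": "denied", "provider": "Acme Clinic"}
documents = {
    "claim:C100": row_to_text(claim_row),
    "note:N7": "Provider called about denied claim and overpayment letter",
    "report:R3": "Monthly summary of paid claims by region",
}

index = build_index(documents)
hits = search(index, "denied claim")  # matches both the database row and the note
```

The point of the sketch is that once a database row is flattened to text, one query reaches structured and unstructured sources alike. A production search engine would add ranking, stemming and security trimming on top of this idea.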

The second strategy turns content into data by expanding the data available for reporting. This is done using text analytics to extract key elements or concepts from the content. The data is then transformed and loaded into the fields of a database record. This supplemental data can then be joined with existing structured content and accessed via standard SQL queries.
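The second strategy can be sketched in a few lines as well. The example below stands in for a text analytics product with two hand-written keyword patterns, loads the extracted concepts into a table, and joins them to the structured call records with standard SQL. The table names, column names, patterns and sample notes are all hypothetical, and an in-memory SQLite database is used only to keep the example self-contained.

```python
import re
import sqlite3

# Hypothetical concept rules; a real text analytics tool would use far richer models.
CONCEPT_PATTERNS = {
    "overpayment": re.compile(r"\boverpa(?:id|yment)\b", re.IGNORECASE),
    "claim_delay": re.compile(r"\b(?:delay(?:ed)?|late|pending)\b", re.IGNORECASE),
}


def extract_concepts(text):
    """Return the sorted names of all concepts whose pattern appears in the text."""
    return sorted(name for name, pattern in CONCEPT_PATTERNS.items() if pattern.search(text))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE calls (call_id INTEGER PRIMARY KEY, provider_id TEXT, note TEXT)")
conn.execute("CREATE TABLE call_concepts (call_id INTEGER, concept TEXT)")

# Made-up call center notes attached to structured call records.
conn.executemany(
    "INSERT INTO calls VALUES (?, ?, ?)",
    [
        (1, "P01", "Provider says claim is still pending after 45 days"),
        (2, "P02", "Question about an overpayment letter"),
        (3, "P01", "Claim delayed; provider also reports it was overpaid"),
    ],
)

# Extract concepts from each note and load them as ordinary database rows.
for call_id, note in conn.execute("SELECT call_id, note FROM calls").fetchall():
    conn.executemany(
        "INSERT INTO call_concepts VALUES (?, ?)",
        [(call_id, concept) for concept in extract_concepts(note)],
    )


def concept_counts(connection):
    """Join extracted concepts to structured call data and count them per provider."""
    return connection.execute(
        """
        SELECT c.provider_id, x.concept, COUNT(*) AS mentions
        FROM calls AS c
        JOIN call_concepts AS x ON x.call_id = c.call_id
        GROUP BY c.provider_id, x.concept
        ORDER BY c.provider_id, x.concept
        """
    ).fetchall()


counts = concept_counts(conn)
```

Once the concepts sit in their own table, any existing reporting tool that speaks SQL can use them without knowing they came from free-form text.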

Data Integration and Evaluation

We used IBM Cognos 8 BI Go! Search, which we integrated with IBM OmniFind Enterprise Edition, to enable users to easily access the results of text analytics processing by SAS Text Miner. We also evaluated a second option in which IBM OmniFind Analytics Edition was used as a comprehensive solution to transform unstructured text into meaningful business insight. The text could then be analyzed in an interactive, Web-based text mining interface.

Potential uses of text analytics technology include letting call center management analyze trends, patterns and correlations in the large volume of free-form comments entered by customer service call center staff. This capability also makes it possible to measure changes in customer sentiment over time or across other dimensions.

The result of this processing provides decision-makers with the tools they need to quickly identify service issues across a large number of structured dimensions. Combining traditional BI (Business Intelligence) reporting and analytical tools with new search capabilities enables easy access to and consumption of this new information. It also enables easy integration with existing analytical solutions that would not be possible otherwise.

Unstructured Data Becomes Structured

Text analytics software can also identify high-risk members based on free-form text entered into the comments field of a case management application. This text, which previously lacked analytical value, can now be transformed into new structured data. It can be combined with traditional structured data (such as paid claims dollars and disease type classification codes) that is stored in the data warehouse. It can then be fed into a data mining and predictive analytics tool.

The end result is a new set of information displayed in an easily understood bar chart. The chart quantifies the difference in claims dollars between members classified as "having risk" vs. "having no risk" for each of four different disease classifications. The risk/no risk classification is obtained from an analysis of the unstructured data.
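The comparison behind such a chart is a simple grouped average. The sketch below assumes the risk flag has already been derived from the case notes and joined to paid claims. The field names and dollar figures are invented for illustration and are not BCBST data, and only two disease classes are shown for brevity.

```python
from collections import defaultdict

# Invented sample rows: disease class, text-derived risk flag, paid claims dollars.
members = [
    {"disease": "diabetes", "at_risk": True, "paid": 1200.0},
    {"disease": "diabetes", "at_risk": True, "paid": 800.0},
    {"disease": "diabetes", "at_risk": False, "paid": 400.0},
    {"disease": "diabetes", "at_risk": False, "paid": 600.0},
    {"disease": "asthma", "at_risk": True, "paid": 900.0},
    {"disease": "asthma", "at_risk": False, "paid": 500.0},
    {"disease": "asthma", "at_risk": False, "paid": 700.0},
]


def mean_paid_by_group(rows):
    """Average paid claims dollars for each (disease, at_risk) combination."""
    totals = defaultdict(lambda: [0.0, 0])  # key -> [sum of dollars, member count]
    for row in rows:
        key = (row["disease"], row["at_risk"])
        totals[key][0] += row["paid"]
        totals[key][1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


def risk_gap(rows):
    """Difference in average paid claims (risk minus no risk) per disease class."""
    means = mean_paid_by_group(rows)
    diseases = sorted({disease for disease, _ in means})
    return {
        disease: means[(disease, True)] - means[(disease, False)]
        for disease in diseases
        if (disease, True) in means and (disease, False) in means
    }


gaps = risk_gap(members)
```

Each value in `gaps` corresponds to the distance between the two bars for one disease classification in the chart described above.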

Data Unlocked and Made Useful

Leveraging new technology, such as enterprise search and expanded BI capabilities, can allow BCBST to discover, analyze and report on the meaning of previously inaccessible unstructured data. It can also add new insight to the meaning of existing structured data used for analytic differentiation. Questions such as these could be answered with ease and agility: "What issues are preventing claims from being paid in 30 days?" or "What questions are being asked by providers regarding overpayments and underpayments?" or "Why are provider reviews being requested?"

By increasing the speed of access to all of the structured and unstructured business-critical information that users have about our providers, BCBST is able to more effectively manage costs and improve decision-making. Further, given that analytics have become increasingly important as a competitive differentiator for healthcare insurance companies, the combined use of BI and text analytics technology allows BCBST to respond more effectively to RFPs to acquire new business and retain existing clients.

Frank Brooks is Senior Manager of Data Resource Management and Chief Data Architect at BlueCross BlueShield of Tennessee (BCBST), where he has worked for 21 years. His responsibilities include overseeing the Database Administration, Data Integration and Business Intelligence departments. He is also heavily involved in the planning and strategy of the Information Management area, including implementing an Enterprise Architecture function.

Brooks has spoken at conferences including the annual Cognos user conference, Cognos Forum, in 2006 and 2007 and the InformationWeek CIO Conference in 2006. He has contributed to a number of IT publication articles, press releases and case studies by Gartner and IDC, and has delivered an IBM/Cognos-sponsored worldwide webinar on the information management practices in his area.

BlueCross BlueShield of Tennessee, Inc. (BCBST) is an independent licensee of BlueCross BlueShield Association.