SAS Digging Into Unstructured Data
SAS Institute Inc. will use technology from Inxight Software Inc. to power much of its forthcoming SAS Text Miner application.
As SAS moves into unstructured data analysis, it will add to Text Miner Inxight's LinguistX Platform, for natural language text analysis, and Inxight Thing Finder, software that identifies and extracts content from documents.
Text Miner, which is scheduled for release at midyear, will be used to analyze textual or unstructured data from databases, e-mail and the Web. Most data analysis products today, from SAS and other vendors, focus on numerical or structured data. Yet unstructured data accounts for 85 percent of a company's information, according to Inxight.
Randy Collica, a senior business analyst in sales and marketing for Compaq Computer Corp., has tested early betas of Text Miner and said he is looking forward to the new product. Compaq, which already uses SAS Enterprise Miner software for structured data analysis, plans to use Text Miner to analyze notes made about customer calls by agents in its call center, contents of customer service e-mail messages and textual content from a D&B Inc. prospecting database used for marketing purposes, Collica said.
"It's part of our plan to utilize the areas [of data analysis] where traditional data mining tools have a difficult time," said Collica, in Littleton, Mass.
Like other data, text needs to be massaged to be useful, through steps such as data cleansing and transformation, Collica said. He cited the natural language analysis capabilities SAS is adding from Inxight as an example.
SAS' agreement with Inxight includes support for English, French and German, with other language support available separately from Inxight, of Santa Clara, Calif.
In addition, Text Miner will include SAS' analytical capabilities, including new clustering algorithms designed for text mining, said SAS officials in Cary, N.C. The software will integrate with Enterprise Miner for cross-analysis of text and numerical data and will support predictive, as well as descriptive, analytics.