Corporate Data Guardians Must Ensure 'Value, Veracity' of Big Data

NEWS ANALYSIS: The richest troves of unexplored data can be found inside corporations, but only if the data's value and veracity are protected, said an MIT Sloan School IT professor.

Add two more Vs to the volume, velocity, variety bywords of big data. Speaking at the eighth annual Massachusetts Institute of Technology Chief Data Officer and Information Quality Symposium, Stuart Madnick told the audience to include value and veracity when considering how to best implement big data projects in their companies.

"There is something big going on," said Madnick, an MIT Sloan School professor of Information Technologies, in a bon mot on both the current surge of interest in big data and the amount of data companies have to manage in big data projects.

Madnick and several of the other speakers at the annual two-day symposium included the big data topic in their presentations. But the event, which predated the current interest in big data, also delved into the changing role of chief data officers and the role of data quality.

The bottom line: Many existing internal government and corporate data projects still need to be completed before organizations move on to flashy new projects that involve big data, delve into lakes of unstructured data and deploy new vendor hardware.

Interoperability, the development of standards, a mobile orientation and the use of cloud services are all on the agenda to create a "government as a platform." But those developments can be less about technology and more about defining and refining the business process, according to a group of government data officials on a panel on data standardization.

As Madnick said in his presentation on big data, the value and veracity of data become especially important when vast troves of outside data streams are brought into a company's data analysis sphere. "We have more and more data we know less and less about," said Madnick.

He said it was naïve to think that, with so much new data available, some erroneous data would be acceptable. Outside sources of data often don't have the same rigid data-cleansing processes that companies use internally, and small amounts of erroneous data can upend the biggest data projects.

While those vast sources of outside data are appealing, it is within a company's internal data archives, which were often not part of the data collection and analysis process, that the real business intelligence resides.

In one of the most interesting presentations of the symposium, William Inmon, president and CTO of Inmon Consulting, explained how textual data in the form of call center logs, medical narratives and bureaucratic report files could be transformed into a corporate business intelligence treasure trove.

Inmon uses a process called textual disambiguation to analyze both the content and context of documents to be transferred to databases for querying and reports. Examples included call center logs analyzed for patterns by date and keywords, real estate filings for sales trends, and medical report narratives for health and wellness issues.

"Analytics on text is so new very few people are doing it, but in terms of business relevancy, text gets at the heart of business relevancy," said Inmon. "We are not looking at data on the Internet, but data in the corporation, which has much greater relevancy of information inside the corporation in terms of immediate value," said Inmon. That value is only derived from systems that analyze not just the content of documents, but the context of the words within the document.

The chief data officer is still a named role found in only a fairly small number of larger companies, but the process of collecting, analyzing and protecting data is required in every company and government organization.

C-level executives—including CTO, CIO, CDO and CEO—were heavily represented among the symposium audience. The future belongs to those executives, whatever the title, who can look beyond corporate structural silos and see themselves as guardians of a company's most valuable asset—its data.

Eric Lundquist is a technology analyst at Ziff Brothers Investments, a private investment firm. Lundquist, who was editor in chief at eWEEK (previously PC WEEK) from 1996 to 2008, authored this article for eWEEK to share his thoughts on technology, products and services. No investment advice is offered in this article. All duties are disclaimed. Lundquist works separately for a private investment firm, which may at any time invest in companies whose products are discussed in this article, and no disclosure of securities transactions will be made.