    How to Deal with Unstructured Data – Oh Brother, Where Art Thou?

    Written by W.H. Inmon
    Published March 21, 2008

      eWEEK content and product recommendations are editorially independent. We may make money when you click on links to our partners.

      Unstructured data has been around for a long time, certainly longer than the computer. The Bible, Egyptian hieroglyphics and the Kama Sutra all long predate the silicon chip. Search engines have been around for a while as well, though not as long as the printed word. Yet when it comes to unlocking the valuable information contained in unstructured data, the world has not come very far, even with sophisticated search engines. Why should this be the case?

      Garbage In, Garbage Out

      One ingredient is missing if search engines are to unlock the real value of unstructured data. To explain what it is, consider the oldest conundrum in information technology: GIGO, or “Garbage In, Garbage Out.” What happens when a powerful search engine is run against textual data that is essentially unscrubbed, unwashed and unintegrated? The results it returns to the end user are just as unscrubbed and unwashed.

      For a search of text to be truly powerful, the text being searched must be integrated before the search is run. Once that is done, you are no longer starting with garbage in, and you should not expect garbage out.
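      To make the GIGO point concrete, here is a minimal Python sketch. It assumes a very simple notion of “integration”: lower-casing the text, stripping punctuation, and mapping a few variant terms to one canonical term through a hypothetical vocabulary table (`CANONICAL_TERMS`). The function names are illustrative and do not come from any particular product. The same query finds nothing in the raw index but finds both relevant e-mails once the text has been integrated before indexing.

```python
import re
from collections import defaultdict

# Hypothetical vocabulary table: variant spellings and abbreviations that
# should all be treated as one canonical business term.
CANONICAL_TERMS = {
    "refunded": "refund",
    "reimbursement": "refund",
    "acct": "account",
}


def integrate(text: str) -> list[str]:
    """Scrub text: lower-case it, keep alphabetic tokens, map variants to canonical terms."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [CANONICAL_TERMS.get(tok, tok) for tok in tokens]


def raw_tokens(text: str) -> list[str]:
    """No integration at all: split on whitespace, keeping case and punctuation."""
    return text.split()


def build_index(docs: dict[str, str], prepare) -> dict[str, set[str]]:
    """Build an inverted index mapping each prepared term to the ids of documents containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in prepare(text):
            index[term].add(doc_id)
    return index


def search(index: dict[str, set[str]], query: str, prepare) -> set[str]:
    """Run a one-word query through the same preparation used for the documents."""
    terms = prepare(query)
    return set(index.get(terms[0], set())) if terms else set()


emails = {
    "e1": "Please process my REFUND today.",
    "e2": "I was promised a reimbursement on my acct.",
    "e3": "Thanks, the order arrived fine.",
}

raw_index = build_index(emails, raw_tokens)
clean_index = build_index(emails, integrate)

print(search(raw_index, "refund", raw_tokens))           # set()
print(sorted(search(clean_index, "refund", integrate)))  # ['e1', 'e2']
```

      The query term passes through the same `integrate` step as the documents; integrating only one side would reintroduce the mismatch. Real textual integration involves considerably more than this toy normalization.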

      Internet vs. Corporate Data

      When it comes to searching the Internet, scrubbing the data is a bit of a reach. Trying to scrub and integrate data across the Internet is probably a futile endeavor, roughly like trying to boil the ocean or, at least, Lake Erie.

      But corporate data is another matter, for two reasons. First, there is a finite amount of corporate data, as opposed to the almost infinite amount of data on the Internet. Second, unlike Internet data, almost all corporate data is relevant to the business of the corporation. It is safe to say that only a small part of the data on the Internet is relevant to the business of any one corporation, even one as large and diverse as IBM or Dow Chemical.

      Therefore, integrating corporate textual data, or preparing it for the purpose of search and analysis, is a very real and very practical possibility.

      What Kinds of Data Need Integration?

      So what kinds of corporate data need to be integrated? The only real limit is the imagination of the user. Some obvious kinds of corporate data that could be integrated include:

      1. Customer data – relating to customer communications

      2. Safety data – relating to accidents, inspections, repairs, warranties and other important events

      3. Contracts data – data relating to the specific contracts of the corporation

      4. Discovery data – data in the litigation process

      5. Compliance data – descriptions of sensitive corporate events and transactions, etc.

      In short, there are few limits, and in theory no limits at all, to the potential uses of integrated corporate textual data.

      The Advantage of Data Integration

      One major advantage of integrating corporate textual data is that, once integrated, it can be put into a database and reused. In other words, the text needs to be integrated only once. Thereafter, it can be researched and reanalyzed as often as desired.
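      The “integrate once, analyze many times” idea can be sketched with Python’s built-in `sqlite3` module. This sketch assumes a deliberately simple integration step and a hypothetical `doc_terms` table; the point is only that the integration work happens once, at load time, after which different analyses run as ordinary queries against the stored result.

```python
import re
import sqlite3


def integrate(text: str) -> list[str]:
    """Hypothetical integration step: lower-case the text and keep alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def load_once(conn: sqlite3.Connection, docs: dict[str, str]) -> None:
    """Integrate each document a single time and persist the resulting terms."""
    conn.execute("CREATE TABLE IF NOT EXISTS doc_terms (doc_id TEXT, term TEXT)")
    for doc_id, text in docs.items():
        conn.executemany(
            "INSERT INTO doc_terms (doc_id, term) VALUES (?, ?)",
            [(doc_id, term) for term in integrate(text)],
        )
    conn.commit()


def docs_mentioning(conn: sqlite3.Connection, term: str) -> list[str]:
    """One analysis: which documents mention a term (no re-integration needed)."""
    rows = conn.execute(
        "SELECT DISTINCT doc_id FROM doc_terms WHERE term = ? ORDER BY doc_id", (term,)
    )
    return [row[0] for row in rows]


def top_terms(conn: sqlite3.Connection, limit: int) -> list[tuple[str, int]]:
    """Another analysis over the same stored terms: the most frequent terms overall."""
    rows = conn.execute(
        "SELECT term, COUNT(*) AS n FROM doc_terms "
        "GROUP BY term ORDER BY n DESC, term LIMIT ?",
        (limit,),
    )
    return rows.fetchall()


conn = sqlite3.connect(":memory:")
load_once(conn, {
    "e1": "The shipment was late. Late again!",
    "e2": "Shipment arrived on time.",
})
print(docs_mentioning(conn, "shipment"))  # ['e1', 'e2']
print(top_terms(conn, 2))                 # [('late', 2), ('shipment', 2)]
```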

      Typically, once the corporate textual data has been integrated, it is placed in a data warehouse. There it can be combined with the structured data already in the warehouse, which makes possible an entirely new class of queries.

      Such a query can be called a hybrid query because it runs against both structured and unstructured data. And the resulting data warehouse is truly an integrated data warehouse, containing data that originated in both structured and unstructured form.
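      A hybrid query might look like the following sketch, again using `sqlite3` with an in-memory database. The schema is hypothetical: `customers` and `orders` stand in for conventional structured warehouse tables, while `email_facts` stands in for rows produced by integrating e-mail text (reduced here to a customer, a date and a topic). The query totals order amounts only for customers who have sent at least one complaint.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
-- Hypothetical rows derived from e-mail text by an integration step.
CREATE TABLE email_facts (customer_id INTEGER, sent_on TEXT, topic TEXT);

INSERT INTO customers VALUES (1, 'Jones'), (2, 'Smith');
INSERT INTO orders VALUES (10, 1, 250.0), (11, 1, 80.0), (12, 2, 40.0);
INSERT INTO email_facts VALUES (1, '2008-02-14', 'complaint'), (2, '2008-02-20', 'praise');
""")

# Structured side: customers and their orders.
# Unstructured-origin side: the EXISTS test against e-mail-derived facts.
HYBRID_QUERY = """
SELECT c.name, SUM(o.amount) AS total_spend
FROM customers AS c
JOIN orders AS o ON o.customer_id = c.customer_id
WHERE EXISTS (
    SELECT 1 FROM email_facts AS e
    WHERE e.customer_id = c.customer_id AND e.topic = 'complaint'
)
GROUP BY c.name
ORDER BY total_spend DESC
"""

rows = conn.execute(HYBRID_QUERY).fetchall()
print(rows)  # [('Jones', 330.0)]
```

      Using `EXISTS` rather than joining `email_facts` directly keeps a customer who sent several complaints from having their order totals counted more than once.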

      Customer Communications Analysis

      As an example of just one of the many applications this opens up for the corporation, consider customer communications analysis. Corporations routinely receive e-mail from customers. But once those e-mails have been read, they are usually lost for practical purposes: they go into a holding file and sit there, along with thousands of other e-mails.

      The problem is that when the corporation needs those communications, they are difficult to find. This matters most when the corporation is about to communicate with the same customer again.

      The Case of Mrs. Jones

      To illustrate this point, let’s look at the case of a customer named Mrs. Jones. Let’s suppose she wrote a scathing e-mail last month because an order of hers had been botched. This month, our salesperson wants to call up Mrs. Jones and solicit some more business. Is it important that the salesperson knows about last month’s e-mail from Mrs. Jones?

      Of course it is. If we want to sell Mrs. Jones something new, ANY recent direct communication is important, whether it was sent to Mrs. Jones or received from her. So how does the corporation find the e-mails that are relevant, and how does it filter out the ones that are not?
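      Once e-mails are held in integrated, queryable form, the salesperson’s question becomes a simple filter. The sketch below assumes a hypothetical `Email` record that has already been tied to a customer name and tagged with its direction. It also assumes an arbitrary 60-day look-back window. Deciding what counts as “recent” is a business choice, and the text does not prescribe one.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Email:
    customer: str   # customer the message was sent to or received from
    direction: str  # "inbound" or "outbound"
    sent_on: date
    body: str


def recent_contact(emails: list[Email], customer: str, as_of: date,
                   window_days: int = 60) -> list[Email]:
    """Return the customer's e-mails in either direction within the window, newest first."""
    cutoff = as_of - timedelta(days=window_days)
    hits = [e for e in emails
            if e.customer == customer and cutoff <= e.sent_on <= as_of]
    return sorted(hits, key=lambda e: e.sent_on, reverse=True)


mailbox = [
    Email("Jones", "inbound", date(2008, 2, 18), "My order arrived broken. Very unhappy."),
    Email("Jones", "outbound", date(2008, 2, 19), "We are sorry; a replacement is on its way."),
    Email("Jones", "inbound", date(2007, 6, 2), "Please update my mailing address."),
    Email("Smith", "inbound", date(2008, 3, 1), "Great service, thanks!"),
]

# Everything outside the customer and the window is filtered out as irrelevant.
for e in recent_contact(mailbox, "Jones", as_of=date(2008, 3, 20)):
    print(e.sent_on, e.direction, e.body)
```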

      This is just one of the many, many cases where unstructured textual data could be put to use, provided that the corporate text had first passed through an integration process designed specifically for text and had then been placed in a database.

      W. H. Inmon is considered the father of data warehousing. He has written 49 books, translated into nine languages. His book on data warehousing has sold approximately 500,000 copies around the world and is in its fourth edition. W.H. Inmon founded and took public the world’s first company to build and sell ETL. He has written more than 600 articles and is published in most major trade journals.

      W.H. Inmon has conducted seminars and spoken at conferences on every continent except Antarctica. He holds nine software patents. His latest company is Forest Rim Technologies Inc., a company dedicated to the access and integration of unstructured data into the data warehouse environment. His weekly newsletter is one of the most widely read in the industry, having 75,000 weekly subscribers.

      (For more information about the corporate use and structuring of unstructured data, see W.H. Inmon’s recently published book, Tapping into Unstructured Data: Integrating Unstructured Data and Textual Analytics into Business Intelligence, Prentice Hall, 2007.)

      W.H. Inmon can be reached at whinmon@msn.com.

