Structure the Unstructured
The insights researchers gain from informal conversation are extremely rich, because the human brain is adept at making contextually relevant associations that a structured database cannot make. For example, a person knows immediately that the words "auto," "automobile" and "car" mean the same thing, or that a past experiment may be "kind of" similar to one being conducted in a current project. This is what the lunch table of the past delivered.
But what happens when your organization's head pharmacologist is in Boston, the lead chemist is in Beijing, and the available information spans an enormous range of sources and data formats? Those contextually relevant associations are no longer so easy to make.
Until organizations can "structure" (that is, categorize) the vast quantities of unstructured content at their disposal, they will miss out on an enormous amount of knowledge. This is where less rigid categorization technologies, such as advanced semantic search and text analytics, come in. But these tools must be sophisticated enough to handle the highly complex nature of scientific data. For instance, a molecule may be represented by name, by an ID number or as an image, so your search solution must be "scientifically aware" enough to recognize all of these variations as the same thing.
Consider a company that needs to search a vast amount of unstructured content, ranging from external patents and journal articles to its own internal documents and research databases, in order to identify and extract information relevant to a key project. Using a scientifically aware text analysis application capable of recognizing chemical structures and biological sequences, researchers could query that content and quickly pinpoint the most relevant information, without needing to know exactly how the data is represented. Without this capability, the time and cost of leveraging unstructured content would be prohibitive and, most importantly, critical insights would be missed.
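To make the idea of "scientifically aware" search a little more concrete, here is a minimal Python sketch of one simple technique, dictionary-based synonym normalization: every surface form of a concept (a common name, a systematic name, a registry number) is mapped to a single canonical entry, so a query using any one form retrieves documents that use any other. The synonym table, the function names and the sample documents are all hypothetical and chosen only for illustration. Commercial text-analytics products rely on large curated dictionaries and dedicated recognizers for structures and sequences, and recognizing a molecule drawn as an image would require chemical structure recognition well beyond this sketch.

```python
import re

# Hypothetical synonym table (illustrative only): each canonical concept maps
# to surface forms under which it might appear in text. A real system would
# draw on curated chemical and biological dictionaries instead.
SYNONYMS = {
    "aspirin": ["aspirin", "acetylsalicylic acid", "ASA", "50-78-2"],
    "car": ["car", "auto", "automobile"],
}


def build_lookup(synonyms):
    """Invert the table so any lowercased surface form resolves to its concept."""
    lookup = {}
    for concept, forms in synonyms.items():
        for form in forms:
            lookup[form.lower()] = concept
    return lookup


LOOKUP = build_lookup(SYNONYMS)


def concepts_in(text, lookup=LOOKUP):
    """Return the set of canonical concepts mentioned anywhere in the text."""
    lowered = text.lower()
    found = set()
    for form, concept in lookup.items():
        # Require non-word characters (or string edges) on both sides so that,
        # e.g., "auto" does not fire inside "automated".
        if re.search(r"(?<!\w)" + re.escape(form) + r"(?!\w)", lowered):
            found.add(concept)
    return found


def search(query, documents, lookup=LOOKUP):
    """Return ids of documents that share at least one concept with the query."""
    wanted = concepts_in(query, lookup)
    return [doc_id for doc_id, text in documents.items()
            if wanted & concepts_in(text, lookup)]


# Hypothetical mixed corpus: a patent, an internal memo and a paper abstract.
docs = {
    "patent-1": "Formulation containing acetylsalicylic acid (CAS 50-78-2).",
    "memo-7": "Follow-up on the aspirin stability study.",
    "paper-3": "Kinase inhibitor screening results.",
}

print(search("aspirin", docs))  # prints ['patent-1', 'memo-7']
```

Because the systematic name and the CAS number resolve to the same entry as "aspirin," the patent is retrieved even though it never uses that word. The exact-match lookup is chosen for simplicity: it is easy to audit, but it cannot handle misspellings or the "kind of" similar associations a human makes effortlessly.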