Structure the Unstructured

By Frank Brown  |  Posted 2010-01-22


The insights researchers gain when conversing informally are extremely rich, because human brains are adept at making contextually relevant associations that a structured database cannot make. For example, a human knows immediately that the words "auto," "automobile" and "car" mean the same thing, or that a past experiment may be "kind of" similar to the one being conducted in a current project. This is what the lunch table of the past delivered.

But what happens when your organization's head pharmacologist is in Boston, the lead chemist is in Beijing, and the available information base involves an enormous breadth of sources and data formats? Those contextually relevant associations are not so easy to make.

Until organizations are able to "structure" (that is, categorize) the vast quantities of unstructured content at their disposal, they will miss out on a monumental amount of knowledge. This is where less rigid categorization technologies such as advanced semantic search and text analytics come in. These technologies, however, must be sophisticated enough to handle the highly complex nature of scientific data. For instance, a molecule may be represented by name, by an ID number or as an image, so your search solution must be "scientifically aware" enough to recognize these variations.
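To make the idea of recognizing variations concrete, here is a minimal sketch in Python of the simplest possible approach: an alias table that maps each surface form to one canonical concept. The table contents, the `canonical` and `normalize` helper names and the compound ID `CMPD-0001` are assumptions made up for illustration; a real scientifically aware system would also need to handle chemical structures and images, which a lookup table like this cannot do.

```python
import re
from typing import Optional

# Hypothetical alias table: each key is one surface form, each value is the
# canonical concept it refers to. The registry ID below is invented.
ALIASES = {
    "auto": "car",
    "automobile": "car",
    "car": "car",
    "aspirin": "compound:aspirin",
    "acetylsalicylic acid": "compound:aspirin",
    "cmpd-0001": "compound:aspirin",  # made-up internal ID, for illustration only
}


def normalize(term: str) -> str:
    """Lowercase the term and collapse runs of whitespace to single spaces."""
    return re.sub(r"\s+", " ", term.strip().lower())


def canonical(term: str) -> Optional[str]:
    """Return the canonical concept for a term, or None if the term is unknown."""
    return ALIASES.get(normalize(term))
```

With this table, `canonical("Automobile")` and `canonical("car")` both resolve to `"car"`, while an unknown word such as `"truck"` resolves to `None`, which is exactly the gap that more sophisticated semantic search is meant to close.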

Consider a company that needs to search a vast amount of unstructured content, ranging from external patents and journal articles to its own internal documents and research databases. The company needs to identify and extract information relevant to a key project. Using a scientifically aware text analysis application capable of recognizing chemical structures and biological sequences, researchers would be able to query the content and quickly pinpoint the most relevant information. They would be able to do this without having to know exactly how the data is represented. Without this capability, the time and cost of leveraging unstructured content would be too high and, most importantly, critical insights would be missed.
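The scenario above can be sketched as a toy concept search: the researcher queries with whatever representation is at hand, and documents are ranked by how often they mention the same concept under any of its forms. This is only an illustration under stated assumptions. The concept dictionary, the sample documents, the ID `CMPD-0001` and the function names `tag_concepts` and `search` are all hypothetical, and plain text matching stands in for the structure and sequence recognition a real application would provide.

```python
import re
from collections import Counter

# Hypothetical dictionary: canonical concept -> surface forms found in text.
CONCEPTS = {
    "compound:aspirin": ["aspirin", "acetylsalicylic acid", "cmpd-0001"],
}

# Invented sample content standing in for patents, notebooks and memos.
DOCS = {
    "patent-1": "Acetylsalicylic acid was dissolved in ethanol; "
                "the aspirin solution was then filtered.",
    "notebook-7": "Batch CMPD-0001 showed improved stability.",
    "memo-3": "Quarterly budget review.",
}


def tag_concepts(text, concepts=CONCEPTS):
    """Count mentions of each concept in text, under any of its surface forms."""
    lowered = text.lower()
    counts = Counter()
    for concept, forms in concepts.items():
        for form in forms:
            # Match whole terms only, so a form embedded in a longer word is ignored.
            pattern = r"(?<!\w)" + re.escape(form) + r"(?!\w)"
            counts[concept] += len(re.findall(pattern, lowered))
    return counts


def search(query, documents, concepts=CONCEPTS):
    """Return document ids that mention the queried concept, most mentions first."""
    q = query.strip().lower()
    target = next((c for c, forms in concepts.items() if q in forms), None)
    if target is None:
        return []
    scored = [(tag_concepts(text, concepts)[target], doc_id)
              for doc_id, text in documents.items()]
    ranked = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for score, doc_id in ranked if score > 0]


print(search("CMPD-0001", DOCS))  # ['patent-1', 'notebook-7']
```

Note that the query uses the ID, yet the top result is a patent that never mentions that ID. The researcher did not need to know how the compound was written in each source.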

Frank Brown, PhD, has served as Senior Vice President and Chief Science Officer for Accelrys since October 2006. Frank has extensive experience in the areas of computational chemistry and chemoinformatics. He is responsible for both the scientific direction of the company and all collaborative research with academic, government and industrial partners. Prior to joining Accelrys, Frank held positions of increasing responsibility at Johnson & Johnson, most recently as senior research fellow within the Office of the CIO. In this position, Frank oversaw the development of architecture for all R&D in the organization's pharma sector. Before Johnson & Johnson, Frank started the first chemoinformatics group in the industry at Glaxo Research Institute, and launched software products targeted to the pharmaceutical industry as vice president for product and business development at Oxford Molecular Group. Frank has also served as an adjunct associate professor in the Department of Medicinal Chemistry, School of Pharmacy, at the University of North Carolina at Chapel Hill. He has also served as a chair for the American Chemical Society (ACS), Computers in Chemistry section, and on an NIH Special Study. Frank holds a PhD in physical organic chemistry from the University of Pittsburgh and a post-doctoral studies degree in biophysics from the University of California at San Francisco. He can be reached at
