How to Improve the Efficiency of Enterprise Search

By Yves Schabes  |  Posted 2009-07-14

Searching for a document in the workplace usually involves sifting through multiple pages of search results, which wastes time and money. And because enterprise users are searching for specific information, not just the most popular answer, they expect more precise search results than they would get from the Internet. Simply put, the techniques that work well on the Web aren't as well suited to enterprise search.

A recent study noted that "49 percent of survey respondents agreed or strongly agreed that it is a difficult and time-consuming process to find the information they need to do their job." All search, whether on the Internet or in an enterprise, is powered by metadata, the data about data. Metadata is traditionally understood as recorded information describing attributes of data, such as names, sizes and lengths.

Finding information on the Internet is faster and more accurate than on an enterprise site because hyperlinks provide the Internet with naturally occurring, high-quality metadata. Each time someone links text to a Web page, Internet search engines interpret the linked text as metadata about that particular page, which affects the page's ranking in Web search results.
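As a rough illustration of how linked text can act as metadata, the sketch below groups anchor text by the page it points to. It is a minimal example, not a description of how any real search engine works: the function name `collect_anchor_metadata`, the tuple layout and the example URLs are all assumptions made for illustration.

```python
from collections import defaultdict


def collect_anchor_metadata(links):
    """Group anchor text by the page it points to.

    `links` is an iterable of (source_url, anchor_text, target_url)
    tuples. This layout is a hypothetical input format.
    """
    metadata = defaultdict(list)
    for _source, anchor_text, target in links:
        text = anchor_text.strip().lower()
        if text:  # ignore links that carry no visible text
            metadata[target].append(text)
    return dict(metadata)


# Hypothetical links; every URL here is a placeholder.
links = [
    ("http://a.example/post", "Expense Policy",
     "http://intranet.example/policy"),
    ("http://b.example/wiki", "travel reimbursement rules",
     "http://intranet.example/policy"),
    ("http://c.example/home", "  ",
     "http://intranet.example/other"),
]

anchor_metadata = collect_anchor_metadata(links)
for page, phrases in anchor_metadata.items():
    print(page, phrases)
```

Each phrase collected for a page is a description written by someone other than the page's author, which is why it is useful to a ranking algorithm. Office documents rarely have an equivalent source of descriptions.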

However, searching on an enterprise site is a more difficult and laborious process because of the lack of metadata. Unlike on the Internet, there are no textual links between documents in an enterprise, and no implicitly created metadata that a search engine can use. Office documents are not naturally linked together, and there are too few corporate librarians to manually tag each office document with the appropriate metadata.

Also, an average enterprise has more content types, formats and security measures than the Internet, which makes this search more time-consuming. The result is that workers spend too much time sorting through pages and pages of irrelevant results, as opposed to executing the tasks associated with their job.

The bottom line for enterprises with large amounts of data is that it's worth investing in the right search technology. The solution should automatically categorize enterprise content, thereby eliminating time-intensive, manual categorization. In an organization with hundreds or even thousands of workers producing knowledge, it could take years to tag each employee's electronic documents by hand.

How to Achieve Better Enterprise Search

By automatically tagging and categorizing enterprise content, enterprises will realize the naturally occurring, high-quality metadata associated with hyperlinks, and bridge the gap between enterprise and Internet search. In order to achieve better enterprise search, enterprises should do the following three things:

1. Install a system to automate the creation of metadata for existing content and new content as it's added to the server.

2. Categorize information into logical groups based on folksonomies, taxonomies and ontologies.

3. Define, in advance, what you want to understand from the documents and check to see that automated systems coincide with these goals. Perform a systematic human check of your automated search and content management tools at least once per quarter.
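The three steps above can be sketched in a few lines of code. This is only a toy example under stated assumptions: the `TAXONOMY` categories and keywords, the `auto_tag` function and the `agreement_rate` check are all hypothetical, and simple keyword matching stands in for the linguistic or statistical methods a commercial categorization system would use.

```python
import re

# Hypothetical taxonomy: category name -> keywords that signal it.
TAXONOMY = {
    "finance": {"invoice", "budget", "expense"},
    "legal": {"contract", "liability", "compliance"},
    "hr": {"benefits", "payroll", "onboarding"},
}


def auto_tag(text, taxonomy=TAXONOMY, min_hits=1):
    """Return the sorted categories whose keywords appear in the text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    tags = []
    for category, keywords in taxonomy.items():
        if len(words & keywords) >= min_hits:
            tags.append(category)
    return sorted(tags)


def agreement_rate(documents, human_tags):
    """Fraction of documents whose automated tags match a reviewer's.

    `documents` maps a document id to its text; `human_tags` maps the
    same ids to the tags a human reviewer assigned.
    """
    matches = sum(
        1
        for doc_id, text in documents.items()
        if auto_tag(text) == sorted(human_tags[doc_id])
    )
    return matches / len(documents)


# Step 1 and 2: tag content automatically against the taxonomy.
documents = {
    "d1": "Q3 budget and expense report",
    "d2": "Contract covering payroll compliance",
    "d3": "Cafeteria lunch menu",
}
for doc_id, text in documents.items():
    print(doc_id, auto_tag(text))

# Step 3: compare against a human-reviewed sample.
reviewed = {"d1": ["finance"], "d2": ["legal"], "d3": []}
print(agreement_rate(documents, reviewed))
```

Running the agreement check on a reviewed sample each quarter gives a simple number to track. A falling rate suggests the taxonomy or the tagging rules no longer match the goals defined in step 3.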

The key to managing your company's content quickly and easily is being able to automatically generate metadata. An auto-categorization metadata system, the backbone of a successful content management system (CMS), is a proven solution for better search and retrieval. It not only improves accuracy and efficiency, but also saves time, money and resources. In today's challenging economic climate, it's hard to argue with that.

Dr. Yves Schabes is President of Teragram. Yves co-founded Teragram with Dr. Emmanuel Roche in 1997. Yves has spent the past fifteen years working on issues relating to natural language processing and computer science. Yves is the author, or editor, of more than fifty international scientific publications, including co-editor, with Emmanuel Roche, of Finite-State Language Processing (1997, MIT Press, Cambridge MA). Yves is also an Associate to the Division of Applied Science at Harvard University.

Prior to founding Teragram, Yves was a Senior Scientist at Mitsubishi Electric Research Laboratories in Cambridge, MA. He also held a position as a Research Associate at the University of Pennsylvania. Yves has been a program committee member of many international scientific conferences and journals.

He received a Ph.D. in Computer Science from the University of Pennsylvania in 1990 and an M.S. in Electrical Engineering from l'Ecole Supérieure d'Electricité (France) in 1985. He can be reached at
