How to Measure Findability in Enterprise Search Solutions

By Sid Probstein  |  Posted 2008-12-09


The results of a recent survey on the findability of information within the enterprise are not encouraging. Roughly half of the responding knowledge workers stated that finding important information was difficult and time-consuming, and that the internal search capabilities provided by their company were "worse" to "much worse" than the equivalent functionality offered to consumers.

Neither of these findings is truly surprising. Corporate Internet sites tend to be directly involved in important and easily measured activities: selling products to new customers (generating revenue) or servicing existing customers (reducing costs). Consequently, Internet search is usually well funded, well staffed and more likely to succeed. Internal search, in contrast, is concerned with productivity, a fuzzier concept that is much harder to measure. Significant investment (and thus success) is therefore harder to achieve.

What is surprising is that roughly half of the survey respondents stated that their company had no formal goal for internal findability. In my view, this is a direct cause of the overall poor results. Companies that don't measure search won't be able to invest appropriately, let alone tune and improve a complicated system with which they likely don't have deep internal experience.

If your organization doesn't measure search, or, more precisely, findability, you should start right away. Here are five steps to building a return-on-investment case for your company's internal search or information access solution. Keep in mind that no massive investment is required up front. In fact, one person working a few hours a week can make a difference.

Step #1: Talk to your system administration team

For starters, find out if your organization is saving query logs; hopefully, it is. If not, this is the first challenge to overcome. Talk to your system administration team and see if they can help. You don't need to save all logs for all time; just try to get your hands on a day or two of data. That's quite enough to get started.
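As a quick illustration, here is a minimal Python sketch of pulling query strings out of a day's worth of web-server log lines. The log format and the `q=` parameter are assumptions; adjust the pattern to whatever your search front end actually writes.

```python
import re
from collections import Counter

# Hypothetical log format: one line per request, query in a "q=" parameter.
QUERY_RE = re.compile(r"[?&]q=([^&\s]+)")

def extract_queries(log_lines):
    """Pull raw query strings out of web-server log lines."""
    queries = []
    for line in log_lines:
        match = QUERY_RE.search(line)
        if match:
            # '+' and '%20' both encode spaces in URL query strings
            queries.append(
                match.group(1).replace("+", " ").replace("%20", " ").lower()
            )
    return queries

sample_log = [
    "GET /search?q=holiday+schedule HTTP/1.1",
    "GET /search?q=erp+login HTTP/1.1",
    "GET /search?q=holiday+schedule HTTP/1.1",
]
print(Counter(extract_queries(sample_log)).most_common())
```

Even this crude tally surfaces the most frequent queries, which are good candidates for the hand-picked sample in the next step.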

Step #2: Identify and run a handful of queries

Assuming you have some data to look at, identify a handful (fewer than 50) of interesting queries. Ideally, you want them to fall into a few different categories: one word, two words, multiple words, questions, and a few different business domains.
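The word-count and question buckets are easy to automate; the sketch below uses a hypothetical `categorize` helper to do so (business-domain tagging would still be manual):

```python
QUESTION_WORDS = {"what", "who", "where", "when", "which", "how", "why"}

def categorize(query):
    """Bucket a query into the rough categories used for the sample."""
    words = query.split()
    if not words:
        return "empty"
    if query.rstrip().endswith("?") or words[0].lower() in QUESTION_WORDS:
        return "question"
    if len(words) == 1:
        return "one-word"
    if len(words) == 2:
        return "two-word"
    return "multi-word"

for q in ["benefits", "erp login", "reset erp password",
          "what is the company holiday schedule"]:
    print(q, "->", categorize(q))
```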

Run the queries yourself and see what comes up. Look at the first few results and score them on a simple scale (such as 1=incoherent, 5=perfect). If you rate a result poorly, spend a few minutes trying to find out what the better answer might be. Then see if you can infer what's wrong. 

For example, does the document that answers your question (but doesn't appear in the results list) contain the terms you put in your query? If it does, you have a relevancy problem; if not, you have some sort of linguistic problem.


Step #3: Determine and query typical internal questions

Write down some typical internal questions such as "What is the company holiday schedule?" or "Which business unit is responsible for product X?" Then see if you can identify a document that best answers each question. It might help to pick a domain you are knowledgeable about, at least initially.

Finally, see if you can find the document by querying. It may take you several tries. Keep track of how many times you have to revise the query to get the document, and then score your results on a scale similar to the one mentioned earlier (1=impossible to find, 5=easy to find).

Step #4: Compile and analyze the results

Now compile the results and take a look. Are there any trends in the ratings? The odds are that you will observe one of the following:

1. The results are just incoherent.

This typically indicates that relevancy needs fine-tuning or is otherwise not configured correctly. For example, you may see articles that have nothing to do with your query terms, you might see new articles but not relevant ones, or you may see relevant articles but not the latest information.

2. There are too many results or one source of data dominates the results.

Both of these indicate that the search solution needs to expose facets or "dimensions" that users can use to slice into the result set. It may also need to add entity or concept extraction capabilities.

3. Misspellings are not corrected, or company terminology, jargon and acronyms are not recognized.

This issue indicates that query and/or content processing (especially linguistic processing such as tokenization and spelling) is not configured correctly, or that some work on acronym and synonym handling is required.
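Once you have scored your sample, compiling the results can be as simple as averaging scores per query category and flagging the weak spots. A minimal sketch, using made-up scores on the 1-5 scale from Step #2:

```python
from collections import defaultdict
from statistics import mean

# Made-up sample: (query, category, score on the 1-5 scale).
scored_queries = [
    ("benefits", "one-word", 4),
    ("erp login", "two-word", 2),
    ("reset erp password", "multi-word", 2),
    ("what is the company holiday schedule?", "question", 1),
]

by_category = defaultdict(list)
for _, category, score in scored_queries:
    by_category[category].append(score)

for category, scores in sorted(by_category.items()):
    avg = mean(scores)
    flag = "  <-- investigate" if avg < 3 else ""
    print(f"{category}: {avg:.1f}{flag}")
```

A table like this, even with only a few dozen queries behind it, makes the trends far easier to see than raw notes.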

One of the most likely outcomes-regardless of the overall health of your internal search solution-will be that there is simply no appropriate content to find for many queries. One-third of the survey respondents noted this, claiming that less than half of the information they needed is searchable. Most organizations limit internal search to text files such as office documents, spreadsheets, PDFs, brochures and, of course, Web pages (both Internet and intranet).

Unfortunately, this ignores three of the most important corporate silos: e-mail and/or messaging, custom or departmental applications (such as databases), and complex enterprise applications built on top of databases, such as business intelligence, enterprise resource planning (ERP) and customer relationship management (CRM) systems.

Not surprisingly, these are the most challenging silos to work with, let alone to link and correlate with fuzzier, unstructured data. Legacy enterprise search engines may simply not be up to the task. One low-cost, quick and easy fix is to federate user queries against these sources and present the results side by side. Even if this isn't a perfect solution, it will at least show users that improvement is possible.
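The federation idea can be prototyped in a few lines: run the query against each silo in parallel and keep the result lists per source for side-by-side display. The per-silo search functions below are stand-ins for real calls to a mail server, database or CRM API.

```python
from concurrent.futures import ThreadPoolExecutor

# Stand-ins for real per-silo search calls.
def search_intranet(query):
    return [f"intranet hit for {query!r}"]

def search_mail(query):
    return [f"mail hit for {query!r}"]

def search_crm(query):
    return [f"crm hit for {query!r}"]

SOURCES = {"intranet": search_intranet, "mail": search_mail, "crm": search_crm}

def federate(query):
    """Run the query against every silo in parallel, keeping results per source."""
    with ThreadPoolExecutor(max_workers=len(SOURCES)) as pool:
        futures = {name: pool.submit(fn, query) for name, fn in SOURCES.items()}
        return {name: future.result() for name, future in futures.items()}

print(federate("holiday schedule"))
```

Because each source keeps its own ranking, there is no need to solve the (much harder) problem of merging relevance scores across silos.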


Step #5: Repeat the analysis process weekly

Getting back to the analysis: Once you have scored a number of queries, looked into the bad ones and tried to understand how hard (or easy) it is to find answers to particular questions, repeat the process a week later, and a week later still.
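Re-running the same query set each week makes the trend visible at a glance. A minimal sketch, with hypothetical weekly score lists:

```python
from statistics import mean

# Hypothetical 1-5 scores for the same query set, re-run each week.
weekly_scores = {
    "week 1": [2, 3, 1, 4, 2],
    "week 2": [3, 3, 2, 4, 3],
    "week 3": [3, 4, 3, 4, 3],
}

trend = {week: round(mean(scores), 2) for week, scores in weekly_scores.items()}
print(trend)
```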

Now, armed with about a month's worth of analysis, you should be ready to take the next step: building an ROI case that argues for greater investment in terms of the actual information access challenges your organization faces.

What if your company already measures search? Then ask a deeper question: are you measuring the search engine, or findability? Query logs can only tell you what people are searching for; they can't necessarily tell you what people failed to find. If your metrics stop at the logs, you are measuring the engine rather than the user. One concrete step you can take is to interview your top users. They can often tell you what the search solution does well and where improvements are needed.

Sid Probstein is CTO at Attivio, where he is responsible for technology strategy and innovation. Sid brings to Attivio more than 15 years of experience leading successful engineering organizations and building complex, high-performance systems. Previously, Sid was CTO at GCi, where he headed up development of the company's next-generation commerce platform. He also served as VP of Technology at Fast Search & Transfer (now Microsoft), where he developed next-generation search, text mining and multimedia capabilities.

Earlier, Sid served as VP of Engineering at Northern Light Technology, where he produced the first enterprise version of the award-winning search engine; as Director of Software Engineering at Freemark Communications, where he helped implement the first "free" e-mail service; and as Principal Architect/System Manager at John Hancock Financial Services, where the integrated sales illustration and client management system he designed was featured as a Microsoft Solution-in-Action case study. Sid can be reached at
