Three Paths to Sorting Content


Content categorization doesn't seem that hard, at least until you start doing it. When even human beings disagree about where content belongs in a small taxonomy, the task can start to seem impossible for software.

The good news is that all three of the programs eWeek Labs tested in this eValuation provided excellent results, both in our lab-based tests and in the work the vendors did with the content. This is especially impressive because the performance of products in this area tends to improve over time, so our results reflect the products before they have had much chance to be refined.

We tested Applied Semantics Inc.'s Auto-Categorizer 1.1, Interwoven Inc.'s MetaTagger 3.0 and Thunderstone Software LLC's Texis Categorizer 4.1. Although these products differ greatly in design and in their approach to categorization, we were able to focus on the key areas that will concern businesses implementing a categorization system.

We looked at each product's ability to import, edit and possibly create taxonomies; how each product deals with legacy and new content; how categorization is trained and refined; and how each product integrates with other enterprise systems.

In many ways, each of the products we tested represents a different approach to categorization. Businesses interested in categorization should look at each product not only as a stand-alone system but also as an example of a distinct route to the same categorization goal.

For example, Interwoven's MetaTagger 3.0 works only with the Interwoven TeamSite content management platform. However, organizations that are not running TeamSite might still find that integrating categorization with a content management system is the approach that best meets their needs.

For other companies, the open approach of Texis Categorizer 4.1 will provide the best opportunity to integrate categorization easily with a wide range of applications and sites.

Auto-Categorizer, meanwhile, offers an ontology-based approach that, while currently limited to the subject areas its ontology covers, provides a highly focused and flexible method of creating accurate categorizations.

Whatever the approach, there are several steps businesses can follow to take some of the pain out of implementing a categorization system. It's extremely important, for example, to know where all content is stored and how it is generated. Some content can be categorized after it is created, but in some cases it might work better to suggest categories during the authoring phase.

Most content categorization vendors will send analysts to help organizations create taxonomies. This can be extremely helpful, especially for businesses that have no in-house expertise or that don't classify their content in a standard way. However, it's important for IT managers to come to such a meeting with at least a basic plan in place. Otherwise, you'll end up with the same taxonomy the vendor built for other companies similar to yours.

Finally, businesses should know which of their systems will need to integrate with the categorization platform. Both the technical and design aspects of categorization vary radically depending on the systems involved.

In addition, the extent to which a product supports open standards and common development languages will greatly affect the ease with which the product can be integrated with other applications.