What if you could instantly build categorized search results from multiple content sources as searches were performed? That is the challenge Vivisimo Inc. has taken on with its Clustering Engine 4.0. Vivisimo's solution isn't perfect, but Clustering Engine's ability to create on-the-fly categories is so impressive that we gave it an eWEEK Labs Analyst's Choice award.
When it comes to making it easy for visitors to find content on sites and search engines, good categorization is a must.
Much of the early success of Yahoo.com, in fact, was based on its ability to categorize the Web.
However, as eWEEK Labs' July 2002 Data by Design eValuation found, good categorization of even relatively similar content can be difficult, requiring upfront investment in taxonomy analysis and lots of back-end work in perfecting the categories as new content and category areas are added.
Probably the biggest difference between Clustering Engine and other categorization applications is that Clustering Engine does not require a pre-existing taxonomy, nor does it require training sets of preclassified content. Instead, the application analyzes returned search results and infers categories based on the content.
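To make the idea of inferring categories from returned results concrete, here is a minimal sketch in Python. It is not Vivisimo's algorithm, which this review does not describe. It is a deliberately naive stand-in that labels each result with the non-query term it shares with the most other results. The function names, the stopword list and the sample snippets are all assumptions made for illustration.

```python
import re
from collections import Counter, defaultdict

# Illustrative stopword list (an assumption, not part of any product).
STOPWORDS = {"a", "an", "and", "for", "in", "of", "the", "to", "with"}


def tokens(text):
    """Return the set of lowercase words in text, minus stopwords."""
    return {t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS}


def cluster_results(query, snippets, min_size=2):
    """Group result snippets under labels inferred from the snippets themselves.

    No taxonomy or training set is used: a label is simply a term that
    appears in at least `min_size` snippets and is not part of the query.
    """
    query_terms = tokens(query)
    doc_terms = [tokens(s) - query_terms for s in snippets]

    # Document frequency: in how many snippets does each term appear?
    df = Counter(t for terms in doc_terms for t in terms)

    clusters = defaultdict(list)
    for snippet, terms in zip(snippets, doc_terms):
        candidates = [t for t in terms if df[t] >= min_size]
        # Pick the most widely shared term; break ties alphabetically.
        label = min(candidates, key=lambda t: (-df[t], t)) if candidates else "other"
        clusters[label].append(snippet)
    return dict(clusters)


results = cluster_results(
    "jaguar",
    [
        "Jaguar car dealers and prices",
        "Used Jaguar car listings",
        "Jaguar habitat in the rainforest",
        "Rainforest conservation for the jaguar",
        "Jaguar operating system notes",
    ],
)
for label, members in results.items():
    print(label, members)
```

Even this toy version hints at why broad queries can produce odd categories: any word that happens to recur across a few results becomes a candidate label.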
Unlike some tools that perform categorization automatically, Clustering Engine is not a search engine. The product instead pulls results from any predefined source, such as internal search engines, vendor search engines, custom news services and public Web search engines. It is also possible to directly query databases from Clustering Engine.
Pricing for Clustering Engine starts at $24,000 and climbs depending on several factors, including the number of sources that information is pulled from and deployment requirements. Clustering Engine's price compares favorably with those of enterprise search systems and categorization platforms.
In our tests, the category clusters returned in searches were accurate more often than not, especially when the search terms were very specific. When search terms became broader, the returned categories were often comical. (A search for a certain drink recipe, for example, returned a women's footwear category.)
Still, given its ability to provide good categorization of searches on the fly, with much less upfront work than other approaches require, companies looking to improve search results in their portals, Web sites and intranets should definitely take a look at Clustering Engine. Indeed, one of the main reasons behind our decision to award Clustering Engine an Analyst's Choice is that the product makes it possible for companies to deploy features that have typically been limited to large public search engines.
Clustering Engine runs on Linux and Windows servers, and we found that installation on both was simple.
Clustering Engine now features a Web-based administration tool, and it includes helpful tutorials as well as hundreds of predefined search sources.
However, while the Web-based administration tool was appreciated, it wasn't user-friendly. In many ways, it felt like a browser-based substitute for manually editing XML files. (In fact, users can choose to do exactly that instead of using the Web-based interface.) Still, it does centralize the management of the application and helped with certain tasks such as testing content sources that we created.
In general, administration of the application itself could be extremely complex and time-consuming. However, this is because pretty much every aspect of the product is exposed and can be customized in almost any way possible. Once mastered, Clustering Engine can be managed in a way that best suits a company and its customers.
The Web-based administration tool does provide access to the tutorials and documentation, which is a must for connecting Clustering Engine to your content sources and for integrating it with your applications.
We ran into trouble when we tried to quickly run through the documentation and add content. When we went through everything step by step, however, we were able to connect to any kind of search form and retrieve results from multiple sources.
Creating a source from our own search engines or from external search sites that we wished to leverage ranged in difficulty from very simple to highly complex.
On the simple side of the scale, if our search site used a standard HTTP GET request, we could simply copy the URL from our browser window to get most of the parameters. On the complex side, where a search engine used customized, under-the-covers scripting, getting all the right information meant digging through the code of the search application. In those cases, the many sample sources included with Clustering Engine proved invaluable.
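For the simple GET case, the parameters really are sitting in the address bar. The Python sketch below shows one way to pull them out of a copied results URL using only the standard library. The URL, host name and parameter names are hypothetical, and the output is just a dictionary; how those values would then be entered into a Clustering Engine source definition is not shown, since this review does not document that format.

```python
from urllib.parse import urlsplit, parse_qs


def describe_get_search(url):
    """Split a search-results URL into its endpoint and query parameters."""
    parts = urlsplit(url)
    endpoint = f"{parts.scheme}://{parts.netloc}{parts.path}"
    # parse_qs maps each name to a list of values; keep the first of each.
    params = {name: values[0] for name, values in parse_qs(parts.query).items()}
    return endpoint, params


# Hypothetical URL copied from a browser after running a search.
endpoint, params = describe_get_search(
    "http://search.example.com/find?q=clustering+engine&num=20&lang=en"
)
print(endpoint)
print(params)
```

A search form that submits via POST or builds its request in script leaves nothing this convenient in the URL, which is where the digging described above comes in.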
An extremely useful feature when setting up search sources is the ability to create knowledge bases. The knowledge bases essentially made it possible to tweak categorization to avoid common words or phrases that could skew results (for example, removing “eWEEK” as a relevant phrase on searches of eWEEK-only content). Using the knowledge base feature, we could also define things such as acronyms and synonyms to make sure all related content was properly categorized.
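The effect of such a knowledge base can be sketched in a few lines of Python. This is an illustration of the general idea only, assuming a knowledge base made up of a set of phrases to ignore and a synonym map. The names STOP_PHRASES, SYNONYMS and normalize_terms are invented for this example and do not reflect Clustering Engine's actual knowledge base format.

```python
import re

# Hypothetical knowledge base: terms to ignore and terms to treat as equal.
STOP_PHRASES = {"eweek"}
SYNONYMS = {
    "cgi": "common gateway interface",
    "xml": "extensible markup language",
}


def normalize_terms(text, stop_phrases=STOP_PHRASES, synonyms=SYNONYMS):
    """Lowercase and tokenize text, drop stop phrases, expand synonyms."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    kept = [w for w in words if w not in stop_phrases]
    # Map each acronym or synonym to one canonical form so that related
    # content ends up under the same term.
    return [synonyms.get(w, w) for w in kept]


terms = normalize_terms("eWEEK reviews CGI and XML tools")
print(terms)
```

Applying a step like this before categories are inferred keeps a site-wide term such as "eWEEK" from dominating the labels and lets an acronym and its expansion count as the same concept.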
Clustering Engine consists mainly of XML files and Common Gateway Interface scripts, so it is possible to easily integrate it into almost any system or application. Vivisimo includes excellent API information for integrating Clustering Engine with a variety of programming languages and systems. Vivisimo also includes detailed information on the XML input and output of the product, which allows for some very advanced customizations.
Labs Director Jim Rapoza can be reached at jim_rapoza@ziffdavis.com.