Categorization rests on standardization: all content entering or already in a system should be categorized according to a standard taxonomy. Standards for categorization itself are another story, however.
Other than the Dublin Core Metadata Initiative's Dublin Core (www.dublincore.org) for metatags, standards in this area are not widely used. As in many other areas, this isn't because the standards don't exist but because relatively few organizations use what's available.
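To show what Dublin Core metatags look like in practice, here is a minimal sketch that renders a record as HTML `<meta>` tags. It assumes the common convention of declaring the DC namespace with a `schema.DC` link and prefixing element names with `DC.`; the function name and the sample record are hypothetical, for illustration only.

```python
from html import escape

# Namespace URI of the Dublin Core Metadata Element Set, version 1.1.
DC_NAMESPACE = "http://purl.org/dc/elements/1.1/"


def dublin_core_meta_tags(record: dict[str, str]) -> str:
    """Render Dublin Core elements as HTML <link>/<meta> tags.

    Keys are DC element names such as "title", "creator" or "subject";
    each becomes a <meta name="DC.element"> tag. (Hypothetical helper.)
    """
    lines = [f'<link rel="schema.DC" href="{DC_NAMESPACE}">']
    for element, value in record.items():
        # escape() also escapes double quotes, so values are safe inside content="...".
        lines.append(f'<meta name="DC.{escape(element)}" content="{escape(value)}">')
    return "\n".join(lines)


# Hypothetical sample record, for illustration only.
print(dublin_core_meta_tags({
    "title": "Quarterly Storage Report",
    "subject": "storage; procurement",
    "language": "en",
}))
```

A categorization engine that finds a `DC.subject` tag like this has a head start, but only if the page's author filled it in consistently against a shared vocabulary.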
By far the most promising standard for categorization and taxonomies is the World Wide Web Consortium's RDF (Resource Description Framework). RDF, which is at the core of Web creator Tim Berners-Lee's vision of the semantic Web, is the main standard for defining metadata in Web documents.
Simply put, the XML-based RDF makes it possible to describe a document's key elements, such as its title, author and subject, as machine-readable statements. Given that categorization engines spend much of their effort working out what documents are actually about, RDF could greatly simplify that task.
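As a rough illustration of why this helps, the sketch below reads a small RDF/XML description that uses Dublin Core properties and pulls out the subjects the author declared, so a categorizer could start from them instead of inferring a topic from the text. The sample document, its URL and the function name are hypothetical; only the RDF and Dublin Core namespace URIs are the real ones.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Hypothetical RDF/XML description of a document, using Dublin Core properties.
SAMPLE = f"""<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:dc="{DC_NS}">
  <rdf:Description rdf:about="http://example.com/reports/q3-storage.html">
    <dc:title>Q3 Storage Purchasing Report</dc:title>
    <dc:subject>storage</dc:subject>
    <dc:subject>procurement</dc:subject>
  </rdf:Description>
</rdf:RDF>"""


def declared_subjects(rdf_xml: str) -> dict[str, list[str]]:
    """Map each described resource (rdf:about) to its dc:subject values."""
    root = ET.fromstring(rdf_xml)
    result: dict[str, list[str]] = {}
    # ElementTree expands prefixed names to "{namespace-uri}localname".
    for desc in root.iter(f"{{{RDF_NS}}}Description"):
        about = desc.get(f"{{{RDF_NS}}}about")
        subjects = [el.text.strip() for el in desc.findall(f"{{{DC_NS}}}subject") if el.text]
        result[about] = subjects
    return result


print(declared_subjects(SAMPLE))
```

This handles only the simple `rdf:Description` form; RDF/XML has other legal serializations of the same statements, which a production parser would need to accept.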
However, to use RDF, content must be properly tagged in the first place. And outside of some government and educational institutions, RDF has yet to see the kind of wide adoption that XML has experienced.
Many vendors eWeek Labs has spoken with, including both the vendors of the products we review and others, said that although they had investigated RDF, they wouldn't support it until a large corporate client asked for it. That widespread use of RDF could make some categorization products irrelevant may also account for this hesitancy.
RDF isn't the only standard that could ease categorization and taxonomy creation. The DAML (DARPA Agent Markup Language) program is similar to RDF but goes a little further. DAML makes it possible to apply a wide variety of semantic concepts to documents and even to declare concepts equivalent or unique.
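To make the value of equivalence statements concrete, here is a sketch of what a categorizer might do with them: merge terms declared equivalent into one canonical concept, so that documents tagged with different vocabularies land in the same category. This is a plain union-find over string pairs, not DAML syntax or a DAML processor; in DAML such statements would be written in RDF (for example with an equivalence property such as `daml:equivalentTo`). The vocabulary prefixes and function names are hypothetical.

```python
def canonical_concepts(equivalences: list[tuple[str, str]]) -> dict[str, str]:
    """Map every term to one canonical label for its equivalence class."""
    parent: dict[str, str] = {}

    def find(term: str) -> str:
        parent.setdefault(term, term)
        while parent[term] != term:
            parent[term] = parent[parent[term]]  # path halving keeps trees shallow
            term = parent[term]
        return term

    # Union the two sides of each "A is equivalent to B" statement.
    for a, b in equivalences:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Group terms by root, then use the alphabetically smallest member as the label.
    groups: dict[str, list[str]] = {}
    for term in list(parent):
        groups.setdefault(find(term), []).append(term)
    return {term: min(members) for members in groups.values() for term in members}


def normalize_tags(tags: list[str], canon: dict[str, str]) -> list[str]:
    """Replace tags with canonical labels, dropping duplicates but keeping order."""
    result: list[str] = []
    for tag in tags:
        label = canon.get(tag, tag)
        if label not in result:
            result.append(label)
    return result


# Hypothetical equivalences between three departmental vocabularies.
canon = canonical_concepts([("hr:Employee", "payroll:Staff"), ("payroll:Staff", "crm:Personnel")])
print(normalize_tags(["payroll:Staff", "hr:Employee", "finance:Invoice"], canon))
```

Choosing the smallest label keeps the output deterministic; a real taxonomy would more likely nominate a preferred term explicitly.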
A related standard, DAML+OIL, which combines DAML with OIL (Ontology Inference Layer), adds ontology features to DAML, making it possible to apply common concepts and meaning to data within documents. This standard is at the core of the W3C's recently formed Web Ontology Working Group, which is working toward a standard for creating and managing ontologies within documents.
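One kind of inference an ontology enables is following subclass relationships: if a document is about relational databases, and the ontology says relational databases are a kind of database, the document can also be filed under "Database" without anyone tagging it that way. The sketch below shows that idea as a transitive walk over a hypothetical mini-ontology; it is a simplified stand-in for what subclass statements (such as `rdfs:subClassOf`, which DAML+OIL builds on) allow, not a DAML+OIL reasoner.

```python
# Hypothetical mini-ontology: each concept maps to its direct superclasses.
SUBCLASS_OF: dict[str, list[str]] = {
    "RelationalDatabase": ["Database"],
    "Database": ["Software"],
    "Software": ["InformationTechnology"],
    "Firewall": ["SecurityProduct", "NetworkDevice"],
    "NetworkDevice": ["InformationTechnology"],
}


def inferred_categories(tags: list[str], subclass_of: dict[str, list[str]]) -> list[str]:
    """Return the tags plus every broader concept reachable via subclass links."""
    found: set[str] = set()
    stack = list(tags)
    while stack:
        concept = stack.pop()
        if concept in found:
            continue  # already visited; this also guards against cycles
        found.add(concept)
        stack.extend(subclass_of.get(concept, []))
    return sorted(found)


print(inferred_categories(["RelationalDatabase", "Firewall"], SUBCLASS_OF))
```

Because the walk tracks visited concepts, a badly built taxonomy with a cycle still terminates rather than looping forever.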
All these standards are being designed with the semantic Web in mind, but if adopted, they could revolutionize categorization, making it more effective and easier to apply.
For more information on RDF, go to www.w3.org/rdf. For more information on DAML, go to www.daml.org.
East Coast Technical Director Jim Rapoza can be reached by email.