The World Bank, a $20 billion financial arm of the United Nations, employs categorization technology to help it meet its goals of funding education and increasing the standard of living in 184 developing countries.
“With 60 years' worth of historical and current documents, we have a very complex and rich information environment,” said Denise Bedford, a senior information officer at the World Bank, based in Washington.
“We do development work around the world, which means that information is being created and captured in many locations and in many languages. This can be a complex environment in which to manage information.”
To better organize, store and process its voluminous archives, the bank relies on Teragram Corp.'s TK240 categorization software, paired with a smart strategy for collecting metadata. The combination has dramatically improved the flow of information. Before using categorization technologies, World Bank employees processed three electronic documents per hour; today, they can process as many as 50,000 electronic documents per hour.
The World Bank collects data for more than 30 areas—such as education, water supply and sanitation, and agriculture—for its development efforts and general tracking of human conditions worldwide. Information collected ranges from critical project and loan data to e-mail, financial studies and historical research articles.
To simplify its complex workflow, the World Bank implemented an intranet 10 years ago that enabled workers to categorize their resources. The plan called for staff to fill out a template to provide metadata that the bank could use to categorize information.
However, the result was a mass of unorganized, unsearchable metadata, Bedford said. “We created huge translation problems … because we created metadata in English for French, Spanish and Portuguese documents, which means search in those languages [was] impossible,” she said.
In 2002, the World Bank authorized Bedford to begin an RFP (request for proposal) process for an off-the-shelf solution that would not only capture useful metadata but also classify and summarize documents. The bank selected Teragram's TK240 categorization solution, citing the simplicity of its interface and its ability to perform contextual searches in multiple languages.
The World Bank's institutional profile includes descriptions of metadata the organization wants to capture, as well as indicators such as topic, document keywords, loan numbers and more.
World Bank employees send documents to a homegrown document management system that uses Oracle Corp.'s InterMedia database feature, which stores and manages multimedia data. The TK240 software then reads each document, identifies its language and assigns metadata tags based on predefined indicators, searching for more than 180,000 keywords in 1,000 categories.
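The tagging step described above can be pictured with a minimal Python sketch. It is an illustration only and does not reflect how TK240 works internally: the stopword lists, the keyword table and the function names are all hypothetical, and the language guess is a deliberately naive stopword count.

```python
import re
from collections import Counter

# Hypothetical stopword lists used for a naive language guess.
STOPWORDS = {
    "en": {"the", "and", "of", "to", "in"},
    "fr": {"le", "la", "et", "les", "des"},
    "es": {"el", "los", "y", "las", "del"},
}

# Hypothetical keyword-to-category table; the bank's real table is said
# to hold more than 180,000 keywords in 1,000 categories.
KEYWORDS = {
    "irrigation": "agriculture",
    "crop": "agriculture",
    "school": "education",
    "teacher": "education",
    "sanitation": "water supply and sanitation",
}


def tokenize(text):
    """Lowercase the text and split it into runs of letters."""
    return re.findall(r"[^\W\d_]+", text.lower())


def identify_language(tokens):
    """Pick the language whose stopwords appear most often."""
    scores = {
        lang: sum(1 for t in tokens if t in words)
        for lang, words in STOPWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"


def categorize(tokens):
    """Return matched categories, most frequently hit first."""
    counts = Counter(KEYWORDS[t] for t in tokens if t in KEYWORDS)
    return [category for category, _ in counts.most_common()]


def tag_document(text):
    """Produce the metadata tags for one document."""
    tokens = tokenize(text)
    return {
        "language": identify_language(tokens),
        "categories": categorize(tokens),
    }
```

Calling tag_document on an English sentence that mentions a school, a teacher and sanitation would tag it as English, with education ranked ahead of water supply and sanitation because it matched two keywords instead of one.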
The TK240 summarization engine then searches the document for conceptual identifiers and assembles an XML copy of the document, complete with the metadata tags, that resides alongside the original in the document manager. The authors inspect finished documents and save them to the InterMedia database.
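As a rough picture of what an XML record carrying such metadata tags might look like, the following Python sketch builds one with the standard library. The element names, the attribute and the function name are assumptions made for illustration; the article does not describe the actual schema the bank uses.

```python
import xml.etree.ElementTree as ET


def build_metadata_xml(doc_id, language, categories, summary):
    """Assemble a hypothetical XML metadata record for one document."""
    root = ET.Element("document", attrib={"id": doc_id})
    ET.SubElement(root, "language").text = language
    cats = ET.SubElement(root, "categories")
    for category in categories:
        ET.SubElement(cats, "category").text = category
    ET.SubElement(root, "summary").text = summary
    # encoding="unicode" returns a str instead of bytes.
    return ET.tostring(root, encoding="unicode")


record = build_metadata_xml(
    "doc-1",                      # hypothetical document identifier
    "fr",
    ["education", "agriculture"],
    "Rapport sur les écoles rurales.",
)
print(record)
```

Storing the language next to the tags is the detail that matters here: it lets a search tool match a French document against French terms instead of relying on English-only metadata.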
The bank recently used TK240 to categorize approximately 61,000 documents at the World Bank Institute's Library of Learning, in Moscow. Those documents included electronic course content as well as information published to the World Bank's intranet and to its external Web site. The bank is set to begin processing some 3.6 million documents in the Library of Learning's records management system, Bedford said.
The World Bank may use the categorization software to tackle e-mail in the future, Bedford said.
Senior Writer Anne Chen can be reached at [email protected].