Read-Only? Not Anymore

 
 
By eweek  |  Posted 2001-07-16

OmniPage Pro upgrade's new OCR capabilities take document management to a new level

ScanSoft Inc.'s OmniPage Pro 11, with its ability to parse PDFs and flow text from one column to another, is a worthwhile upgrade. The software's new OCR capabilities will free users from the chore of searching printed documents by hand and, of course, save rekeying of the printed page. And it will save documents as PDF files.

OmniPage Pro 11, released for PCs last month (the Macintosh version is due in November), is the first revision since ScanSoft last year absorbed competitor Caere Corp. The software costs $499; upgrades from competing products cost $149.

OmniPage Pro's most noteworthy new feature is its ability to edit and save PDF files as PDF, HTML or Microsoft Word documents, or in any number of common text and image formats.

eWeek Labs hasn't seen any other programs that can do this, and for sites that need these capabilities, OmniPage Pro is well worth the price.

OmniPage Pro now allows type to flow in columns, as in page layout programs such as Quark Inc.'s QuarkXPress or Adobe Systems Inc.'s InDesign.

OmniPage Pro recognizes and retains type and background colors. Voice feedback can be enabled. Tables are recognized with or without a printed grid. In addition, the program now recognizes 114 languages based on the Latin, Greek and Cyrillic alphabets.

Working in the zone

OmniPage Pro 11 can do all this automatically, but when we fed it a PDF of an eWeek cover rich with text, graphics and graphics portraying text, we discovered that complex layouts are recognized more accurately if the user defines three classes of information, or zones. This simple process yielded practically edit-free results.

That is, auto-recognition was not sufficiently accurate among the three zones (alphanumeric, graphic and tabular) in a visually complex layout. However, the manual process of defining zones allowed for almost-perfect parsing of the document and took us less than a minute.

If OmniPage Pro is unsure of its OCR (optical character recognition) decisions, it queries the user after the recognition process. In most test cases, its self-doubt was unfounded. Furthermore, it learned from its questioning and applied the answers to other questions within the same document.

To gauge OCR accuracy, we had OmniPage Pro 11 scan a straightforward, 12-page Word document in 12-point Times Roman font that had been laser printed at 300-dot-per-inch resolution. To further challenge the acuity of the OCR, the document was in Spanish, replete with accents and tildes.

After the OCR process, a proofing window opened automatically and asked us roughly a dozen questions. Only four of them reflected uncertainty about optical recognition, and OmniPage Pro 11 guessed accurately on three of those occasions. The remainder were grammatical queries that would have come up in any spelling checker.

On English documents, the performance was no less impressive.

There is more to accurate recognition than identifying p's and q's. If the font of the target document is not resident in system software, OmniPage makes the next best choice. The software ably recognizes font and point size; users can ensure 100 percent accuracy by designating the appropriate font within OmniPage Pro's Font Matching option, provided it's resident in the system software.

OmniPage Pro 11 enables users to generate custom templates that, along with batch processing, will facilitate one-click scanning, recognizing, proofing (if needed) and saving of documents in a preselected format and directory. For a news organization or public relations company that must scan multiple examples of a given layout, perhaps a weekly or daily bulletin or news clippings, this streamlining is a real timesaver.



 
 
 
 
 
 
 
 
 
 
 
