Morass of Office data to be cleaned up through XML document formats.
Hear that giant sucking sound? It's all the IT data that employees laboriously create, check and format, and that is then never used again.
Word processors and spreadsheets are the top offenders here, and whole industries (such as search engine products and content management systems) have grown up to help improve data reuse rates of word processor and spreadsheet files.
One very simple reason that databases are so useful is that they make data reuse easy. Databases require data to be carefully structured and checked before it can be stored in the first place, so it's immediately ready for use in another document.
The support for saving documents in XML format that Microsoft is building into Office 11, the next release of Microsoft Office (expected to ship in mid-2003), promises to do something to remove the deservedly bad-boy reputation that desktop applications have as repositories for corporate data.
While the default file format for Office 11 documents will still be the proprietary Office data format, there will be a new option to save Office documents in XML format. (Sun's OpenOffice has done much better in this area so far with its default and well-documented XML format files.)
Office 11 will have its own default XML document structure, but a more promising option is the ability to save documents in an XML format of the customer's choice. Using the industry-standard XML Schema language, organizations will be able to define their own tags and then map them to existing or new Microsoft Word or Microsoft Excel files. When employees type data into the files and then save them, the saved file complies with the customer's own schema.
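To make the idea concrete, here is a minimal sketch in Python of how a downstream program might read a document saved with customer-defined tags. The tag names (`corporateProfile`, `companyName`, `ticker`, `revenue`) are hypothetical, invented for illustration and not taken from any Microsoft or customer schema. The required-field check is a hand-rolled stand-in for real XML Schema validation, which the Python standard library does not provide.

```python
import xml.etree.ElementTree as ET

# Hypothetical document saved with customer-defined tags (names are illustrative only).
SAVED_DOCUMENT = """\
<corporateProfile>
  <companyName>Example Corp</companyName>
  <ticker>EXMP</ticker>
  <revenue currency="USD">1250000</revenue>
</corporateProfile>
"""

# Fields that our hypothetical schema would declare as required.
REQUIRED_FIELDS = ("companyName", "ticker", "revenue")


def read_profile(xml_text):
    """Parse the document and return its required fields as a plain dictionary."""
    root = ET.fromstring(xml_text)
    profile = {}
    for name in REQUIRED_FIELDS:
        element = root.find(name)
        # Treat an absent or empty element as a violation of the assumed schema.
        if element is None or not (element.text or "").strip():
            raise ValueError(f"missing required field: {name}")
        profile[name] = element.text.strip()
    return profile


profile = read_profile(SAVED_DOCUMENT)
print(profile["companyName"], profile["ticker"], profile["revenue"])
```

Because the tags carry the meaning of each value, the program never has to guess which paragraph or cell holds the company name. That is the data-reuse benefit the custom schema is meant to deliver.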
Jean Paoli, a co-editor of the XML standard, is now part of the Office 11 development team and described one usage scenario:
"Let's use the example of a corporate profile report to get a sense of how this works. If you create a corporate report in Word, you can save it in XML so that it can be read or indexed by any application, on any platform, via any device.
"But if you created the report using an XSD-enabled template, you have two options. You could either save it using the tags defined by the XSD or as a combined file with the tags defined by the XSD, interwoven with Word's native XML tags, to preserve its appearance.
"With the second approach, you could send this profile to a securities regulatory organization, and people there would not only be able to view the document, but they could also build an XML Web service to extract the relevant information from the document and enter it into their own database automatically, without need for user intervention and without having to rely on Word to open the file."
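The extraction step in that scenario could look something like the following Python sketch. It is a plain script, not a Web service, and everything specific in it is an assumption: the two namespace URIs, the `fmt:` and `cp:` tag names, and the `profiles` table are placeholders, since the actual Office 11 XML vocabulary is not described here. The point is only that the customer's tags can be pulled out of a combined file while the formatting markup is ignored.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Both namespace URIs and all tag names below are hypothetical placeholders.
NS = {
    "fmt": "urn:example:word-formatting",   # stands in for Word's native XML tags
    "cp": "urn:example:corporate-profile",  # stands in for the customer's XSD tags
}

COMBINED_DOCUMENT = """\
<fmt:document xmlns:fmt="urn:example:word-formatting"
              xmlns:cp="urn:example:corporate-profile">
  <fmt:paragraph style="Heading1">
    <cp:companyName>Example Corp</cp:companyName>
  </fmt:paragraph>
  <fmt:paragraph>
    Annual revenue: <cp:revenue>1250000</cp:revenue>
  </fmt:paragraph>
</fmt:document>
"""


def extract_profile(xml_text):
    """Pull out the customer-tagged values, ignoring the formatting markup."""
    root = ET.fromstring(xml_text)
    name = root.find(".//cp:companyName", NS).text.strip()
    revenue = int(root.find(".//cp:revenue", NS).text)
    return name, revenue


def store_profile(connection, profile):
    """Insert one extracted profile into a simple table."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS profiles (company TEXT, revenue INTEGER)"
    )
    connection.execute("INSERT INTO profiles VALUES (?, ?)", profile)
    connection.commit()


# An in-memory database keeps the sketch self-contained.
connection = sqlite3.connect(":memory:")
store_profile(connection, extract_profile(COMBINED_DOCUMENT))
rows = connection.execute("SELECT company, revenue FROM profiles").fetchall()
print(rows)
```

Nothing in the script depends on Word itself, which matches the claim in the quote: any XML parser that understands namespaces can separate the data from its presentation.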
It sounds promising, and we'll be examining betas of Office 11 over the coming months to see if the approach is simple enough to work without a lot of retraining or re-creation of existing documents. Something needs to be done in this space, however, and XML is the right technology for this job.
West Coast Technical Director Timothy Dyck can be reached at firstname.lastname@example.org