Translating XML Schema

eWeek Labs takes an in-depth look at what the W3C's new standard means for e-business.

Earlier this month at the Tenth International World Wide Web Conference in Hong Kong, XML took its biggest step forward since the document format was first standardized in February 1998.

At the conference, the World Wide Web Consortium released XML Schema as a W3C Recommendation, finalizing efforts that started in 1998 to define a standard way of describing Extensible Markup Language document structures and adding data types to XML data fields.

Now that it is finally out, the long-delayed XML Schema standard will catalyze the next big step in XML—allowing cross-organizational XML document exchange and verification.

Just as discovery of the Rosetta stone in 1799 provided a way to fix the meaning of Egyptian hieroglyphs so they could be understood across the gulf of two millennia, XML Schema provides a way for organizations to fix the meaning of XML documents so they can be understood across the gulf of organizational boundaries and otherwise incompatible IT architectures.

As a result, XML Schema will be a cornerstone in the new e-commerce architecture that we are collectively building and will be a vital component for making business exchanges and other loose associations of trading partners possible.

The arrival of XML Schema, more than three years after XML itself, has left many champing at the bit (and others, such as Microsoft Corp., running off in their own direction implementing and shipping products based on prestandard efforts), and the market is now more than ready for this standard to take hold.

However, XML Schema's long development cycle gave vendors time to understand the specification and start writing compliant software, and we are now seeing the rapid release of XML Schema-compliant (or soon-to-be-compliant) authoring tools and servers.

A Little of Everything

That long, committee-driven development cycle also resulted in a specification that has a bit of everything in it, and fully compliant XML Schema parsers will have to be complex pieces of software to support all the options the specification allows.

Fortunately, XML Schema documents have to reference only the functionality they need, and the more complex options in XML Schema, such as null elements and explicit types, may just fade away through disuse.

The W3C recently published a recommendation on how to group Extensible HTML, the consortium's replacement for HTML, into well-defined subgroups so XHTML browsers (such as those in cellular phones) can clearly define which parts of the language they support and which they don't.

Something similar is a possibility for XML Schema if the full specification proves too difficult to implement for some vendors (although large players such as IBM, Microsoft and Oracle Corp. are moving ahead full speed with plans to support the full specification as published).

Over the next few years, eWeek Labs predicts XML Schema will become integral to the way that many companies exchange information.

XML Schema is clearly needed in today's e-business arena; it makes sense and is the logical next step forward for XML, the single most important enabling technology of business-to-business communication.

What XML Schema Does

The XML Schema specification consists of two parts.

Part 1 describes a language (the XML Schema Definition language) that is used to describe the high-level structure of an XML document.

Part 2 describes the list of allowable data types that can be used by the XML Schema Definition language (and thus in XML documents themselves).

It's very important for developers to understand that XML Schema documents are actually XML metainformation: They describe the structure of XML documents and don't contain end-user data themselves.
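Because a schema is itself an XML document, any ordinary XML parser can read it. The short Python sketch below illustrates the point using only the standard library; the purchase-order schema and its element names are invented for illustration and do not come from any published schema.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"  # namespace of the XML Schema language

# Hypothetical schema for a purchase order; the element names are assumptions.
SCHEMA = """\
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="order">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="orderDate" type="xs:date"/>
        <xs:element name="quantity" type="xs:positiveInteger"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
"""

def declared_fields(schema_text):
    """Return (name, type) pairs for every element declaration that names a type."""
    root = ET.fromstring(schema_text)
    return [(el.get("name"), el.get("type"))
            for el in root.iter(f"{{{XS}}}element")
            if el.get("type") is not None]

fields = declared_fields(SCHEMA)
print(fields)
```

The schema holds no order data at all, only the names and types that an order document is expected to contain.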

By using the XML Schema Definition associated with an XML data stream, an XML parser can automatically verify not just the syntax of the XML data but also its structure and logical correctness, a big step forward. XML Schema replaces the obscure and far less powerful XML Document Type Definition standard.

For example, using XML Schema, companies can now detect if received XML files have missing data, data that's been improperly formatted (such as dates with only two-digit year values or fields with words entered where there should be numbers) or data that's obviously wrong (such as numbers that are clearly too large or too small to possibly be valid).
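To make the formatting checks concrete, here is a minimal Python sketch of the kind of test a schema-aware parser applies once a field has been declared as a date or an integer. It is a hand-written stand-in, not an XML Schema validator: the order document, its field names and the simplified date rule (four-digit year, no time zone) are all assumptions made for illustration.

```python
import re
import xml.etree.ElementTree as ET
from datetime import date

DATE_RE = re.compile(r"([0-9]{4})-([0-9]{2})-([0-9]{2})")

def is_date(text):
    """Accept only YYYY-MM-DD calendar dates (a simplified form of xs:date)."""
    match = DATE_RE.fullmatch(text.strip())
    if match is None:
        return False
    try:
        date(*(int(part) for part in match.groups()))
    except ValueError:  # for example, month 13 or February 30
        return False
    return True

def is_integer(text):
    """Accept an optional sign followed by digits (the lexical form of xs:integer)."""
    return re.fullmatch(r"[+-]?[0-9]+", text.strip()) is not None

# Hypothetical field declarations; a real schema would carry this information.
DECLARED = {"orderDate": is_date, "quantity": is_integer}

def badly_formatted(order_xml):
    """Return the names of declared fields that are missing or fail their type check."""
    order = ET.fromstring(order_xml)
    return [name for name, check in DECLARED.items()
            if not check(order.findtext(name, default=""))]

bad = badly_formatted(
    "<order><orderDate>01-05-02</orderDate><quantity>twelve</quantity></order>")
good = badly_formatted(
    "<order><orderDate>2001-05-02</orderDate><quantity>12</quantity></order>")
print(bad, good)
```

The first order fails on both fields (a two-digit year and a word where a number belongs); the second passes.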

The range of characteristics that XML Schema defines is very comprehensive and includes a large selection of basic data types, such as integers, floating-point numbers, strings, times and dates; it also includes ways to constrain values to valid data ranges or to lists of valid values, the ability to define default values for missing data and the ability to make data elements required.
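The sketch below shows how range limits, lists of valid values, defaults and required fields might be declared in a schema and then applied to incoming data. The schema fragment uses standard XML Schema constructs (minInclusive, maxInclusive, enumeration, default, use="required"), but the type and attribute names are hypothetical, and the Python checker handles only this tiny subset of the language. It is an illustration, not a conforming validator.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical schema fragment; names such as Quantity and Currency are assumptions.
SCHEMA = """\
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:simpleType name="Quantity">
    <xs:restriction base="xs:integer">
      <xs:minInclusive value="1"/>
      <xs:maxInclusive value="1000"/>
    </xs:restriction>
  </xs:simpleType>
  <xs:simpleType name="Currency">
    <xs:restriction base="xs:string">
      <xs:enumeration value="USD"/>
      <xs:enumeration value="EUR"/>
    </xs:restriction>
  </xs:simpleType>
  <xs:element name="item">
    <xs:complexType>
      <xs:attribute name="quantity" type="Quantity" use="required"/>
      <xs:attribute name="currency" type="Currency" default="USD"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
"""

def load_rules(schema_text):
    """Collect the facets of each simple type and the attribute declarations."""
    root = ET.fromstring(schema_text)
    types = {}
    for simple in root.iter(XS + "simpleType"):
        restriction = simple.find(XS + "restriction")
        low = restriction.find(XS + "minInclusive")
        high = restriction.find(XS + "maxInclusive")
        types[simple.get("name")] = {
            "min": int(low.get("value")) if low is not None else None,
            "max": int(high.get("value")) if high is not None else None,
            "enum": [e.get("value") for e in restriction.findall(XS + "enumeration")],
        }
    attrs = {a.get("name"): a for a in root.iter(XS + "attribute")}
    return types, attrs

def check(item_xml, types, attrs):
    """Return a list of problems found in one <item> element."""
    item = ET.fromstring(item_xml)
    problems = []
    for name, decl in attrs.items():
        value = item.get(name, decl.get("default"))  # a default fills in a missing attribute
        if value is None:
            if decl.get("use") == "required":
                problems.append(name + ": missing")
            continue
        rule = types[decl.get("type")]
        if rule["enum"] and value not in rule["enum"]:
            problems.append(name + ": not an allowed value")
        if rule["min"] is not None or rule["max"] is not None:
            try:
                number = int(value)
            except ValueError:
                problems.append(name + ": not a number")
                continue
            too_low = rule["min"] is not None and number < rule["min"]
            too_high = rule["max"] is not None and number > rule["max"]
            if too_low or too_high:
                problems.append(name + ": out of range")
    return problems

TYPES, ATTRS = load_rules(SCHEMA)
ok = check('<item quantity="5"/>', TYPES, ATTRS)
missing = check('<item currency="GBP"/>', TYPES, ATTRS)
too_big = check('<item quantity="50000" currency="EUR"/>', TYPES, ATTRS)
print(ok, missing, too_big)
```

Note that the first item omits the currency entirely and still passes, because the declared default stands in for the missing value.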

XML Schema also lets developers define complex types composed of groupings of simple types (such as an "address" type).

Regular expressions can be used to check for valid data, and XML Schema documents can inherit from and then partially override the behavior of other XML Schema documents for object-oriented development.
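As a rough illustration of both ideas, the hypothetical schema below defines an "Address" complex type, extends it into a "USAddress" type that adds a ZIP code, and constrains the ZIP code with a regular expression. The Python code is only a sketch: it walks the extension chain by hand to list the fields a derived type ends up with, and it reuses the schema's pattern with Python's own regular-expression engine, which happens to agree with XML Schema's syntax for a pattern this simple but is not identical in general. All type and element names are assumptions.

```python
import re
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical schema; the type and element names are invented for illustration.
SCHEMA = """\
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:simpleType name="ZipCode">
    <xs:restriction base="xs:string">
      <xs:pattern value="[0-9]{5}"/>
    </xs:restriction>
  </xs:simpleType>
  <xs:complexType name="Address">
    <xs:sequence>
      <xs:element name="street" type="xs:string"/>
      <xs:element name="city" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
  <xs:complexType name="USAddress">
    <xs:complexContent>
      <xs:extension base="Address">
        <xs:sequence>
          <xs:element name="zip" type="ZipCode"/>
        </xs:sequence>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>
</xs:schema>
"""

ROOT = ET.fromstring(SCHEMA)

def fields_of(type_name):
    """List the element names of a complex type, inherited ones first."""
    ctype = next(t for t in ROOT.findall(XS + "complexType")
                 if t.get("name") == type_name)
    ext = ctype.find(f"{XS}complexContent/{XS}extension")
    inherited = fields_of(ext.get("base")) if ext is not None else []
    holder = ext if ext is not None else ctype
    own = [e.get("name") for e in holder.findall(f"{XS}sequence/{XS}element")]
    return inherited + own

def pattern_of(type_name):
    """Return the pattern facet declared on a simple type."""
    stype = next(t for t in ROOT.findall(XS + "simpleType")
                 if t.get("name") == type_name)
    return stype.find(f"{XS}restriction/{XS}pattern").get("value")

us_fields = fields_of("USAddress")
zip_pattern = pattern_of("ZipCode")
# XML Schema patterns must match the whole value, hence fullmatch.
zip_ok = re.fullmatch(zip_pattern, "02139") is not None
zip_bad = re.fullmatch(zip_pattern, "0213A") is not None
print(us_fields, zip_ok, zip_bad)
```

The derived type picks up the street and city fields from its base without repeating them, which is the reuse the object-oriented comparison refers to.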

XML Schema does not provide a way to see whether values that look correct actually are correct (by checking against values in a database, for example), but vendors such as Data Junction Inc. are now starting to provide this capability.

Because XML Schema provides a way for organizations to share high-level definitions of how XML data should be structured, business exchanges are grabbing it up like there's no tomorrow.

Centralized repositories of XML file format information, such as Microsoft's BizTalk, are now accepting submissions of industry-specific XML document definitions in XML Schema format.