How Component-Based Authoring Works: Second in a 3-Part Series

In the first installation of this 3-part series on component-based authoring, Knowledge Center contributor Eric Severson introduced component-based authoring and described its significant business benefits over document-based authoring. In this second part, Eric will explain how component-based authoring actually works, why DITA is the state-of-the-art XML standard of choice, and why a CMS is needed to keep track of all the components.

This is the second installation of a 3-part series on component-based authoring. Click here to read the first article, "How and Why to Use Component-Based Authoring: First in a 3-Part Series" and click here to read the third and final article, "Best Practices for Component-Based Authoring: Third in a 3-Part Series."


Component-based authoring involves breaking up information into smaller, reusable components which can then be flexibly recombined into a variety of output types and delivery channels. By avoiding the need to redundantly maintain information in multiple places, organizations have typically seen savings of 30 to 50 percent in authoring, review and production costs. Plus, they have seen up to 70 percent in language translation costs.

Component-based authoring has also paved the way for dynamic, personalized information delivery--combining exactly the right components to fit each individual user's needs. But how does component-based authoring actually work? Basically, there are four main pieces to this puzzle:

1. A new way of writing: thinking of your information in terms of reusable components rather than as a set of documents, books or Web pages.

2. Non-proprietary formats and tools: moving from proprietary word processing and desktop publishing formats into format and output-neutral.

3. Keeping track of all the components: using content management technology to store, maintain and keep track of the relationships between information components.

4. Flexible assembly and delivery: using state-of-the-art, XML-based publishing and delivery tools to drive mulitple output formats.

A new way of writing

For content to be flexible and agile, it can't continue to be managed as a set of separate books or manuals. Instead, it needs to be broken up into a set of smaller, fundamental building blocks. While this could be done in many ways, the most useful method is to break it up into topics.

To qualify as a topic, a piece of content should cover a specific subject, have an identifiable purpose, and be able to stand on its own (i.e., not require a specific context in order to make sense). Topics don't start with "as stated above" or end with "as further described below," and they don't implicitly refer to other information that isn't contained within them. The goal is for topics to be fully reusable, in the sense that they can be used in any context where the information provided by the topic is needed.

Because topics are standalone in addition to reusable, they can also be flexibly mixed and matched across a variety of publications. For example, if a standard product family description is maintained as a topic, it can be included, as is, across all publications for related products. Or, if many technical manuals contain essentially the same procedure, that procedure can be maintained as a single topic and reused, as is, wherever needed.

Non-proprietary formats and tools

To make them fully reusable, topics must be encoded in a format that is both output-neutral and media-independent. This requirement isn't met by Microsoft Word, Adobe FrameMaker or other classic desktop publishing tools. Instead, more powerful, non-proprietary authoring tools are required--tools such as JustSystems XMetaL that use XML as their source format. XML does not assume a specific authoring vendor or tool, a specific output format or a specific order in which information is assembled.

For component-based authoring, DITA (Darwin Information Typing Architecture) is the specific XML standard of choice. Originally developed at IBM, DITA was specifically developed for topic-based content and is now in its third year as an international standard.

DITA doesn't focus on documents, as did predecessor XML publishing standards. Instead, it is specifically designed around topics. These can be freely combined into documents, Web pages or any other assembly or collection, using the assembly instructions contained in a DITA map.

Within each topic, it's also possible to apply filtering criteria to individual elements. For example, two topics about installing a software module might be exactly the same--except for detailed differences between specific Unix and Linux commands. Rather than maintaining two parallel topics, DITA lets you put both types of commands in a single topic, marking each as applicable to either Unix or Linux.

Using a CMS to keep track of all the components

When information is split up into reusable components, change management becomes complex. This is especially true when components are shared across a large number of output types. Not only is it necessary to track all the changes but also to ensure that updated components will continue to make sense in all the contexts for which they're reused.

Meeting these needs requires a CMS (Content Management System). A CMS prevents components from being changed without proper permission, and explicitly controls the change and review workflow. The CMS also provides a "where-used" capability, which automatically tracks the linkages between components and all the outputs in which each component is used. This prevents reused components from being deleted (causing broken links) and gives the author a view into all the various contexts which will be affected by a change. Those who own these other contexts can also subscribe to component changes, and can be notified automatically each time an update is being proposed.

When choosing a CMS, it's very important to ensure that the CMS is tightly integrated with your XML authoring tool. This allows authors to perform key CMS functions from within the authoring tool, while letting the CMS control permissions, workflows and where-used relationships. From within the XMetal tool, for example, users can browse and search a CMS, check individual components in and out, review changes in multiple contexts, and participate in formal review and approval workflows.

Assembling and delivering final output

Using a set of DITA maps, reusable topics can be mixed and matched into virtually any combination of output documents, Web pages or other assemblies. By applying the proper XML-based style sheet to each map, XML source can also be transformed to virtually any output format or media.

Special assembly and deliver engines are used to process XML content. For example, an open source module called the DITA Open Toolkit is often used to process DITA-based content. In general, these modules perform specific steps:

1. Assemble all topics according to instructions in the DITA map.

2. Include any information linked into the body of each topic (for example, an error message description linked from a list of error messages).

3. Filter the content inside each topic based on profiling instructions (for example, Unix vs. Linux operating system).

4. Apply the appropriate XML style sheet to produce the desired output format.

In a static publishing scenario, DITA maps are created in advance for each pre-defined publication type (documents, Web pages, help systems and so forth). Each of the maps contains references to the topics that should be included for that particular publication.

DITA offers the flexibility to create a different map for each personalized scenario you wish to support. For example, rather than publish one book that covers multiple products, we could have a different DITA map for each.

But DITA can go even further than this. Since DITA maps are just XML files themselves, they can be automatically generated to support a fully personalized, dynamic publishing scenario. In this case, the dynamic map can be used in conjunction with a real-time XML query to pull the right topics from the repository. In fact, the XQuery standard, which is normally used for this purpose, can find relevant topics, filter or "profile" topics so that only applicable content is included. Plus, it can dynamically transform DITA XML into the desired output format (e.g., HTML and PDF)--all as part of a single, real-time process.

Taking the next step

Now that you have some understanding of component-based authoring and how it works, it's time to move on to the details of how you get started and what you need to do to ensure success. These kinds of practical advice and best practice examples are the subject of the next and final installment of this 3-part series. Stay tuned!

This was the second installation of a 3-part series on component-based authoring. Click here to read the first article, "How and Why to Use Component-Based Authoring: First in a 3-Part Series" and click here to read the third and final article, "Best Practices for Component-Based Authoring: Third in a 3-Part Series."

Eric Severson is co-Founder and Chief Technology Officer for Flatirons Solutions Corporation. Eric is also on the board of directors for IDEAlliance and is a former president of OASIS--both XML industry consortiums. He can be reached at