This is the third installment of a 3-part series on component-based authoring. The first article is “How and Why to Use Component-Based Authoring: First in a 3-Part Series,” and the second is “How Component-Based Authoring Works: Second in a 3-Part Series.”
I will start this third and final part of the series with a definition of the content model. Even though DITA (Darwin Information Typing Architecture) is already a defined standard, the first step is still to determine how your content will be structured. DITA uses a very flexible content model in which many different kinds of topic structures can be defined. This is done through a powerful and unique DITA feature called specialization.
Consistent with the information typing aspect of DITA, specialization allows you to create your own variations of the generic topic structure, each of which becomes a different topic type. Three out-of-the-box specializations are included with the DITA standard: concept, reference and task topic types.
Most pre-DITA applications of XML have required relatively complex and restrictive content models. This has had the advantage of precisely controlling document structure and format, but with the disadvantage of making XML difficult to author and maintain. In contrast, DITA gives us a wide spectrum of choices as to how simple or complex our content models need to be.
Remember that DITA allows any number of stand-alone, reusable topics to be assembled into a map. The map defines the hierarchical structure for each output publication or deliverable, which may be quite complex, while allowing the topics themselves to have a relatively simple structure.
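As a rough illustration of the map-and-topic relationship, here is a minimal Python sketch. It is not DITA markup or any tool's API: the `Topic` and `MapNode` classes, the `outline` function, and the topic IDs and titles are all hypothetical names invented for this example. It only shows how two maps can carry the hierarchy while pointing at the same flat pool of simple topics.

```python
from dataclasses import dataclass, field

@dataclass
class Topic:
    """A stand-alone topic; it knows nothing about where it is used."""
    topic_id: str
    title: str

@dataclass
class MapNode:
    """One entry in a map: a reference to a topic plus nested child entries."""
    topic_id: str
    children: list = field(default_factory=list)

def outline(nodes, topics, depth=0):
    """Walk a map depth-first and return an indented list of topic titles."""
    lines = []
    for node in nodes:
        lines.append("  " * depth + topics[node.topic_id].title)
        lines.extend(outline(node.children, topics, depth + 1))
    return lines

# Hypothetical pool of topics, stored once.
topics = {
    "overview": Topic("overview", "Widget overview"),
    "install": Topic("install", "Installing the widget"),
    "warranty": Topic("warranty", "Warranty terms"),
}

# Two hypothetical maps (deliverables) that reuse the same topics.
user_guide = [MapNode("overview", [MapNode("install")]), MapNode("warranty")]
quick_start = [MapNode("install"), MapNode("warranty")]

print("\n".join(outline(user_guide, topics)))
print("\n".join(outline(quick_start, topics)))
```

The hierarchy lives entirely in the maps, so the "Installing the widget" topic appears nested in one deliverable and at the top level in the other without being copied or changed.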
With this in mind, the first question to ask is whether there’s really a compelling reason to use anything other than the standard DITA specializations. In fact, for some applications, the question is whether we need to use anything other than the generic DITA topic itself. There is always a cost, especially in usability, when the content model is more complex than necessary.
In cases where specialization is needed, we recommend specializing directly from the standard concept, reference and task types. This is the best way to ensure future compatibility both with changes to the DITA specification and with the off-the-shelf tools that implement it.
If you can’t use the standard types as the basis for specialization, then we recommend staying as close to the standard types as possible. This will give you the greatest chance of staying consistent with vendor tools as the standard evolves, and make it much easier to switch back, so to speak, if you find that future versions of the standard specializations fit your needs.
Getting the Buy-In from Authors
It’s somewhat of a truism in this area that the authors themselves can be the hardest to convince, so getting their buy-in is also a critical first step. In our experience, authors do tend to love their familiar, unstructured authoring tools and find a move to component-based authoring somewhat confining. Therefore, it’s important to get them involved as early as possible, understanding the business reasons for the change and participating in related decisions. It’s also important to pick the right tools, with direct author input: tools that are specifically designed to be user-friendly in component authoring applications and that make this experience as easy as possible.
Aside from unfamiliarity with new tools, authors find structured authoring disconcerting for two major reasons. First, it takes a mental shift to think in terms of reusable topics, when previously, authors personally controlled all the content in their documents. Second, it’s difficult to think in terms of format-neutral content, freely reusable across multiple channels, when previously, they precisely controlled the exact look and feel of the printed page.
While these issues can be difficult to overcome, we’ve found that two additional strategies can make a big difference. First, focus the authors on analyzing all the current overlaps across different documents, helping to find opportunities for reuse. Focus them on thinking about how much time they spend today trying to manually ensure this content stays consistent. It’s hard to overestimate the amount of pain many authors go through just to keep this kind of cut-and-paste reuse synchronized.
Second, build an early prototype that shows how cool it will be when the same single-source content can be repurposed across all the different output channels. Seeing their content suddenly come alive on the Web has cured many authors of their prior obsession with the printed page.
Migrating Existing Content
As with authoring new content, the most difficult part of converting legacy content is to make it topic-oriented. This includes the following three considerations:
1. Deciding what level of information should constitute a “topic” in the new system. – This should be done keeping in mind that a topic should have a specific subject and a specific purpose, for example, describing a single concept or a single, well-defined task.
2. Ensuring that each topic is self-contained. – This includes removing context-specific assumptions and references (for example, assuming you’ve just read the previous section of the book, or stating “see below”).
3. Ensuring that topics are reusable across multiple contexts. – This includes generalizing context-specific descriptors (for example, changing “replacement memory card” and “new memory card” to simply “memory card”).
Making one topic out of many
Where there’s opportunity for content reuse, the challenge is also to make one topic out of many. For example, the following variations might occur across four existing documents:
Variation No. 1: To install the widget, remove the screw on the right-hand side of the tray, slide the widget into the tray, and replace the screw to secure the widget.
Variation No. 2: You will need a standard Phillips screwdriver to install the widget. First, locate the tray and remove the screw. Then slide the widget in and replace the screw.
Variation No. 3: Locate the tray and remove the screw with a Phillips screwdriver. After sliding in the widget, replace the screw.
Variation No. 4: After locating the tray and removing the screw, slide in the new widget. When finished, replace the screw.
When legacy content is converted to DITA, all four of these versions will still exist. Ideally, authors will consolidate these into a single topic that can be reused across all four of the original publications. This can be done by picking the best, most reusable version, or by creating a new version that captures the best of each. In this example, perhaps the following:
New variation: Locate the tray and remove the screw with a Phillips screwdriver. Then slide the widget into the tray, and secure the widget by replacing the screw.
Finally, this new set of reusable topics must be linked back into a set of DITA maps that allow the output deliverables to be assembled and produced.
Of course, doing all this across your entire set of content can be a tremendous amount of work. Luckily, DITA doesn’t have to be an all-or-nothing approach. In practice, there is usually a “sweet spot” of content that’s really worth the effort, while other content can be used as is until there’s time and motivation to work on it. Content in the sweet spot typically is core material (as opposed to introductory or supplementary information), has the potential for significant reuse, changes frequently, and has significant cost or risk if it’s inaccurate or inconsistent.
Other content, even though it may not meet the strict definition for standalone and reusable topics, can still be broken up into “topics” and linked into DITA maps. However, such topics should not yet be marked as reusable. It’s also okay if we continue to have some redundancy across these lower-priority topics. We can keep multiple versions of topics and include them in different maps. Later, we can work to consolidate them and make them fully reusable as time permits.
Re-Designing Your Processes
In the classic book-oriented world, each publication is sent out as a whole to reviewers, and then published as a whole once it’s been approved. This is straightforward, but typically results in multiple, redundant reviews of the same information.
With DITA, topics are written to be standalone and reusable, and information is only authored once. This means that each topic should need to be reviewed only once, independent of any specific publication or use. But how does this work in practice? Does a new review cycle begin each time an individual topic is completed? The answer to this depends on three factors:
1. How the reviewers or subject matter experts are organized. – In a topic-oriented world, reviewers should focus on the set of topics for which they have expertise, regardless of the output deliverables in which they appear. Therefore, reviewers should get an extract of the topics in their specific area (not the whole output deliverable), usually once all topics in their area are complete.
2. How the output deliverables are organized. – In some applications, the core set of output deliverables are already arranged by area of subject matter expertise, even though there may be reuse beyond this core set. In this case, it would make sense to have the reviewers work directly on these core deliverables.
3. How often changes are made. – Normally it would be very inefficient to feed reviewers one topic at a time, and it might be difficult to have enough content to review. But for certain information that changes infrequently, such as legal boilerplate, it might in fact make sense to immediately put a single topic through the review cycle.
Typically, we recommend using separate DITA maps for review, organized to fit the needs of reviewers. These can be the entire publication, or a portion, if appropriate. They can also be just groupings of similar material for a particular subject matter expert, completely independent of a publication.
Leveraging DITA to Minimize Translation Costs
In a multi-language environment, language translation can easily be the most expensive part of producing new and updated content. It’s also often at the root of updates coming out late. In general, two things drive the cost and complexity of translation: first, the number and types of languages being supported, and second, the amount of content (or more precisely, the number of unique sentences) to be translated.
DITA can’t change the first factor, but it can have a major impact on the second. To understand why, let’s look a little deeper at how translation works. In any modern translation process, special software called a translation memory examines all new content to see which sentences have already been translated. Only sentences that haven’t been encountered previously are sent for translation and incur translation cost.
This means that not only is unchanged content ignored, but so is any redundant content in the same document or across other documents in the set. Take the standardized phrase “Company XYZ makes no warranties and will not be held liable” as an example. Even though it may occur hundreds or thousands of times in the content, it will be translated only once.
The problem, though, is that supposedly redundant content is often not quite redundant, and even one word’s difference will cause sentences to be viewed as unique and, thus, separately translated.
In the previous example involving widgets and screwdrivers, there were originally eight unique sentences that had to be translated. After consolidation into a single, reusable topic, there were only two. That results in a 75 percent savings in translation costs!
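The arithmetic behind that figure can be checked with a short Python sketch. It assumes, as a simplification, that a translation memory matches sentences exactly and that sentences can be split on end punctuation; the `unique_sentences` helper is a hypothetical name for this example, not part of any translation tool.

```python
import re

def unique_sentences(texts):
    """Return the set of distinct sentences across all texts (exact match only)."""
    sentences = set()
    for text in texts:
        # Split after '.', '!' or '?' followed by whitespace.
        for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
            if sentence:
                sentences.add(sentence)
    return sentences

# The four legacy variations from the widget example.
variations = [
    "To install the widget, remove the screw on the right-hand side of the tray, "
    "slide the widget into the tray, and replace the screw to secure the widget.",
    "You will need a standard Phillips screwdriver to install the widget. "
    "First, locate the tray and remove the screw. "
    "Then slide the widget in and replace the screw.",
    "Locate the tray and remove the screw with a Phillips screwdriver. "
    "After sliding in the widget, replace the screw.",
    "After locating the tray and removing the screw, slide in the new widget. "
    "When finished, replace the screw.",
]

# The single consolidated topic.
consolidated = [
    "Locate the tray and remove the screw with a Phillips screwdriver. "
    "Then slide the widget into the tray, and secure the widget by replacing the screw.",
]

before = len(unique_sentences(variations))
after = len(unique_sentences(consolidated))
savings = 1 - after / before

print(before, after, f"{savings:.0%}")  # prints: 8 2 75%
```

Real translation memories also offer partial ("fuzzy") matching, so actual savings depend on the tool and its settings; the sketch only reproduces the exact-match count used in the example.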
Applying the Same Idea to Common Terminology
The same idea can be applied to managing the standard terminology that’s included in content, such as company names, product names and features, legal terms and so forth. By using DITA keyword elements to represent these terms, rather than “hard-coding” the actual text, changes to these terms can be made in a single place and automatically rippled through the content. By mapping these DITA keywords to a translation terminology base, only the changed term needs to be re-translated, not all of the individual sentences. For example, suppose the term “widget” is used throughout our content, as in paragraphs such as this:
When upgrading the widget, first take the new widget and set it carefully to the side of the chassis. Then remove the current widget.
What if “widget” needs to be changed to “gizmo”? Changing one term results in the re-translation of all the sentences in which it occurs (and only two of thousands are shown above). This is further compounded if there are variations in these sentences across similar content (the problem described in the previous section). If all occurrences of the word “widget” are replaced with a standard term reference, however, then only the terminology base (and corresponding DITA keyword definition) needs to be changed:
When upgrading the <term1>, first take the new <term1> and set it carefully to the side of the chassis. Then remove the current <term1>.
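To make the mechanism concrete, the following Python sketch resolves such term references at publishing time. It is only an illustration under stated assumptions: the `<term1>` placeholder is this article’s shorthand rather than actual DITA markup, and `TERM_BASE` and `resolve_terms` are hypothetical names standing in for a terminology base and a publishing step.

```python
import re

# Hypothetical terminology base: term ID -> text per language.
TERM_BASE = {
    "term1": {"en": "widget"},
}

def resolve_terms(text, term_base, lang="en"):
    """Replace each <termId> placeholder with its current text for the given language."""
    def lookup(match):
        return term_base[match.group(1)][lang]
    return re.sub(r"<(\w+)>", lookup, text)

source = (
    "When upgrading the <term1>, first take the new <term1> and set it "
    "carefully to the side of the chassis. Then remove the current <term1>."
)

print(resolve_terms(source, TERM_BASE))

# Renaming the product touches only the terminology base, not the source sentences.
TERM_BASE["term1"]["en"] = "gizmo"
print(resolve_terms(source, TERM_BASE))
```

Because the source sentences never change, a translation memory keyed on those sentences would see nothing new; only the single entry in the terminology base would need re-translation.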
On to success
We hope you’ve enjoyed this 3-part series on component-based authoring, and have found it useful in helping you to both understand and use this technology in your organization. Hopefully, this will also help you achieve the kinds of dramatic cost savings we’ve outlined in these articles. Good luck, and of course, don’t hesitate to ask for expert help if you need it!
Eric Severson is co-founder and Chief Technology Officer for Flatirons Solutions Corporation. Eric is also on the board of directors for IDEAlliance and is a former president of OASIS, both XML industry consortiums. He can be reached at Eric.Severson@flatironssolutions.com.