
DocBook 4.2

Export Filter Documentation & Details


DocBook 4.2 Export Filter [Version ]

The DocBook export filter delivered with upCast 5.0 combines the custom export filter API, which creates a DOM tree, with a user-modifiable XSLT processing sheet.

NEW: Known issues section at the bottom of the page!

Downloads

The built-in processing sheet is available for download and tweaking here:

Download db42.xsl (90 KB)

Documentation for the Export filter (which also serves as sample input document) can be found here:

DocBook 4.2 Export Filter documentation (RTF original file, zipped, 90 KB)

Design decisions

We had to make some basic design decisions for this export filter, which we outline below. You can override them by providing your own customization of db42.xsl. We welcome any feedback you would like to give; please send your comments to <support@infinity-loop.de>.

Root element

The original XSLT processing sheet did not honor Word sections because its root element was always <article>. The current implementation also offers <book> as the document root and, in that case, transforms Word sections into <part> elements within the book.
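To illustrate the two root-element modes, here is a minimal Python sketch using xml.etree.ElementTree; it is not code taken from db42.xsl. It assumes a hypothetical input of already-converted block elements grouped per Word section, and it simplifies DocBook's content model: a real <part> needs a <title> and component elements such as <chapter>, which are left out here.

```python
import xml.etree.ElementTree as ET


def build_root(sections, root_element="article"):
    """Wrap converted Word sections in a DocBook root element.

    `sections` is a list with one entry per Word section, each entry being
    a list of already-converted block elements (a hypothetical input shape).
    """
    if root_element == "book":
        root = ET.Element("book")
        for blocks in sections:
            # One <part> per Word section (title and chapter level omitted).
            part = ET.SubElement(root, "part")
            part.extend(blocks)
    elif root_element == "article":
        # <article>: Word section boundaries are not represented;
        # the blocks of all sections are simply concatenated.
        root = ET.Element("article")
        for blocks in sections:
            root.extend(blocks)
    else:
        raise ValueError(f"unsupported root element: {root_element!r}")
    return root
```

With rootElement set to book, two Word sections become two <part> elements; with article, the same input yields a flat sequence of blocks.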

Meta information

We have included some very basic heuristics to create a useful set of document meta information (the DocBook element <bookinfo> or <articleinfo>, respectively). The difficulty is that DocBook structures this information far more granularly than a Word document usually specifies it; this is especially true of people's names. We'd be interested to learn how important this meta information is to you, so that we can allocate resources in this area as necessary.
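A minimal Python sketch of the kind of name heuristic involved might look as follows. The inputs (a title string and a flat author string, e.g. taken from the Word document properties) and the splitting rule are assumptions for illustration, not the filter's actual heuristics. They also show why the mapping is fragile: a middle name or a compound surname ends up in the wrong element.

```python
import xml.etree.ElementTree as ET


def build_info(root_element, title, author):
    """Build <articleinfo> or <bookinfo> from flat Word document properties.

    `title` and `author` are hypothetical inputs, e.g. the Word document's
    Title and Author properties.
    """
    if root_element not in ("article", "book"):
        raise ValueError(f"unsupported root element: {root_element!r}")
    info = ET.Element(f"{root_element}info")  # articleinfo or bookinfo
    if title:
        ET.SubElement(info, "title").text = title
    names = author.split()
    if names:
        author_el = ET.SubElement(info, "author")
        if len(names) > 1:
            # Naive heuristic: last token is the surname, the rest are given names.
            ET.SubElement(author_el, "firstname").text = " ".join(names[:-1])
        ET.SubElement(author_el, "surname").text = names[-1]
    return info
```

For "Ludwig van Beethoven", this rule yields the first name "Ludwig van", which illustrates why flat Word metadata rarely maps cleanly onto DocBook's name elements.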

Inline elements

DocBook has several specialized inline elements that can occur in a block-level element such as <para>. Mapping to such an element is reasonably straightforward when it has no sub-structure. Creating an element with a more complex sub-structure, such as <personname>, from the flat inline content that Word delivers is much harder and requires a dedicated set of heuristics. In short, this release does not support structured inline content, except for <personname>.

The export filter does, however, map plain inline (character) styles to DocBook elements when a style's name matches a supported inline element. The list of these elements is passed to the stylesheet as the parameter tagclasses.
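The style-to-element matching can be sketched in Python as below. The format of tagclasses (assumed here to be a whitespace-separated list of element names), the sample list, and the fallback to <phrase role="..."> for unmatched styles are all assumptions; check the top of db42.xsl for the real parameter.

```python
import xml.etree.ElementTree as ET

# Hypothetical value of the tagclasses stylesheet parameter; the real
# format and default list are defined at the top of db42.xsl.
TAGCLASSES = "filename command literal emphasis userinput"


def map_inline(style_name, text, tagclasses=TAGCLASSES):
    """Map a Word character style to a DocBook inline element."""
    supported = set(tagclasses.split())
    if style_name in supported:
        # Style name matches a supported inline element: use it directly.
        el = ET.Element(style_name)
    else:
        # Assumed fallback: keep the style name as a role on <phrase>.
        el = ET.Element("phrase", role=style_name)
    el.text = text
    return el
```

Matching on exact names keeps the rule predictable: a Word style must be named exactly like the DocBook element (e.g. filename) to be picked up.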

Validation

The built-in filter can validate the generated document during export and reports any errors it finds. Please note that, depending on the structure of the source Word document, the filter is not guaranteed to deliver a valid result document.

Special Features

Revision Tracking: We use the generic inline element <phrase> together with the common attribute revisionflag to mark up added or deleted content. DocBook does not appear to offer any way to attach author information to inline revision markup.
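A small Python sketch of this markup, built with xml.etree.ElementTree (the helper name revision_phrase is hypothetical), shows the shape of the result. Because revisionflag only records the kind of change, the author of a Word revision has nowhere to go.

```python
import xml.etree.ElementTree as ET


def revision_phrase(text, kind):
    """Wrap tracked-change text in <phrase revisionflag="...">.

    `kind` is "added" or "deleted", two of the values DocBook allows for
    the common revisionflag attribute (the others are "changed" and "off").
    """
    if kind not in ("added", "deleted"):
        raise ValueError(f"unexpected revision kind: {kind!r}")
    phrase = ET.Element("phrase", revisionflag=kind)
    phrase.text = text
    # The Word revision's author is not recorded: DocBook offers no
    # attribute for it on inline revision markup.
    return phrase
```

An inserted word inside a paragraph thus becomes <phrase revisionflag="added">...</phrase>, with the surrounding text left as ordinary content.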

Nested Tables: This filter release does not support nested tables (<entrytbl>). Word, in contrast, does support table nesting up to 9 levels deep.

Important note: Currently, a nested table is discarded completely (!) from the output document; an XML comment at the corresponding location in the document states this fact.

We are currently investigating two options: changing the content model of <entry> appropriately, or offering support for HTML tables.
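The current behavior can be approximated by the Python sketch below. It assumes an intermediate tree in which a nested table appears as a <table> or <informaltable> child of <entry>; the comment text is made up, and the wording upCast actually writes may differ.

```python
import xml.etree.ElementTree as ET

NESTED_TABLE_TAGS = ("table", "informaltable")


def drop_nested_tables(root):
    """Replace every table nested in a table cell with an XML comment (in place)."""
    # Collect the entries first so the tree is not mutated while iterating it.
    for entry in list(root.iter("entry")):
        for i, child in enumerate(list(entry)):
            if child.tag in NESTED_TABLE_TAGS:
                # Hypothetical comment text marking the discarded table.
                comment = ET.Comment(" nested table removed: not supported ")
                comment.tail = child.tail  # keep text that followed the table
                entry.remove(child)
                entry.insert(i, comment)
```

Carrying the table's tail text over to the comment ensures that only the nested table itself is lost, not the cell text that followed it.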

Word Fields/Generated content: This release has no dedicated support for individual Word field types. The upCast DTD element <gentext type="theType">...</gentext> is converted to <phrase role="GEN_theType">...</phrase>. By changing a stylesheet parameter, generated content can instead be exported as processing instructions.
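A Python sketch of both variants follows. The as_pi flag stands in for the stylesheet parameter mentioned above, whose real name is not given here, and using GEN_theType as the processing-instruction target is an assumption.

```python
import xml.etree.ElementTree as ET


def convert_gentext(gentext, as_pi=False):
    """Convert an upCast <gentext type="..."> element for DocBook output.

    `as_pi` mirrors the stylesheet parameter that switches to processing
    instructions; its real name and the PI target used here are assumptions.
    """
    gen_type = gentext.get("type", "")
    content = gentext.text or ""
    if as_pi:
        # e.g. <?GEN_PAGE 12?>
        node = ET.ProcessingInstruction(f"GEN_{gen_type}", content)
    else:
        # e.g. <phrase role="GEN_PAGE">12</phrase>
        node = ET.Element("phrase", role=f"GEN_{gen_type}")
        node.text = content
    node.tail = gentext.tail  # preserve text following the field
    return node
```

The <phrase> variant keeps the field result visible to DocBook tools, while the processing-instruction variant hides it from the document content but leaves it available to downstream processing.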

XSLT file

The filter relies on an XSLT processing sheet to do the main portion of its work. The transformation operates on a richly attributed, upCast-internal document tree, which is virtually identical to the output of the XML (Raw) export filter. There is no documentation available for that tree, but you can have the DocBook export filter serialize it to a file by checking the Write raw source tree option.

By default, the XSLT lives in the jar and is referenced as jar:/de/infinityloop/upcast/resources/xslt/db42.xsl. You may, however, specify any local XSLT file for the transformation, e.g. a modified version of the built-in XSLT. The file must be specified as a URL.
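Since a local stylesheet has to be given as a URL, a file: URL is the natural form for it. The following Python sketch (the helper name stylesheet_url is hypothetical) converts a local path into such a URL, percent-encoding characters such as spaces.

```python
from pathlib import Path


def stylesheet_url(path):
    """Turn a local XSLT file path into an absolute file: URL."""
    # as_uri() requires an absolute path, so expand "~" and resolve first;
    # resolve() does not require the file to exist.
    return Path(path).expanduser().resolve().as_uri()
```

For example, a customized copy saved as "my db42.xsl" in the current directory yields a URL ending in "/my%20db42.xsl", which can then be entered in the filter's XSLT setting instead of the built-in jar: reference.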

Some UI settings are passed to the stylesheet as parameters: encoding, revisionmarkup and rootElement. The latter is just a selector, which is then handled appropriately throughout the stylesheet. You may pass additional parameters; see the top of the stylesheet code for more information.


Limitations

Output is always written in UTF-8, regardless of the encoding setting in the UI. This is because the encoding attribute of <xsl:output> cannot take a stylesheet parameter as its value.
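If you need a different encoding, one possible workaround is to re-encode the UTF-8 output after export. The Python sketch below is a hypothetical post-processing step, not an upCast feature: it rewrites the XML declaration and turns characters that the target encoding cannot represent into numeric character references. This is only safe as long as such characters occur in text content or attribute values, since character references are not allowed in element names, comments or processing instructions.

```python
import re


def reencode_xml(utf8_bytes, encoding):
    """Re-encode UTF-8 XML output into another encoding (post-processing sketch)."""
    # "utf-8-sig" also strips a byte order mark, if one is present.
    text = utf8_bytes.decode("utf-8-sig")
    decl = f'<?xml version="1.0" encoding="{encoding}"?>'
    if text.startswith("<?xml"):
        # Replace the existing XML declaration.
        text = re.sub(r"^<\?xml[^?]*\?>", decl, text, count=1)
    else:
        # No declaration yet: add one so parsers know the new encoding.
        text = decl + "\n" + text
    # Unrepresentable characters become &#NNNN; references.
    return text.encode(encoding, errors="xmlcharrefreplace")
```

Re-encoding to ISO-8859-1, for instance, keeps "ö" and "ß" as single bytes but writes "€" as &#8364;, which an XML parser reads back as the same character.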

Due to the almost unbounded freedom in creating RTF documents on the one hand, and RTF's lack of deeply nested hierarchies on the other, this implementation faces several challenges and limitations. We have already built in some heuristics (e.g. <caption> detection for images, <personname> parsing) and will continue to add more in future revisions. However, we cannot guarantee to generate valid DocBook from every RTF source. In our tests, validation failed for documents whose structure is incompatible with the DocBook DTD. In our view this is a desirable effect: a failed validation lets you spot structural errors in the original document quickly, instead of having them glossed over.

Without doubt, the provided XSLT will not suit everyone. Feel free to use it as a template and adapt it to your own needs, or even start over from scratch. If, in doing so, you find that information you need is missing from the raw source tree to which the XSLT is applied, please let us know!

Feedback

We're very interested in your comments, suggestions and bug reports. This export filter is our main focus right now, so we should be able to react in a timely manner to your input. Thank you!

Known issues as of Sun, 2004-03-28


© 2003-2007 infinity-loop GmbH