Chapter 4. XML Document Contents

1. Grammar: The upCast DTD
2. Including a CSS Stylesheet
3. Parts, Sections, Headings and Paragraphs
3.1. Styling a part
3.2. Styling a heading
3.3. Hyperlinks
3.4. References
3.5. Page Headers and Footers
3.6. Tables
3.7. Lists
3.8. Tabulators
3.9. Table of Contents
3.10. Fields

1. Grammar: The upCast DTD

downCast accepts as input only documents that conform to the upCast DTD. This DTD covers all structural and layout possibilities that can be expressed in RTF. If you cannot transform documents from your custom DTD to the upCast DTD, you will not be able to render their contents in Word. In this sense, the upCast DTD is the checkpoint for your XML documents.

What does this actually mean for you? There are two facts which are important:

  1. If you cannot find a way to convert your DTD to the upCast DTD, you will not be able to publish your document with Word. As stated above, the upCast DTD reflects the implicit DTD behind the RTF specification and therefore the Word document model.

  2. If you can find a way to convert your DTD to and from the upCast DTD without loss of information, round-trip editing of your XML document in Word is possible. For the conversion back from Word to XML, have a look at our companion product upCast, which generates XML files, including CSS, that follow the upCast DTD.

You can download a copy of the upCast DTD from http://www.infinity-loop.de/DTD/upcast/4.0/upcast.dtd. It contains inline documentation that should help you create any necessary transformation style sheets.

To have downCast automatically validate your document during import, you may want to include a DOCTYPE declaration at the top of your XML document:

 <!DOCTYPE document PUBLIC
     "-//infinity-loop//DTD upCast 4.0//EN"
     "http://www.infinity-loop.de/DTD/upcast/4.0/upcast.dtd">

2. Including a CSS Stylesheet

To generate an RTF document, downCast needs a stylesheet that follows CSS2 syntax. To supply one, include an xml-stylesheet processing instruction in your document before the root element:

 <?xml-stylesheet type="text/css" href="myStyleSheet.css"?>

downCast will then read the stylesheet myStyleSheet.css and use it for processing the XML source file.

3. Parts, Sections, Headings and Paragraphs

A document consists of parts which correspond to Word's sections. For each part you may specify a different page setup.

Within a part, you may nest sections to structure your document into chapters, sections, subsections and so on. RTF restricts the nesting depth to nine levels.

Each section may be immediately followed by a heading element containing the heading for that section. A heading is just a paragraph with a special role.

Within a section, you place the actual contents. The innermost block-level element is a par element, which represents a paragraph of text. Within a par element, only inline-level elements are allowed.
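
The structure described above can be sketched as follows. This is a minimal, illustrative skeleton only: the element names come from this chapter, but the exact nesting (in particular, placing the heading as the first child of its section) and the sample text are assumptions that you should check against the upCast DTD.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE document PUBLIC
    "-//infinity-loop//DTD upCast 4.0//EN"
    "http://www.infinity-loop.de/DTD/upcast/4.0/upcast.dtd">
<?xml-stylesheet type="text/css" href="myStyleSheet.css"?>
<document>
  <!-- One part corresponds to one Word section with its own page setup. -->
  <part style="page: myPortraitA4;">
    <!-- Assumption: the heading is the first child of its section. -->
    <section>
      <heading>Introduction</heading>
      <par>First paragraph of the chapter.</par>
      <!-- Sections nest to form subsections (up to nine levels in RTF). -->
      <section>
        <heading>Background</heading>
        <par>Only inline-level elements may appear inside a par.</par>
      </section>
    </section>
  </part>
</document>
```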

3.1. Styling a part

A part is an anchor element for resetting page size properties. This is done by referring to named CSS @page rules using the page property in the style attribute:

<part style="page: myPortraitA4;"> ... </part>

3.2. Styling a heading

For Word to use a heading in its outline display, you must set some specific properties. Word has no concept of a section element; it infers the document structure from the sequence and level of headings.

So, a heading must indicate the outline level it represents. This is done using the property -ilx-paragraph-outline-level with the level index as value, ranging from 0 to 9.

Furthermore, numbering a heading in RTF is done by setting ordered list properties on that element. downCast uses the following properties for this, so you need to specify them properly in a style rule:

-ilx-marker-format
-ilx-list-group
-ilx-marker-align
-ilx-marker-follow
-ilx-marker-offset
list-style-type

The -ilx-list-group property groups nested headings into the same nested list description in RTF. This property is necessary since headings are treated as lists in Word, though they do not have list brackets in the XML. Make sure that headings that are part of the same section nesting structure all get the same, arbitrary -ilx-list-group integer value.

The following how-to describes how to style headings to take full advantage of Word's built-in heading handling:

  1. Make sure that the class name is heading x, where x is a number from 1 through 9, corresponding to the level of the section-heading element.

    Important

    Use a non-breaking space as replacement for the normal space, effectively giving a literal text of heading\00a0 4 to be used in the stylesheet selector text.

  2. Add the property -ilx-paragraph-outline-level to the style definition with an integer value corresponding to the section level of that heading, numbered 0 through 8 (i.e. one less than the heading's class name).

  3. To all heading styles, add an -ilx-list-group property with the same, arbitrary integer value (e.g. 1). This groups the styles together logically (based on matching the provided integer value) so that numbering for a deeper-level heading is automatically reset after a parent-level heading has been seen.

  4. Add appropriate list-style-type and -ilx-marker-... properties to control the styling of the numbering of the section heading. (For a description of these properties, see above and Section 2, “Custom Properties”).
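
As a rough illustration of steps 2 to 4, the properties for two nested heading levels might be set as shown here. The sketch sets them inline through a style attribute purely for compactness. Whether heading accepts a style attribute is an assumption, and in practice the same declarations would normally live in stylesheet rules selected by the class names from step 1. The marker formats are hypothetical examples, and the remaining -ilx-marker-... properties are omitted.

```xml
<section>
  <!-- Level-1 heading: outline level 0, numbered 1., 2., ... -->
  <heading style="\-ilx-paragraph-outline-level: 0; \-ilx-list-group: 1; list-style-type: decimal; \-ilx-marker-format: &quot;%0.&quot;">Installation</heading>
  <section>
    <!-- Level-2 heading: outline level 1, same list group, numbered 1.1, 1.2, ... -->
    <heading style="\-ilx-paragraph-outline-level: 1; \-ilx-list-group: 1; list-style-type: decimal; \-ilx-marker-format: &quot;%0.%1&quot;">Requirements</heading>
  </section>
</section>
```

Because both headings share the same -ilx-list-group value, the second-level numbering restarts whenever a new first-level heading appears.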

3.3. Hyperlinks

Hyperlinks, that is links to external URLs, are created using the link element, which takes its linking attributes and values from the XLink specification.

The only supported value for xlink:type is simple.

The only supported value for xlink:show is replace.

The only supported value for xlink:actuate is onRequest.

A link to our web site could look like this:

<link xlink:type="simple" xlink:show="replace" xlink:actuate="onRequest" xlink:href="http://www.infinity-loop.de/">infinity-loop Web Site</link>

3.4. References

References refer the reader to a different section of a document, like the following: Section 3.8, “Tabulators”. Word supports several kinds of reference text and targets. For example, you may have the reference read only "above" if the reader is referred to an earlier point in the document, or you may have it name the specific section you would like to refer the reader to, like "2.3".

This can be specified on a reference element using the property -ilx-reference-presentation-type. For a description of the available options, see Section 2.8, “-ilx-reference-presentation-type”.

A reference element might look like this in an XML document:

<reference xlink:show="other" xlink:actuate="onLoad" xlink:href="refTarget1" style="\-ilx-reference-presentation-type: number;" />

The target of the reference, refTarget1, must be indicated by an appropriate target element.

3.5. Page Headers and Footers

Page headers and footers must be marked up using the pageheader and pagefooter elements, respectively. Their contents can be considered a subdocument to the main document, put into container elements. The contents can be arbitrarily complex, so you may use lists, tables, paragraphs and images in a header or a footer. The contents are styled using the style rules present in the stylesheet, as if they were part of the main document.

3.6. Tables

downCast can process both HTML tables and OASIS Exchange Table Model tables, the latter being a subset of CALS. This is in accordance with the upCast DTD, which allows both table models. You need not tell downCast in advance which table model you use, since it detects the model automatically. You may even combine the CALS and HTML table models in the same document.

RTF does not allow explicit styling of rows, so CSS properties set on rows will not have any effect on the resulting document.

downCast does not support automatic calculation of table column widths as per the CSS2 specification, so you need to specify explicit widths for cells (HTML model) or entry elements (CALS model). A future version of downCast will address this temporary limitation.

3.7. Lists

downCast does not support the procedural counter() and counters() functions from CSS2. Styling of lists is done in a purely declarative way using so-called template strings for specifying the list marker format. The corresponding property is -ilx-marker-format. For its value, it takes a template string that may contain placeholders for the current list level number.

To format a list marker as 1. a), 1. b) and so on, you would define the marker format for the lists as follows:

<list style="list-style-type: decimal; \-ilx-marker-format: &quot;%0.&quot;">
  ...
  <list style="list-style-type: lower-alpha; \-ilx-marker-format: &quot;%0. %1)&quot;">
    ...
  </list>
</list>

In the marker format template strings, a character combination of %digit is replaced by the current numbering of the list at level digit. So, the first list will be numbered 1., 2. etc., and the nested list will first take the current numbering of the outermost list, followed by a period and a space, and then the numbering of the first nested list (which is itself), followed by a closing parenthesis. The percent sign % is the escape character for the following digit (0 to 8). If you want the percent sign to be part of the marker string itself, you need to quote it by doubling it: %%.
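
To illustrate how the placeholders combine, the following sketch nests three lists. It extends the example above using only the rules just described. The third level and its marker format are illustrative choices, and the list contents are elided as in the original example.

```xml
<!-- Level 0 markers: 1., 2., ... -->
<list style="list-style-type: decimal; \-ilx-marker-format: &quot;%0.&quot;">
  ...
  <!-- Level 1 markers: 1.a, 1.b, ... -->
  <list style="list-style-type: lower-alpha; \-ilx-marker-format: &quot;%0.%1&quot;">
    ...
    <!-- Level 2 markers: 1.a.1), 1.a.2), ... -->
    <list style="list-style-type: decimal; \-ilx-marker-format: &quot;%0.%1.%2)&quot;">
      ...
    </list>
  </list>
</list>
```

Each %digit placeholder picks up the numbering of the list at that level, so a deeper list can repeat the numbers of all its ancestors in its own marker.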

The outdent of the marker string, i.e. its position relative to the left border of the list item, is specified using the -ilx-marker-offset property.

Figure 4.1. List item component positioning

You can additionally specify how the marker string should be positioned in relation to the outdent position: You may either left-align it with respect to the position specified by -ilx-marker-offset, or you may center it with respect to that position, or right-align it. This is accomplished using the -ilx-marker-align property.

3.8. Tabulators

Tabulators are evil.

Tabulators pose a wealth of problems in rendering systems like downCast because they position content relative to absolute positions, so we cannot encourage their use. There are situations, however, where you will probably need to use them. This is why we included them as a feature in downCast after many controversial internal discussions.

You can specify a -ilx-tab-stops property on par and heading elements, whose value val takes the following form:

val ::= ( alignment ' ' leader ' ' pos ' ')+
alignment ::= 'left' | 'center' | 'right' | 'decimal' | 'bar'
leader ::= 'blank' | 'dotted' | 'middle-dotted' | 'lined' | 'dashed' | 'thick' | 'double-dashed'
pos ::= number unit
number ::= decimalNumber
unit ::= 'cm' | 'in' | 'mm' | 'tw'

This defines a list of tab positions that you can tab to using the tabulator character (U+0009).
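
For example, a value built from this grammar could be attached to a paragraph as follows. This is a sketch only: it assumes that par accepts a style attribute in the same way as the part and list elements shown elsewhere in this chapter, and the positions and sample text are arbitrary. Note that, according to the grammar, every stop (including the last one) ends with a space.

```xml
<!-- Two stops: a left-aligned stop at 3cm with a blank leader,
     and a right-aligned stop at 15cm with a dotted leader.
     &#9; is the tabulator character (U+0009). -->
<par style="\-ilx-tab-stops: left blank 3cm right dotted 15cm ;">Item&#9;Description&#9;12</par>
```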

3.9. Table of Contents

To specify the place in a document where Word should automatically generate a Table of Contents, use a toc element. Its data attribute lets you specify formatting options for Word in the same way as you specify field instructions. Please consult the Word online help for available and supported options.

This is a sample toc element you might place in your document:

<toc data="\o &quot;1-3&quot; \h \z" />

3.10. Fields

RTF fields are portions of a document whose contents are automatically generated by the application processing the RTF code. Field codes and instruction syntax are not part of the RTF specification, but are proprietary to the respective application. downCast supports several selected Word field instructions directly (hyperlinks, references, Table of Contents), but you can also insert arbitrary field types with appropriate instructions into the resulting RTF file. This is done by using the gentext element, short for "generated text".

Fields in RTF have two components, the field type and the field instruction. You specify both of these using attributes kind and data on a gentext element, respectively.

The following definition makes Word insert the total number of pages of the document at that position into the document body:

<gentext type="NUMPAGES" data="\* MERGEFORMAT" />
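
Combined with Section 3.5, a footer of the form "Page 3 of 12" could be sketched as follows. PAGE is Word's field for the current page number. Placing gentext inline inside a par within pagefooter is an assumption, and the attribute names are copied from the sample above, so check both against the upCast DTD.

```xml
<pagefooter>
  <par>Page <gentext type="PAGE" data="\* MERGEFORMAT" /> of <gentext type="NUMPAGES" data="\* MERGEFORMAT" /></par>
</pagefooter>
```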