Chapter 5. Filter Reference

1. Document Preprocessing
1.1. WordLink
2. Import Filters
2.1. (none)
2.2. RTF 1.6
2.3. Batch Processor
3. Export Filters
3.1. XML (upCast DTD)
3.2. XHTML 1.0 (strict)
3.3. XHTML 1.0 (transitional)
3.4. External CSS2
3.5. XML Validator
3.6. Commandline Processor
3.7. XSLT Processor
3.8. DocBook 4.2
3.9. XML (Raw)
3.10. Unicode Translator

1. Document Preprocessing

1.1. WordLink

If you're running upCast on Windows 95/98/2000/NT/XP and have a working installation of Word 97 or later, you can also convert Microsoft Word binary (*.doc) files directly. For this to work, you need to install some additional software included in the upCast distribution. To do this, choose the Extras > Install WordLink command. After relaunching upCast, WordLink should be available and active. You can check this in the Help > System Information… window.

This additional software lets upCast remote-control the installed copy of Word to convert the Word binary file to RTF invisibly and automatically; upCast then processes the RTF as usual. Once installed, WordLink works transparently in the background when needed.

You can control WordLink options from the RTF 1.6 import filter settings, WordLink tab.

Note

The WordLink feature is not available on Macintosh and Unix platforms. Therefore, when running on these platforms, the Extras > Install WordLink command will always be disabled.

2. Import Filters

You set the active Import Filter by choosing it from the import settings filter list popup. It is immediately in effect.

After choosing an import filter, you may further customize its operation by setting parameters on it. For this purpose, an import filter may offer a configuration dialog which you can open by clicking the Configure… button next to the selection popup. The button will be disabled if the current filter does not have any configurable parameters.

upCast offers three different import filters: (none), RTF 1.6 and Batch Processor:

(none)

This filter does not do anything. It is mainly used for directly processing an input document with one of the export or post-processing filters, e.g. the XML Validator or the XSLT Processor. You will mostly use it in a batch processing or workflow environment.

RTF 1.6

This filter imports Word and RTF documents and converts them to a unified internal format suitable for applying export filters.

Batch Processor

This filter reads Batch Configuration Files (BCFs) which control complete upCast configurations and sequences of parameterized conversion jobs. It is probably most useful in a workflow situation or when you need to convert many documents in one step.

2.1. (none)

Though it appears in the list like a filter, it actually disables any import filtering. Use it when you want to pass the input document through without any processing to output filters capable of handling such a situation, that is, all filters which take file names for their input.

Some of them are: XSLT Processor, Commandline, XML Validator.

2.2. RTF 1.6

This import filter handles conversion from RTF to the internal, unified upCast format. With WordLink installed, the filter also can convert Word binary files (*.doc).

General. Sets general parameters:

Include original numbering info

If checked, numbering information (e.g. chapter numbering) as it occurs in the given document is preserved during the import process. This is useful in case you must ensure that chapter numbering remains exactly as it was in the original document or if you are exporting into the XHTML format in order to view a document in a browser.

Normally you would leave the numbering up to a style sheet processor or some other external XML processing engine.

OrigNumbering Boolean true | false

Embed object binary data (base64)

If checked, the binary data of an embedded object is written into the XML in base64-encoded form as an object element (for restoring it later using e.g. downCast). The object's parameters are also converted to corresponding attributes on that element.

If this is unchecked, upCast tries to generate an image from the current visual representation of the embedded object and include that like an image.

ObjectHandling String embed | image

Default font size

Some RTF documents do not specify a default font size for their text, but rely on the default of the rendering application (like Microsoft Word). This parameter lets you set the default font size for such documents.

Note

Microsoft Word applications up to and including Word 97 used a default value of 10pt, Word 2000 and later use a default of 12pt. When you set this parameter to * (i.e. automatic), upCast tries to guess from the RTF symbols it finds in the document whether it is a Word 2000 (or later) document and then will use 12pt as default font size, 10pt otherwise.

DefaultFontSize String 1..99 | '*'
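
The effect of the automatic setting can be pictured with a short sketch. This is only an illustration, not upCast's implementation: the function name resolve_default_font_size and the flag looks_like_word_2000_or_later are hypothetical, and how upCast actually inspects the RTF symbols is not described in this chapter.

```python
def resolve_default_font_size(setting: str, looks_like_word_2000_or_later: bool) -> int:
    """Return the default font size in points for a DefaultFontSize value.

    `looks_like_word_2000_or_later` stands in for upCast's own guess,
    which is not modelled here.
    """
    if setting == "*":
        # Automatic: 12pt for Word 2000 (or later) documents, 10pt otherwise.
        return 12 if looks_like_word_2000_or_later else 10
    size = int(setting)
    if not 1 <= size <= 99:
        raise ValueError("DefaultFontSize must be 1..99 or '*'")
    return size
```

An explicit value such as "11" is returned unchanged, whatever the document's origin.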

Images. Sets image handling parameters:

Include images

If checked, images will be retrieved from the given document and written to disk in the selected destination folder. Also, links to these files will be inserted into the document at the appropriate places (depending on export filter capabilities).

IncludeImages Boolean true | false

Use document base name

If checked, images will be named after the document's base name in the form basename-x.ext, where x is an image counter for that document and ext is the extension corresponding to the image format.

DocbaseImageNaming Boolean true | false
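
The naming scheme basename-x.ext can be sketched as follows. The function name image_file_name is hypothetical, and the starting value of the counter is an assumption, since it is not stated here.

```python
def image_file_name(doc_base_name: str, counter: int, extension: str) -> str:
    """Build an image file name of the form basename-x.ext.

    `counter` is the per-document image counter; whether upCast starts
    counting at 0 or 1 is not specified in this chapter.
    """
    return f"{doc_base_name}-{counter}.{extension}"
```

For a document named report.rtf, the third PNG image would be written as report-3.png under this scheme, assuming counting starts at 1.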

Inline referenced images (if possible)

For images that have been included in the RTF document both by reference and by embedding, upCast tries to use the embedded substitute representation if this option is checked. This is important when images have been linked with a relative path, because the link may break after conversion to a different destination folder. The same is true for absolute paths when the result document is moved to a different machine. If a substitute representation has been embedded in the RTF file, this option essentially breaks the link to the original image file and instead links to an upCast-converted bitmap version of the original file.

When an image has only been linked and no substitute representation is available in the RTF, however, the original link to the image is preserved and used.

InlineReferencedImages Boolean true | false

Default image resolution

This parameter determines the image resolution in dpi (dots per inch) to use for embedded images that do not specify their resolution explicitly. This is true for all (originally) GIF images and some variants of JPEG and PNG images.

Without any dpi information, upCast (and, as a matter of fact, even Word) cannot determine the absolute size of images, which is necessary to create a fully specified export file. This parameter then establishes a default dpi value and corresponds roughly to Word's Web Options > Image resolution setting.

When setting this to the default '*' value, upCast determines the absolute size of the image from the image properties in the RTF document (if available) and modifies the embedded image data by adding the resolution determined from the (absolute size/number of pixels)-pair to the externalized image. This ensures that subsequent processors can correctly determine absolute sizes and scale any images accordingly.

Tip

If you have control over the original document generation process and especially image creation, make sure that each image you add to a Word or RTF document contains explicit resolution information, as this avoids all sorts of platform incompatibilities.

This rule especially forbids importing GIF images, as the GIF format does not support resolution information. However, several Clip Art images in JPEG and PNG format also lack this information, so that the displayed image size in a document depends on the platform, the Word version or the setting of the Web Options > Image resolution parameter.

DefaultPixmapResolution String 10 .. 9999 | '*'
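
The relationship between pixel count, absolute size and resolution that this parameter relies on is simple arithmetic. The sketch below illustrates it; both function names are hypothetical and the code is not taken from upCast.

```python
def resolution_from_size(pixel_count: int, size_in_inches: float) -> float:
    """Resolution (dpi) implied by a pixel count and an absolute size.

    This is the calculation behind the '*' setting: the absolute size
    comes from the image properties in the RTF document.
    """
    if size_in_inches <= 0:
        raise ValueError("absolute size must be positive")
    return pixel_count / size_in_inches


def size_from_resolution(pixel_count: int, dpi: float) -> float:
    """Absolute size (inches) implied by a pixel count and a dpi value.

    This is what a fixed default resolution is needed for when the
    image itself carries no dpi information.
    """
    if dpi <= 0:
        raise ValueError("resolution must be positive")
    return pixel_count / dpi
```

An image that is 300 pixels wide and occupies 2 inches in the document has an implied resolution of 150 dpi; a 192 pixel wide image without any size information is treated as 2 inches wide at a default of 96 dpi.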

Image rendering resolution

This value affects the WMF to pixmap renderer built into upCast. This means that WMF (or EMF) images will be rendered into a pixmap with pixel dimensions for width and height that correspond to this value.

The default value is 96 dpi (used e.g. by Microsoft's Internet Explorer™). You may want to change this when outputting for Netscape Navigator 4.7 on the Mac, which by default displays at 72 dpi and therefore would downscale images written using 96 dpi resolution.

Suppose you have a WMF image in your document that is 2 by 1 inches in size. With 96 dpi output resolution, this will yield a pixmap of size 192 by 96 pixels.

However, if you set the output resolution to only 72 dpi, the resulting pixmap will be 144 by 72 pixels in size.

ImageRenderingResolution Integer 20 .. 360

Export embedded images of type…

upCast can convert several types of images commonly present in RTF documents in embedded form into other common formats. You can specify for each input format separately, which output format is desired. This leaves you with the option to not process certain images at all, but extract them without changes.

With its built-in, Java based WMF renderer, upCast even allows for converting most Windows Meta Files (WMFs) to any one of the supported pixel based output formats.[1]

You can convert each of the listed formats, i.e. WMF, EMF, JPEG, PNG and Macintosh PICT to any of the following formats:

(no change)

The embedded image is written to a file without any changes. Use this if you want to simply extract images in their native format.

*remove*

Discards the image completely. This means that you also won't get appropriate image elements in the output.

use WMF subst.

Note

Only available for EMF source.

Extended Meta File format images (EMFs) cannot be directly processed by upCast. However, you may use their WMF substitution which is always available in Word generated RTF. When you select this option, upCast uses the settings you have specified for WMFs and converts the image according to those settings. Think of this setting as a mere redirection to the WMF processing setting.

JPEG

Converts the source to JPEG. In the Options… dialog, you can specify the image quality in the range from 0 (lowest, smallest file) to 100 (maximum, largest file).

formatacronymDest§JPEG§Quality Integer 0 .. 100

Note

The quality setting is set separately for each of the source formats where you specify JPEG as the destination format.

PNG

Converts the source to PNG. In the Options… dialog, you can specify the compression type. You can choose among the following values:

default

This is the default compression algorithm, yielding a good balance between file size and image quality.

fast

Produces a PNG that loads fast, i.e. is small in file size.

max

Produces a PNG that tries to achieve highest possible image quality. File size will be bigger than normal.

none

Produces a PNG that does not apply any compression to the image material and therefore does not alter the original image.

formatacronymDest§PNG§CmprType String default | fast | max | none

Note

The compression type setting is set separately for each of the source formats where you specify PNG as the destination format.

BMP

Converts the source to Windows Bitmap (BMP) format.

PICT

Converts the source to Macintosh PICT format.

Note

The resulting file will only contain a bitmap. You cannot convert WMFs (being a vector based format) to a PICT containing QuickDraw operations.

Important

For PICT source format, upCast can only convert images which contain solely a bitmap object. upCast does not render QuickDraw operations like it does for e.g. WMFs.

Please note that this limitation causes most vector-based clip arts in PICT format to not be correctly converted using upCast. For best results, choose (no change) for embedded PICT source images to write them to a file and then post-process them with a third-party graphics application like Graphic Converter (Shareware) for Mac OS.

formatacronymDestFormat String unchanged | dispose | UseWMFSubstitute | JPEG | PNG | BMP | PICT

Advanced. Advanced settings:

Use paragraph outline level for structuring

If checked, the outline level attribute of a paragraph will determine whether it is considered a heading and therefore induces section nesting generation of suitable level.

If this parameter is off, only the pre-defined Word heading style names heading 1 through heading 9 will be used to determine section nesting.

RespectParOutlineLevel Boolean true | false

Empty headings always create <section> elements

upCast's default sectioning algorithm only creates a new section for the first of consecutive heading style elements if it is not empty. The idea is that the user may have created a heading, then hit return (not changing the style) to create visual space, and only then start with the actual content. You certainly would not want to have a section on its own for each of the visual space generating empty heading-styled paragraphs, but only for the first one, so section nesting generation is suppressed for the remaining heading-styled paragraphs.

If, however, you want to create section nesting corresponding to each heading-styled paragraph in a document, even if it's empty, check this option.

GroupEmptyHeadings Boolean true | false

Hoist common inline properties to parent

If checked, any inline formatting property that has the same value on all children of a paragraph-level object will be hoisted to the parent object as a style override. This makes use of CSS inheritance and optimizes the output by specifying that particular property only once on the parent instead of on each of its child elements.

HoistCommonInlines Boolean true | false
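
The idea of hoisting can be sketched as follows. This is a simplified illustration that represents each child's inline properties as a dictionary; the function name hoist_common_properties and this data layout are assumptions, not upCast's internal model.

```python
def hoist_common_properties(
    children: list[dict[str, str]],
) -> tuple[dict[str, str], list[dict[str, str]]]:
    """Split child properties into hoisted (common) and remaining ones.

    A property is hoisted only if every child carries it with the same
    value. Returns (properties for the parent, reduced child properties).
    """
    if not children:
        return {}, []
    common = {
        name: value
        for name, value in children[0].items()
        if all(child.get(name) == value for child in children[1:])
    }
    remaining = [
        {name: value for name, value in child.items() if name not in common}
        for child in children
    ]
    return common, remaining
```

If one child is blue and bold and its sibling is only blue, the colour moves to the parent and only the bold property stays on the first child.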

Remove empty inlines

If checked, any inline style specifications that do not contain any #PCDATA or similar, visually rendered content, are discarded from the document.

The default for this parameter is off based on the assumption that you may want to keep e.g. formatting information for empty cells so that a user may later fill in text and has the correct, originally intended formatting information available at that document location.

RemoveEmptyInlines Boolean true | false

Apply list structuring heuristics

If checked, special list structure detection algorithms are performed to create the best logically structured XML output. If unchecked, Word's internal list IDs are used to track where a list starts and ends and where a new one begins, which may (depending on the editing history of a particular list) not match what you actually see in the layout.

The default value is on.

ApplyListHeuristics Boolean true | false

Use CSS for forced page breaks (where possible)

When checked, upCast uses the page-break-before: always; property definition on block level elements that have a page break right before them.

The default value is off, writing the empty pagebreak element for forced pagebreaks.

UseCSSForPagebreaks Boolean true | false

Use literal pass-through styles

If checked, you can specify two (Word-) styles, a paragraph style and a character style (by specifying their exact names) which should be treated as literals. This means that all text in the document set using these styles will be written to the output without any interpretation by upCast. This lets you write e.g. XHTML or XML code directly within your document the way it should appear at that location in the output.

For custom export filters, text picked up as literal will be delivered via the literal() callback method instead of characters().

Warning

This may create documents which are not well-formed if used incorrectly! Use this feature only if you know exactly what you are doing.

LiteralProcessing Boolean true | false

LiteralParStyle String parstylename

LiteralCharStyle String charstylename

Sectioner class

Here you can specify the fully qualified classname of a custom sectioning class. This class must be implemented by extending de.infinityloop.upcast.treeprocess.SectionProcessorBase and be available on the classpath.

The default implementation respects the RespectParOutlineLevel and GroupEmptyHeadings parameters described above. To use the default, built-in sectioning algorithm, specify * (an asterisk).

If you do not want any sectioning to take place, make sure the field is empty.

SectionProcessorClass String fully.qualified.classname | '*' | ''

Inline processor class

Here you can specify the fully qualified classname of a custom inline processor class. This class must be implemented by extending de.infinityloop.upcast.treeprocess.InlineProcessorBase and be available on the classpath. This class handles inline optimization.

The default implementation respects the HoistCommonInlines and RemoveEmptyInlines parameters described above. To use the default, built-in, optimizing inline processing algorithm, specify * (an asterisk).

If you do not want any inline processing to take place, make sure the field is empty.

InlineProcessorClass String fully.qualified.classname | '*' | ''

WordLink. Set WordLink features.

Note

Since WordLink is only available on the Windows platform, this tab will only be displayed when WordLink actually is available to the application.

Mode

Process .doc files only

When checked, WordLink and all options specified will only be applied on Word binary (*.doc) files.

Process all files

When checked, WordLink and all options specified will be applied on any input document, i.e. even files that are in RTF format already. This lets you automatically update fields or add pagebreak elements.

WordLinkMode String 'doc' | 'all'

Run macro 'il_premacro'

When checked, WordLink will first run a Word macro named il_premacro on the source document. This macro must either be defined in the respective document (when it is a Word binary .doc file) or in the global document template file (*.dot).

When this macro is not available, an error will be issued after conversion, though the further conversion process is not affected.

WordLinkCommand String '' | 'Premacro'

Update fields

When checked, WordLink will update any fields in the source document with current values: date, time, pages, …

WordLinkCommand String '' | 'Update'

Update from linked images

When including an image only by reference (i.e., using Word's INCLUDEPICTURE field), upCast is not able to determine the actual image size as that information is not part of RTF. When this option is checked, the linked image is temporarily included in the document so that upCast can evaluate the image size and any scaling applied in the .doc Word binary file.

This feature is not beneficial for RTF source files, as in these the necessary information is already lost (also for Word).

WordLinkCommand String '' | 'Includelinkedimages'

Mark up layout pagebreaks using <pagestart />

This inserts a <pagestart /> empty inline element at those places where in current layout flow, there would be a dynamic page break when rendering the document.

WordLinkCommand String '' | 'Pages'

Mark up layout linebreaks using <linestart />

This inserts a <linestart /> empty inline element at those places where in current layout flow, there would be a dynamic line break when rendering the document.

Important

This is slow for documents bigger than about 100 pages. You may want to increase the Kill timeout value significantly. Also, some document structure constellations may yield wrong linebreak position results due to limitations in the Word application.

WordLinkCommand String '' | 'Lines'

Kill timeout

When hitting a corrupt document, WordLink may have problems and/or hang the application. Therefore, you can set a kill timeout value after which the WordLink functions will be aborted. The default value is 300 seconds.

Note

Killing WordLink may leave an invisible instance of Word running. After a timeout, please check the running processes and kill any zombie Word processes manually using the Process Viewer (Ctrl-Alt-Del on Windows 2000/XP).

KillTimeout Integer timeoutseconds

Copy temporary .rtf file to output folder as "[basename]-tmp.rtf"

This is mainly for debugging purposes. It copies the intermediate RTF file to the output folder with a name of basename-tmp.rtf after having applied all WordLink functions. This is the file that upCast itself takes as source for its actual conversion process.

CopyToOutput Boolean true | false

Note

Operations are performed on the source document in the order the options are listed.

In API mode, however, you can also indicate the sequence in which operations are to be performed:

With a value for WordLinkCommand of 'UpdatePremacroPages', fields will be updated first, then il_premacro will be run, and then pagebreak markers will be added.

With a value for WordLinkCommand of 'PremacroUpdateLines', on the other hand, first the il_premacro will be run, then fields will be updated, and then only linebreaks will be marked-up.
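
To illustrate how such a combined value maps to an ordered list of operations, here is a minimal sketch. It assumes that a combined value is a plain concatenation of the command names listed for the individual options above; the function name split_wordlink_command is hypothetical, and upCast's own parsing may differ.

```python
import re

# Command names as they appear in the WordLinkCommand parameter lines.
KNOWN_COMMANDS = ("Premacro", "Update", "Includelinkedimages", "Pages", "Lines")
_COMMAND_PATTERN = re.compile("|".join(KNOWN_COMMANDS))


def split_wordlink_command(value: str) -> list[str]:
    """Split a concatenated WordLinkCommand value into ordered operations."""
    operations = []
    position = 0
    while position < len(value):
        match = _COMMAND_PATTERN.match(value, position)
        if match is None:
            raise ValueError(f"unknown command at position {position}: {value!r}")
        operations.append(match.group())
        position = match.end()
    return operations
```

With this sketch, 'UpdatePremacroPages' yields Update, Premacro, Pages in that order, and an empty string yields no operations.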

Objects. Set handling of embedded objects (OLE). upCast generates an object element for each embedded object it finds in the RTF. The child elements of this container object are alternative representations of the object's data. This can be an image (if available in the source document; it represents the display of that object at the time the document was saved) or an ole element (if available; it contains a base64 representation of the binary data of the OLE object, which makes it possible to reconstruct an editable instance using downCast).

Include image representation

When checked, an image representation alternative will be added to the object element (if available in the source document).

ObjectHandling String 'embed' | 'image' | 'embed image' | 'none'

Include inline binary data (base64)

When checked, an ole binary data representation alternative will be added to the object element. The ole element contains the base64 encoded binary data as character data.

ObjectHandling String 'embed' | 'image' | 'embed image' | 'none'
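
The following sketch shows what carrying binary data as base64 character data means in practice, and why the original bytes can be recovered later. Only the element names object and ole come from the description above; the function names are hypothetical, and the attributes upCast writes on these elements are not modelled.

```python
import base64
import xml.etree.ElementTree as ET


def build_object_element(ole_bytes: bytes) -> ET.Element:
    """Wrap binary OLE data as base64 text in an <ole> child of <object>."""
    obj = ET.Element("object")
    ole = ET.SubElement(obj, "ole")
    # base64 output is pure ASCII, so it is safe as XML character data.
    ole.text = base64.b64encode(ole_bytes).decode("ascii")
    return obj


def restore_ole_bytes(obj: ET.Element) -> bytes:
    """Recover the original binary data from the <ole> child element."""
    ole = obj.find("ole")
    if ole is None or ole.text is None:
        raise ValueError("object element has no ole data")
    return base64.b64decode(ole.text)
```

Because base64 is lossless, decoding the element content returns exactly the bytes that were embedded.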

2.3. Batch Processor

The Batch Processor can be thought of as a meta-import filter in that it calls either the (none) or the RTF 1.6 import filter repeatedly. A batch job, that is, the details of when to call which filters with which input documents, is controlled by so-called Batch Configuration Files (BCFs) that describe the operations to be executed on source documents. They are interpreted and executed by the Batch Processor import filter.

Note

Note that the input file to be specified for the Batch Processor is the BCF file that defines the batch job, not the file(s) the batch is supposed to operate on.

2.3.1. Performing a batch job

In order to perform a batch job you click on Start Conversion. This initiates a detailed check of the chosen batch configuration file.

This check comprises the following:

  • A test for the existence of all specified documents to be converted. You will get a warning for all files that were not found.

  • A test for files that are already converted and therefore may be ignored (see skipexisting). You will be informed about all the files that will be skipped.

At the end of the check you get a survey about the number of documents to be converted and their accumulated size.
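
The check described above can be pictured with the following sketch. It is an illustration only: the function name check_batch, the representation of jobs as (source, target) pairs, and the rule that a file counts as already converted when its target file exists are all assumptions, not the Batch Processor's documented behaviour.

```python
from pathlib import Path


def check_batch(jobs: list[tuple[Path, Path]], skip_existing: bool) -> dict:
    """Classify batch jobs before execution.

    `jobs` holds (source document, expected output file) pairs.
    """
    missing, skipped, to_convert = [], [], []
    for source, target in jobs:
        if not source.is_file():
            missing.append(source)      # reported as a warning
        elif skip_existing and target.exists():
            skipped.append(source)      # assumed to be already converted
        else:
            to_convert.append(source)
    return {
        "missing": missing,
        "skipped": skipped,
        "to_convert": to_convert,
        # Accumulated size in bytes of the documents to be converted.
        "total_size": sum(source.stat().st_size for source in to_convert),
    }
```

The returned counts and total size correspond to the summary shown at the end of the check.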

If you want to proceed with the execution of the batch job, click Execute Batch, otherwise click Cancel Batch.

2.3.2. Aborting a batch job

A batch job consists of several phases, as determined in the batch configuration file. During the processing of these phases a progress bar will be displayed that provides a Cancel button. You can abort the batch job at any time by clicking this button. All already successfully written files will remain untouched.

3. Export Filters

This section describes the standard set of export filters upCast is distributed with. You can also write your own custom export filters (see Chapter 12, Export Filter API) or download optional filter implementations from the Support section of infinity-loop's website. Such optional filters may have their own parameter sets and are not discussed here.

3.1. XML (upCast DTD)

This is the standard and probably most important export filter within upCast. Any custom DTD will be targeted by using the document structure created by this filter as the basis for conversion.

Documents created by this export filter are valid against the upCast DTD (see Chapter 16, upCast DTD). This generic DTD is based on the implicit DTD any RTF document is made up of (i.e. essentially headings, lists, tables, paragraphs and inlines). The original style names are added as class attributes to the generic elements par and inline. This enables a constant DTD across all converted documents and facilitates generalized post-processing of the exported XML documents with tools like validators or XSLT processors.

The element tag names in the resulting document are therefore not based on the style names used in the source document, but taken from the upCast DTD or XML Schema, respectively.

This export filter has the following configuration options:

General. This group aggregates general export parameters.

Filter name

Lets you specify a distinct name for the instance of this filter so you can identify it easily in the list of active export filters.

FilterName String name

Remove empty elements

If checked, all elements which do not contain information (i.e. essentially empty paragraphs) are removed from the generated output. However, empty table cells are not removed, of course. This is a convenient option to strip empty paragraphs from the source documents that have been inserted by the author for layout reasons only.

DeleteEmpties Boolean true | false

Include layout information

When this is checked, style properties are added to the elements by adding the style attribute containing overriding CSS style information. Additionally, a stylesheet PI is automatically added to the document (depending on the setting of the Stylesheet PI parameter on the Advanced tab).

If this is off, only the class attribute is present on certain elements, and carries the original RTF style name.

IncludeVisual Boolean true | false

Include revision markup

When this is checked, document revisions are marked up in the result using the inserted and deleted elements.

If this is off, only the result of the revisions will be exported, i.e. inserted content remains in the document and deleted content is removed.

RevisionTracking Boolean true | false

Combine CLASS and STYLE attributes

When on, this option allows that both a class and style attribute may be present on an element. Otherwise, the two are separated and an anonymous inline element is used instead.

Option checked:

This is <inline class="slang" style="color: blue;">True Blue</inline>.

Option unchecked:

This is <inline class="slang"><inline style="color: blue;">True Blue</inline></inline>.

You might want to use this option to have named Word styles always separated out in a dedicated element so that additional override styles can be recognized quickly by the additional inline element.

CombineWithLogicalStyle Boolean true | false
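
The two output forms shown above can be reproduced with a small sketch. The function name render_inline is hypothetical; the code only mirrors the markup from the example and is not upCast's serializer.

```python
from xml.sax.saxutils import escape, quoteattr


def render_inline(text: str, class_name: str, style: str, combine: bool) -> str:
    """Render an inline element with or without combined class/style."""
    content = escape(text)
    if combine:
        # Option checked: both attributes on a single element.
        return (
            f"<inline class={quoteattr(class_name)} "
            f"style={quoteattr(style)}>{content}</inline>"
        )
    # Option unchecked: the override style goes on a nested anonymous inline.
    return (
        f"<inline class={quoteattr(class_name)}>"
        f"<inline style={quoteattr(style)}>{content}</inline></inline>"
    )
```

Calling it with the text True Blue, the class slang and the style color: blue; produces exactly the two fragments listed above.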

Include Table of Contents as text

When this is checked, the generated text of a Table of Contents is preserved on export. Normally, you will probably want to have the Table of Contents regenerated at publishing time automatically so it is consistent with possible changes you have made to the XML document. Therefore, the default for this parameter is off.

WriteTOC Boolean true | false

Validate result

When checked, the output is automatically validated against the upCast DTD and any errors will be reported.

Note

Though this filter generates its XML based on the upCast DTD, this does not mean that every generated document is valid against this DTD. For example, the DTD requires that a table that has a thead element must also have a tbody element. However, RTF allows for creating tables that consist exclusively of header rows, and such a document will therefore not validate successfully.

Validate Boolean true | false

Output resolution

This value will be used when calculating pixel-based attribute or property values when having to convert absolute lengths to pixel-relative ones.

The default is 96 dpi as per the CSS specification.

OutputResolution Integer 1 .. 9999
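
As an illustration of how an absolute length becomes a pixel value at a given output resolution, consider the sketch below. It assumes the usual definition of 72 points per inch; the function name points_to_pixels is hypothetical, and any rounding upCast applies is not modelled.

```python
def points_to_pixels(points: float, output_resolution: int = 96) -> float:
    """Convert a length in points to pixels at the given resolution (dpi).

    One inch is 72 points, so pixels = points * dpi / 72.
    """
    if not 1 <= output_resolution <= 9999:
        raise ValueError("OutputResolution must be in the range 1..9999")
    return points * output_resolution / 72
```

At the default of 96 dpi, a 12pt length corresponds to 16 pixels; at 72 dpi, points and pixels coincide.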

Output file extension

Specifies the extension of the output file. This will replace the original extension of the input file or, if there was none, will be simply appended to the given source file name. Output files will always be created in the destination folder.

The default extension is .xml.

Extension String .ext
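
The naming rule can be sketched with Python's pathlib. The function name output_path is hypothetical, and treating only the last dot-separated part of the name as the original extension is an assumption.

```python
from pathlib import Path


def output_path(source: str, destination_folder: str, extension: str = ".xml") -> Path:
    """Derive the output file path from a source file name.

    An existing extension is replaced; if the source has none, the new
    extension is simply appended. The file always lands in the
    destination folder, whatever the source folder was.
    """
    name = Path(source).name
    return Path(destination_folder) / Path(name).with_suffix(extension)
```

Both in/report.rtf and in/report map to report.xml in the destination folder.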

Output file encoding

Lets you specify a name of a supported output file encoding, e.g. UTF-8 or iso-8859-1.

OutputEncoding String xml-encoding-name

XML. This group aggregates parameters special to the XML (upCast DTD) export filter.

Include <toc> element

When this is on, the position of the RTF generated table of contents field is marked with the toc element, and the generated contents at the time of writing the original document is included as its content.

WriteTOC Boolean true | false

Use upCast namespace (prefix: 'uc')

When checked, the elements of the upCast DTD are put into the namespace http://www.infinity-loop.de/DTD/upcast/4.0/, and the fixed namespace prefix uc is used on the elements.

UseNamespace Boolean true | false

Include invisible ('hidden') content

When checked, this includes document content that is visually hidden in the original document. Such content will be marked up with the hidden element and must be handled in a special way by post-processors. See also the technical discussion of the hidden element (Section 4, “The hidden element”).

IncludeHiddenContents Boolean true | false

Add inline CSS stylesheet

When checked, upCast will place the CSS stylesheet which is normally available by using the External CSS2 export filter into the XML document by way of the style element. The default value is off.

InlineStylesheet Boolean true | false

DTD type

Choose which type of document structure definition you want to use. Currently, only XML DTD is supported.

XML DTD

Selects the XML DTD version of the document structure definition.

DTDType String DTD

Table model

This parameter lets you choose which table model should be used for tables. You can either choose the HTML 4 table model, or the OASIS XML-EM (CALS) (OASIS XML Exchange Table Model, a subset of CALS) table model.

The HTML 4 table model uses the namespace prefix html for the HTML namespace http://www.w3.org/HTML/1998/html4.

TableModel String HTML | CALS

Advanced. This group aggregates parameters that are only relevant in special, advanced cases.

DOCTYPE declaration

Here, you can override the default DOCTYPE declaration (indicated by a simple '*' character) with a custom one. If you leave this field empty, no DOCTYPE declaration is written in the resulting file at all.

DOCTYPEDecl String '' | '*' | customDOCTYPE

Stylesheet PI

Lets you specify your own stylesheet inclusion or reference processing instruction. For the default one that references the stylesheet written by the External CSS2 export filter, use '*'. To suppress writing a stylesheet PI completely, leave this field empty.

Note

A stylesheet PI will only be written if the Include layout information parameter is checked, regardless of the setting of this parameter.

CustomStylesheetPI String '' | '*' | customStylesheetPI
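
The interplay between this parameter and Include layout information can be summarized in a short decision sketch. The function name stylesheet_pi is hypothetical, and the text of the default processing instruction is passed in as a parameter because its exact content is not given here.

```python
from typing import Optional


def stylesheet_pi(custom_pi: str, include_visual: bool, default_pi: str) -> Optional[str]:
    """Return the stylesheet PI to write, or None if none is written.

    `custom_pi` is the CustomStylesheetPI value, `include_visual` the
    IncludeVisual (Include layout information) setting.
    """
    if not include_visual:
        # No PI without layout information, whatever custom_pi says.
        return None
    if custom_pi == "":
        return None            # empty field: suppress the PI
    if custom_pi == "*":
        return default_pi      # asterisk: use the default PI
    return custom_pi           # anything else: use the custom PI as given
```

The DOCTYPE declaration parameter follows the same empty, asterisk or custom pattern, but without the dependency on Include layout information.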

Unicode translation map

upCast has a built-in mechanism for converting any Unicode character to any other Unicode character or even entity notation on export. This is done by means of the Unicode translation map, which is a plain ASCII text file. For a description of the format, see Section 3, “Unicode translation map”.

With this parameter, you decide whether you want to use any of the built-in maps or provide your own one by specifying its location in the file system. The default value is upcast:xml-map.

UnicodeTranslationMap String upcast:xml-map | upcast:html-map | customUnicodeMapURI

CSS property unit table

Here, you can specify a mapping table that associates any CSS <length> property with a (unit, precision) pair. When the filter needs to write length or size information in the form of CSS properties, it consults this table to determine which length unit to use at which precision. For a description of the format, see Section 4, “CSS property unit table”.

With this parameter, you decide whether you want to use the built-in map or provide your own one by specifying its location in the file system. The default value is upcast:default-map.

CSSUnitMap String upcast:default-map | customUnitMapURI

3.2. XHTML 1.0 (strict)

This export filter creates a valid XHTML 1.0 (strict) document. It is intended to be used in conjunction with the External CSS2 export filter to include layout information of the original document. Footnotes are gathered and displayed at the end of the document, and an Index is also created from index entries found in the document. Splitting the document into several files is not part of this implementation.

Note

The implementation of this filter is based on the Export Filter API (see Chapter 12, Export Filter API), and the Java source code is available for download from infinity-loop's website.

This export filter has the following configuration options:

General. This group aggregates general export parameters.

Filter name

Lets you specify a distinct name for the instance of this filter so you can identify it easily in the list of active export filters.

FilterName String name

Remove empty elements

If checked, all elements which do not contain information (i.e. essentially empty paragraphs) are removed from the generated output. However, empty table cells are not removed, of course. This is a convenient option to strip empty paragraphs from the source documents that have been inserted by the author for layout reasons only.

DeleteEmpties Boolean true | false

Include layout information

When this is checked, style properties are added to the elements by adding the style attribute containing overriding CSS style information.

If this is off, only the class attribute is present on certain elements, and carries the original RTF style name.

IncludeVisual Boolean true | false

Include revision markup

When this is checked, document revisions are marked up in the result using the ins and del elements.

If this is off, only the result of the revisions will be exported, i.e. ins content remains in the document and del content is removed.

RevisionTracking Boolean true | false
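The effect of switching this option off can be pictured with a small Python sketch. It is illustrative only and not upCast's implementation: the function name resolve_revisions is hypothetical, and the ins and del elements are assumed to carry no namespace.

```python
import xml.etree.ElementTree as ET

def resolve_revisions(elem):
    """Return a copy of elem in which <del> content is dropped and
    <ins> content is kept without its wrapper element.
    Hypothetical helper; element names are assumed to be un-namespaced."""
    new = ET.Element(elem.tag, dict(elem.attrib))
    new.text = elem.text
    last = None  # most recently appended child of `new`

    def append_text(text):
        # Attach loose text after the last child, or to the parent's text.
        nonlocal last
        if not text:
            return
        if last is None:
            new.text = (new.text or "") + text
        else:
            last.tail = (last.tail or "") + text

    for child in elem:
        if child.tag == "del":
            # Deleted content disappears; only the text following it survives.
            append_text(child.tail)
        elif child.tag == "ins":
            # Inserted content stays, but the <ins> wrapper is removed.
            inner = resolve_revisions(child)
            append_text(inner.text)
            for sub in list(inner):
                new.append(sub)
                last = sub
            append_text(child.tail)
        else:
            copy = resolve_revisions(child)
            new.append(copy)
            last = copy
            append_text(child.tail)
    return new

source = ET.fromstring("<p>This is <ins>new</ins><del>old</del> text.</p>")
print(ET.tostring(resolve_revisions(source), encoding="unicode"))
```

With the option on, the input paragraph in this sketch would be exported unchanged, including its ins and del elements.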

Combine CLASS and STYLE attributes

When on, an element may carry both a class and a style attribute. Otherwise, the two are kept apart and the style attribute is placed on an additional, anonymous span element.

Option checked:

This is <span class="slang" style="color: blue;">True Blue</span>.

Option unchecked:

This is <span class="slang"><span style="color: blue;">True Blue</span></span>.

You might want to use this option to have named Word styles always separated out in a dedicated element so that additional override styles can be recognized quickly by the additional span element.

CombineWithLogicalStyle Boolean true | false
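The two output forms shown above can be reproduced with a few lines of Python. This is a sketch for illustration, not the filter's code; the function name render_inline and its parameters are hypothetical.

```python
from html import escape

def render_inline(text: str, css_class: str, style: str, combine: bool) -> str:
    """Render text that has a named style (class) plus override properties (style).
    Hypothetical helper mirroring the CombineWithLogicalStyle parameter."""
    cls = escape(css_class)
    sty = escape(style)
    body = escape(text)
    if combine:
        # Option checked: one element carries both attributes.
        return f'<span class="{cls}" style="{sty}">{body}</span>'
    # Option unchecked: the overrides move to an anonymous inner span.
    return f'<span class="{cls}"><span style="{sty}">{body}</span></span>'

print(render_inline("True Blue", "slang", "color: blue;", combine=True))
print(render_inline("True Blue", "slang", "color: blue;", combine=False))
```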

Include Table of Contents as text

When this is checked, the generated text of a Table of Contents is preserved on export. Normally, you will probably want to have the Table of Contents regenerated at publishing time automatically so it is consistent with possible changes you have made to the XML document. Therefore, the default for this parameter is off.

WriteTOC Boolean true | false

Validate result

When checked, the output is automatically validated and any errors will be reported.

Note

Though this filter generates its XML based on the XHTML (strict) DTD, this does not guarantee that every generated document is valid against this DTD. For example, the DTD requires that a table that has a thead element must also have a tbody element. However, RTF allows for creating tables that consist exclusively of header rows, and such a document will therefore not validate successfully.

Validate Boolean true | false

Output resolution

This value will be used when calculating pixel-based attribute or property values (e.g. on the img element) when having to convert absolute lengths to pixel-relative ones.

The default is 96 dpi as per the CSS specification.

OutputResolution Integer 1 .. 9999
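As a worked example of the arithmetic involved, the following Python sketch converts absolute CSS lengths to pixels at a given output resolution. The function name to_pixels and the rounding to whole pixels are assumptions made for illustration; the manual does not state how upCast rounds.

```python
# Absolute CSS units expressed in inches (1in = 2.54cm = 72pt = 6pc).
INCHES_PER_UNIT = {
    "in": 1.0,
    "cm": 1 / 2.54,
    "mm": 1 / 25.4,
    "pt": 1 / 72,
    "pc": 1 / 6,
}

def to_pixels(value: float, unit: str, output_resolution: int = 96) -> int:
    """Convert an absolute length to pixels: pixels = inches * dpi.
    Rounding to the nearest integer is an assumption of this sketch."""
    if not 1 <= output_resolution <= 9999:
        raise ValueError("OutputResolution must be in the range 1 .. 9999")
    return round(value * INCHES_PER_UNIT[unit] * output_resolution)

print(to_pixels(72, "pt"))       # one inch at the default 96 dpi
print(to_pixels(1, "in", 72))    # one inch at 72 dpi
```

At the default of 96 dpi, a length of 72pt (one inch) therefore corresponds to 96 pixels.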

Output file extension

Specifies the extension of the output file. This will replace the original extension of the input file or, if there was none, will be simply appended to the given source file name. Output files will always be created in the destination folder.

The default extension is .xml.

Extension String .ext

Output file encoding

Lets you specify a name of a supported output file encoding, e.g. UTF-8 or iso-8859-1.

OutputEncoding String xml-encoding-name

XHTML. This group aggregates parameters special to the XHTML export.

Add inline CSS stylesheet

When checked, a CSS stylesheet is placed directly into the XHTML document by using the style element. This makes the resulting document a self-contained unit including style information.

InlineStylesheet Boolean true | false

Allow empty table cell elements

Some browsers consolidate empty adjacent table cells into bigger areas. This might not be what you want, e.g. if the table is meant to provide empty fields where input is expected or the whole table is a time table. upCast can therefore fill empty cells with a non-breaking space entity (&nbsp;) to keep browsers from consolidating them. To enable this behavior, deselect this check box.

AllowEmptyCells Boolean true | false

Form elements

Determines handling of form elements found in the source document.

Discard

Discard any form elements completely.

Render as Text

Renders the current selection or choice of form elements as text into the document.

Create HTML Form

Creates corresponding form elements in HTML. The complete document body content is surrounded by a <form> element with empty action attribute. Use an XSLT post processing step to populate the action attribute with the desired value.

FormHandling String discard | text | form

External stylesheet

By default, a link to the CSS stylesheet as produced by the External CSS2 export filter is inserted into the XHTML document. To link the XHTML document to a different stylesheet, you can specify the destination here.

The default value is ${il:srcbasename}.css, with the parameter ${il:srcbasename} being resolved to the base file name of the original document that is currently processed.

CSSRef String path/to/styles.css

Advanced. This group aggregates parameters that are only relevant in special, advanced cases.

DOCTYPE declaration

Here, you can override the default DOCTYPE declaration (indicated by a simple '*' character) with a custom one. If you leave this field empty, no DOCTYPE declaration is written in the resulting file at all.

DOCTYPEDecl String '' | '*' | customDOCTYPE
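The three kinds of value this parameter accepts amount to a simple decision rule, sketched here in Python. The function name resolve_doctype is hypothetical, and the default shown is the standard W3C DOCTYPE for XHTML 1.0 Strict, which is assumed here and not taken from upCast's actual output.

```python
from typing import Optional

# Assumption: the standard W3C declaration stands in for whatever default
# DOCTYPE the filter really writes when the value is '*'.
ASSUMED_DEFAULT_DOCTYPE = (
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" '
    '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">'
)

def resolve_doctype(doctype_decl: str) -> Optional[str]:
    """Map a DOCTYPEDecl value to the declaration to write.
    Returns None when no DOCTYPE declaration should be written at all."""
    if doctype_decl == "":
        return None                      # empty field: write nothing
    if doctype_decl == "*":
        return ASSUMED_DEFAULT_DOCTYPE   # '*': use the filter's default
    return doctype_decl                  # anything else: custom declaration

print(resolve_doctype("*"))
```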

Unicode translation map

upCast has a built-in mechanism for converting any Unicode character to any other Unicode character or even entity notation on export. This is done by means of the Unicode translation map, which is a plain ASCII text file. For a description of the format, see Section 3, “Unicode translation map”.

With this parameter, you decide whether you want to use any of the built-in maps or provide your own one by specifying its location in the file system. The default value is upcast:html-map.

UnicodeTranslationMap String upcast:xml-map | upcast:html-map | customUnicodeMapURI

CSS property unit table

Here, you can specify a mapping table that associates any CSS <length> property with a (unit, precision) pair. When the filter needs to write length or size information in the form of CSS properties, it consults this table to determine which length unit to use at which precision. For a description of the format, see Section 4, “CSS property unit table”.

With this parameter, you decide whether you want to use the built-in map or provide your own one by specifying its location in the file system. The default value is upcast:default-map.

CSSUnitMap String upcast:default-map | customUnitMapURI

3.3. XHTML 1.0 (transitional)

This export filter creates a valid XHTML 1.0 (transitional) document. It is intended to be used in conjunction with the External CSS2 export filter to include layout information of the original document. Footnotes are gathered and displayed at the end of the document, and an Index is also created from index entries found in the document. Splitting the document into several files is not part of this implementation.

This export filter has the following configuration options:

General. This group aggregates general export parameters.

Filter name

Lets you specify a distinct name for the instance of this filter so you can identify it easily in the list of active export filters.

FilterName String name

Remove empty elements

If checked, all elements which do not contain information (i.e. essentially empty paragraphs) are removed from the generated output. However, empty table cells are not removed, of course. This is a convenient option to strip empty paragraphs from the source documents that have been inserted by the author for layout reasons only.

DeleteEmpties Boolean true | false

Include layout information

When this is checked, style properties are added to the elements by adding the style attribute containing overriding CSS style information.

If this is off, only the class attribute is present on certain elements, and carries the original RTF style name.

IncludeVisual Boolean true | false

Include revision markup

When this is checked, document revisions are marked up in the result using the ins and del elements.

If this is off, only the result of the revisions will be exported, i.e. ins content remains in the document and del content is removed.

RevisionTracking Boolean true | false

Combine CLASS and STYLE attributes

When on, an element may carry both a class and a style attribute. Otherwise, the two are kept apart and the style attribute is placed on an additional, anonymous span element.

Option checked:

This is <span class="slang" style="color: blue;">True Blue</span>.

Option unchecked:

This is <span class="slang"><span style="color: blue;">True Blue</span></span>.

You might want to use this option to have named Word styles always separated out in a dedicated element so that additional override styles can be recognized quickly by the additional span element.

CombineWithLogicalStyle Boolean true | false

Include Table of Contents as text

When this is checked, the generated text of a Table of Contents is preserved on export. Normally, you will probably want to have the Table of Contents regenerated at publishing time automatically so it is consistent with possible changes you have made to the XML document. Therefore, the default for this parameter is off.

WriteTOC Boolean true | false

Validate result

When checked, the output is automatically validated and any errors will be reported.

Note

Though this filter generates its XML based on the XHTML (transitional) DTD, this does not guarantee that every generated document is valid against this DTD. For example, the DTD requires that a table that has a thead element must also have a tbody element. However, RTF allows for creating tables that consist exclusively of header rows, and such a document will therefore not validate successfully.

Validate Boolean true | false

Output resolution

This value will be used when calculating pixel-based attribute or property values (e.g. on the img element) when having to convert absolute lengths to pixel-relative ones.

The default is 96 dpi as per the CSS specification.

OutputResolution Integer 1 .. 9999

Output file extension

Specifies the extension of the output file. This will replace the original extension of the input file or, if there was none, will be simply appended to the given source file name. Output files will always be created in the destination folder.

The default extension is .xml.

Extension String .ext

Output file encoding

Lets you specify a name of a supported output file encoding, e.g. UTF-8 or iso-8859-1.

OutputEncoding String xml-encoding-name

XHTML. This group aggregates parameters special to the XHTML export.

Add inline CSS stylesheet

When checked, a CSS stylesheet is placed directly into the XHTML document by using the style element. This makes the resulting document a self-contained unit including style information.

InlineStylesheet Boolean true | false

Allow empty table cell elements

Some browsers consolidate empty adjacent table cells into bigger areas. This might not be what you want, e.g. if the table is meant to provide empty fields where input is expected or the whole table is a time table. upCast can therefore fill empty cells with a non-breaking space entity (&nbsp;) to keep browsers from consolidating them. To enable this behavior, deselect this check box.

AllowEmptyCells Boolean true | false

Form elements

Determines handling of form elements found in the source document.

Discard

Discard any form elements completely.

Render as Text

Renders the current selection or choice of form elements as text into the document.

Create HTML Form

Creates corresponding form elements in HTML. The complete document body content is surrounded by a <form> element with empty action attribute. Use an XSLT post processing step to populate the action attribute with the desired value.

FormHandling String discard | text | form

External stylesheet

By default, a link to the CSS stylesheet as produced by the External CSS2 export filter is inserted into the XHTML document. To link the XHTML document to a different stylesheet, you can specify the destination here.

The default value is ${il:srcbasename}.css, with the parameter ${il:srcbasename} being resolved to the base file name of the original document that is currently processed.

CSSRef String path/to/styles.css

Advanced. This group aggregates parameters that are only relevant in special, advanced cases.

DOCTYPE declaration

Here, you can override the default DOCTYPE declaration (indicated by a simple '*' character) with a custom one. If you leave this field empty, no DOCTYPE declaration is written in the resulting file at all.

DOCTYPEDecl String '' | '*' | customDOCTYPE

Unicode translation map

upCast has a built-in mechanism for converting any Unicode character to any other Unicode character or even entity notation on export. This is done by means of the Unicode translation map, which is a plain ASCII text file. For a description of the format, see Section 3, “Unicode translation map”.

With this parameter, you decide whether you want to use any of the built-in maps or provide your own one by specifying its location in the file system. The default value is upcast:html-map.

UnicodeTranslationMap String upcast:xml-map | upcast:html-map | customUnicodeMapURI

CSS property unit table

Here, you can specify a mapping table that associates any CSS <length> property with a (unit, precision) pair. When the filter needs to write length or size information in the form of CSS properties, it consults this table to determine which length unit to use at which precision. For a description of the format, see Section 4, “CSS property unit table”.

With this parameter, you decide whether you want to use the built-in map or provide your own one by specifying its location in the file system. The default value is upcast:default-map.

CSSUnitMap String upcast:default-map | customUnitMapURI

3.4. External CSS2

This export filter writes an external Cascading Style Sheets, level 2 (CSS2) file comprising all styles (paragraph styles and character styles) used in the given input file, matching their visual appearance as closely as reasonably possible. The output also includes information on the page setup like paper size and margins.

The CSS2 file written may be referenced by a file created by the XHTML 1.0 (strict) or XML (upCast DTD) export filter.

The extension of the output file is always .css and will replace the original extension of the input file or, if this is not there, will be simply appended to the given source file name. The output file will always be created in the destination folder.

General. This group aggregates general export parameters.

Filter name

Lets you specify a distinct name for the instance of this filter so you can identify it easily in the list of active export filters.

FilterName String name

Selector syntax

Lets you choose which CSS selector syntax should be used:

CSS1 ('class' shorthand)

Writes selectors using the 'class' attribute shorthand: .classname { ... }

CSS2 Selectors

Writes selectors according to CSS2 selector syntax rules: *[class=classname] { ... }

CSS1+CSS2

Writes both ways of expressing the selector so that tools understanding either can pick the one that they understand. First, the shorthand is written, followed by full CSS2 selector.

SelectorSyntax String css1 | css2 | all
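The three settings differ only in which selector forms are written for a style. The following Python sketch lists the selectors per setting; the function name selectors_for is hypothetical, and style names that are not plain CSS identifiers (for example names containing spaces) would need escaping or quoting that is not handled here.

```python
def selectors_for(classname: str, selector_syntax: str) -> list:
    """Return the selector(s) written for one style, in output order.
    Hypothetical helper mirroring the SelectorSyntax parameter values."""
    css1 = f".{classname}"            # 'class' attribute shorthand
    css2 = f"*[class={classname}]"    # full CSS2 attribute selector
    if selector_syntax == "css1":
        return [css1]
    if selector_syntax == "css2":
        return [css2]
    if selector_syntax == "all":
        return [css1, css2]           # shorthand first, then the CSS2 form
    raise ValueError(f"unknown SelectorSyntax value: {selector_syntax!r}")

for syntax in ("css1", "css2", "all"):
    print(syntax, selectors_for("slang", syntax))
```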

3.5. XML Validator

This is a post-processing filter.

This filter validates arbitrary XML documents.

This export filter has the following configuration options:

Filter name

Lets you specify a name for the instance of this filter so you can identify it easily in the list of active export filters.

FilterName String name

Validate

The file to validate. You can use the following variables upCast provides:

${il:srcbasename}

gets replaced by the basic input file name.

If the current input file is /Data/Source/RTF/article.rtf, then ${il:srcbasename} will evaluate to article.

${il:srcfilename}

gets replaced by the full input file name.

If the current input file is /Data/Source/RTF/article.rtf, then ${il:srcfilename} will evaluate to /Data/Source/RTF/article.rtf.

${il:srcfolder}

gets replaced by the path to the input file, including path separator character at the end.

If the current input file is /Data/Source/RTF/article.rtf, then ${il:srcfolder} will evaluate to /Data/Source/RTF/.

${il:destfolder}

gets replaced by the current output folder, including path separator character at the end.

If the current output directory is /Data/Dest/html, then ${il:destfolder} will evaluate to /Data/Dest/html/.

${il:imgfolder}

gets replaced by the current image destination folder, including path separator character at the end.

If the current image destination folder is /Data/Dest/html/images, then ${il:imgfolder} will evaluate to /Data/Dest/html/images/.

${il:destbasename}

gets replaced by the expected destination file, excluding extension.

If the current input file is /Data/Source/RTF/article.rtf and the current output directory is /Data/Dest/html/, then ${il:destbasename} will evaluate to /Data/Dest/html/article.

InputFile String /path/to/file.xml
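To make the substitution rules concrete, here is a Python sketch that derives the variable values from the example paths used above and expands them in a template string. The helper names il_variables and expand are hypothetical, Unix-style paths are assumed, and leaving unknown variables untouched is a choice of this sketch, not documented upCast behavior.

```python
import posixpath
import re

def il_variables(src_file: str, dest_folder: str, img_folder: str) -> dict:
    """Compute the ${il:...} values for one input file (Unix-style paths)."""
    def with_sep(path: str) -> str:
        # Folder variables always end with the path separator.
        return path if path.endswith("/") else path + "/"

    base = posixpath.splitext(posixpath.basename(src_file))[0]
    dest = with_sep(dest_folder)
    return {
        "il:srcbasename": base,
        "il:srcfilename": src_file,
        "il:srcfolder": with_sep(posixpath.dirname(src_file)),
        "il:destfolder": dest,
        "il:imgfolder": with_sep(img_folder),
        "il:destbasename": dest + base,
    }

def expand(template: str, variables: dict) -> str:
    """Replace each ${name} by its value; unknown names are left as they are."""
    return re.sub(
        r"\$\{([^}]+)\}",
        lambda m: variables.get(m.group(1), m.group(0)),
        template,
    )

values = il_variables(
    "/Data/Source/RTF/article.rtf", "/Data/Dest/html", "/Data/Dest/html/images"
)
print(expand("${il:destbasename}.xml", values))
```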

Validate only when grammar specified

When checked, the parser will validate the document only if a grammar is specified.

http://apache.org/xml/features/validation/dynamic Boolean true | false

XML Schema validation

When checked, validation against XML Schema is enabled.

http://apache.org/xml/features/validation/schema-full-checking Boolean true | false

Include external parameter entities

When checked, include external parameter entities and the external DTD subset.

http://xml.org/sax/features/external-parameter-entities Boolean true | false

Include external general entities

When checked, include external general entities.

http://xml.org/sax/features/external-general-entities Boolean true | false

Warn on duplicate attribute definition

When checked, a warning is issued when an attribute is defined more than once on an element.

http://apache.org/xml/features/validation/warn-on-duplicate-attdef Boolean true | false

Warn on duplicate entity declaration

When checked, a warning is issued when an entity is declared more than once.

http://apache.org/xml/features/warn-on-duplicate-entitydef Boolean true | false

Report invalid URIs

When checked, the parser requires that a standard-conformant URI is provided wherever a URI is expected.

http://apache.org/xml/features/standard-uri-conformant Boolean true | false

3.6. Commandline Processor

This is a post-processing filter.

This filter executes external system commands by way of the standard commandline interpreter available on the respective execution platform.

This filter has the following configuration options:

Filter name

Lets you specify a name for the instance of this filter so you can identify it easily in the list of active export filters.

FilterName String name

Commandline

The command to be executed by the underlying system command line interpreter. You can use the following variable substitutions provided by upCast:

${il:srcbasename}

gets replaced by the basic input file name.

If the current input file is /Data/Source/RTF/article.rtf, then ${il:srcbasename} will evaluate to article.

${il:srcfilename}

gets replaced by the full input file name.

If the current input file is /Data/Source/RTF/article.rtf, then ${il:srcfilename} will evaluate to /Data/Source/RTF/article.rtf.

${il:srcfolder}

gets replaced by the path to the input file, including path separator character at the end.

If the current input file is /Data/Source/RTF/article.rtf, then ${il:srcfolder} will evaluate to /Data/Source/RTF/.

${il:destfolder}

gets replaced by the current output folder, including path separator character at the end.

If the current output directory is /Data/Dest/html, then ${il:destfolder} will evaluate to /Data/Dest/html/.

${il:imgfolder}

gets replaced by the current image destination folder, including path separator character at the end.

If the current image destination folder is /Data/Dest/html/images, then ${il:imgfolder} will evaluate to /Data/Dest/html/images/.

${il:destbasename}

gets replaced by the expected destination file, excluding extension.

If the current input file is /Data/Source/RTF/article.rtf and the current output directory is /Data/Dest/html/, then ${il:destbasename} will evaluate to /Data/Dest/html/article.

InputFile String /path/to/file.xml

To create a new directory images in the current output directory on a Unix system, you would use the following commandline: mkdir %Oimages

Note

For the possible pitfalls of specifying command lines in upCast and quoting issues, please see the TechNote 1 we published on our site at http://www.infinity-loop.de/products/upcast/technotes/tn1.html.

Wait for command completion

When checked, the command is executed synchronously, i.e. upCast waits until the external command has completed before continuing execution.

Important

Checking for errors occurring during external command execution can only be performed when this option is on. upCast considers any return value other than 0 (zero) an error.

http://apache.org/xml/features/validation/dynamic Boolean true | false
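The difference between waiting and not waiting for the command can be sketched in Python as follows. This illustrates the general pattern only and is not how upCast itself launches commands; the function name run_command is hypothetical.

```python
import subprocess
from typing import Optional

def run_command(commandline: str, wait: bool = True) -> Optional[bool]:
    """Run commandline through the system command line interpreter.

    Returns True on success and False on error when waiting. Returns None
    when not waiting, because the exit status is then unknown."""
    if wait:
        completed = subprocess.run(commandline, shell=True)
        # Any return value other than 0 (zero) counts as an error.
        return completed.returncode == 0
    # Fire and forget: no error checking is possible in this mode.
    subprocess.Popen(commandline, shell=True)
    return None

print(run_command("exit 0"))   # command succeeded
print(run_command("exit 3"))   # non-zero exit status is treated as an error
```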

3.7. XSLT Processor

This is a post-processing filter.

This filter offers the possibility to automatically apply an XSLT stylesheet to some result file(s) of earlier running export filters. It utilizes either the Xalan XSLT processor from the Apache Software Foundation (ASF; http://xml.apache.org/) or Saxon from Saxonica (http://www.saxonica.com).

Important

For the time being, we cannot recommend J2SE 1.4.x releases for use with upCast. Their core runtime library rt.jar unfortunately packages an outdated XSLT processor containing several bugs, and this processor takes precedence over the newer, bug-fixed releases of these tools that upCast comes bundled with in its JAR.

The configuration dialog and the Help System Information… window tell you which version of Xalan is actually being used for transformations performed by upCast.

This filter has the following configuration options:

Filter name

Lets you specify a name for the instance of this filter so you can identify it easily in the list of active export filters.

FilterName String name

Input file

Specify the absolute file name for the input file, i.e. the file an XSLT stylesheet is to be applied on. You can use the following variable substitutions provided by upCast:

${il:srcbasename}

gets replaced by the basic input file name.

If the current input file is /Data/Source/RTF/article.rtf, then ${il:srcbasename} will evaluate to article.

${il:srcfilename}

gets replaced by the full input file name.

If the current input file is /Data/Source/RTF/article.rtf, then ${il:srcfilename} will evaluate to /Data/Source/RTF/article.rtf.

${il:srcfolder}

gets replaced by the path to the input file, including path separator character at the end.

If the current input file is /Data/Source/RTF/article.rtf, then ${il:srcfolder} will evaluate to /Data/Source/RTF/.

${il:destfolder}

gets replaced by the current output folder, including path separator character at the end.

If the current output directory is /Data/Dest/html, then ${il:destfolder} will evaluate to /Data/Dest/html/.

${il:imgfolder}

gets replaced by the current image destination folder, including path separator character at the end.

If the current image destination folder is /Data/Dest/html/images, then ${il:imgfolder} will evaluate to /Data/Dest/html/images/.

${il:destbasename}

gets replaced by the expected destination file, excluding extension.

If the current input file is /Data/Source/RTF/article.rtf and the current output directory is /Data/Dest/html/, then ${il:destbasename} will evaluate to /Data/Dest/html/article.

InputFile String /path/to/file.xml

XSLT file

Specify the absolute file name for the XSLT stylesheet to be applied to the Input File. You may also browse for the file by clicking the Choose… button.

You may use the same variable substitutions as described for Input File.

Stylesheet String /path/to/procsheet.xsl

Output file

Specify the absolute file name for the output file, that is the file where the result of the XSLT stylesheet application should be written to.

You may use the same variable substitutions as described for Input File.

OutputFile String /path/to/result.ext

Stylesheet parameters

Lets you specify parameters to be passed into the stylesheet. A parameter definition must follow this syntax:

paramname '=' '"' value '"'

Parameter definitions must be separated by at least one whitespace character. Quotes within the parameter value must themselves be quoted using the backslash character '\'.

You may use the same variable substitutions as described for Input File anywhere in the parameter definition. Note that you need to quote the percent character '%' by writing %% and the backslash character '\' by writing \\.

XSLTParameters String ( paramname="value" )*
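The parameter syntax can be illustrated with a small parser, sketched here in Python. It covers only the quoting rules for double quotes and backslashes; variable substitution and the %% escape are left out, the function name parse_xslt_parameters is hypothetical, and not allowing whitespace around '=' is an assumption of this sketch.

```python
import re

# name="value" where the value may contain backslash-escaped characters.
_PARAM = re.compile(r'([^\s="]+)="((?:[^"\\]|\\.)*)"')

def parse_xslt_parameters(spec: str) -> dict:
    """Split a whitespace-separated list of name="value" definitions.
    Backslash escapes inside a value (\\" and \\\\) are resolved."""
    params = {}
    for name, raw_value in _PARAM.findall(spec):
        # Drop the escaping backslash, keep the character that follows it.
        params[name] = re.sub(r"\\(.)", r"\1", raw_value)
    return params

print(parse_xslt_parameters('param1="value1" param2="some other value"'))
print(parse_xslt_parameters(r'title="say \"hi\""'))
```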

XSLT Processor

Lets you choose between Xalan and Saxon as the XSLT processor to use (if installed).

Note

Saxon 8 (supporting the current XSLT 2 draft specification) is only available on Java 1.4 or later.

XSLTProcessor String xalan | saxon

3.8. DocBook 4.2

This new export filter uses a combination of the Custom Export Filter API with an intermediate tree layer and a user-editable XSLT processing sheet db42.xsl to create DocBook 4.2 compatible output from RTF documents.

Note

The implementation of this filter is based on the Export Filter API (see Chapter 12, Export Filter API). Although the Java source code is not available for download from infinity-loop's website, the XSLT processing sheet db42.xsl is available for customization.

Full functionality of this export filter is described in this external document on our website.

This export filter has the following configuration options:

General. This group aggregates general export parameters.

Filter name

Lets you specify a distinct name for the instance of this filter so you can identify it easily in the list of active export filters.

FilterName String name

Remove empty elements

If checked, all elements which do not contain information (i.e. essentially empty paragraphs) are removed from the generated output. However, empty table cells are not removed, of course. This is a convenient option to strip empty paragraphs from the source documents that have been inserted by the author for layout reasons only.

DeleteEmpties Boolean true | false

Include layout information

When this is checked, style properties are added to the elements by adding the style attribute containing overriding CSS style information. This should be on for the DocBook export to work properly with the factory XSLT processing sheet.

If this is off, only the class attribute is present on certain elements, and carries the original RTF style name.

IncludeVisual Boolean true | false

Include revision markup

When this is checked, document revisions are marked up in the resulting DocBook document using the <phrase >...</phrase> and <phrase revisionflag="deleted">...</phrase> elements.

If this is off, only the result of the revisions will be exported, i.e. inserted content remains in the document and deleted content is removed.

RevisionTracking Boolean true | false

Combine CLASS and STYLE attributes

When on, an element may carry both a class and a style attribute. Otherwise, the two are kept apart and the style attribute is placed on an additional, anonymous phrase element.

CombineWithLogicalStyle Boolean true | false

Include Table of Contents as text

When this is checked, the generated text of a Table of Contents is preserved on export. Normally, you will probably want to have the Table of Contents regenerated at publishing time automatically so it is consistent with possible changes you have made to the XML document. Therefore, the default for this parameter is off (false).

WriteTOC Boolean true | false

Validate result

When checked, the output is automatically validated against the DocBook 4.2 DTD and any errors will be reported.

Important

Note that the default installation of upCast does not include the DocBook 4.2 DTD, so validating a DocBook document will fetch the required DTD resources directly from the Internet. Validation will therefore fail if you are not connected to the Internet.

It is strongly recommended to use upCast's Catalog support and keep a local copy of the DTD on your machine both for reasons of speed and reducing remote server load. Please read here for more info on how to do this.

Validate Boolean true | false

Output resolution

This value will be used when calculating pixel-based attribute or property values when having to convert absolute lengths to pixel-relative ones.

The default is 96 dpi as per the CSS specification.

OutputResolution Integer 1 .. 9999

Output file extension

Specifies the extension of the output file. This will replace the original extension of the input file or, if there was none, will be simply appended to the given source file name. Output files will always be created in the destination folder.

The default extension is .xml.

Extension String .ext

Output file encoding

Lets you specify a name of a supported output file encoding, e.g. UTF-8 or iso-8859-1.

Important

This parameter currently has no effect on the exported document. The encoding is instead determined by the output encoding specified in the db42.xsl processing sheet.

OutputEncoding String xml-encoding-name

DocBook 4.2. This group aggregates parameters special to the DocBook 4.2 export.

DocBook structure

Lets you choose the root element and document structure. Two predefined choices and one custom choice are available. This parameter is passed to the processing sheet as parameter rootElement with the value book, article or custom.

The custom value is currently not used in the default processing sheet. You may use it to implement your own handling, since it is guaranteed to never be used by the default stylesheet.

DocBookRoot String book | article | custom

Processing sheet

By default, this points to the built-in processing sheet at jar:/de/infinityloop/upcast/resources/xslt/db42.xsl. To have the export filter use a customized one, specify its absolute URL here.

Stylesheet String '' | absolute_uri

Processing params

Parameters specified here are passed to the processing sheet. Use the following form:

param1="value1"
param2="some other value"

The following parameters are supported by the built-in conversion stylesheet:

inlines.graphical

When set to 'true', the graphical inlines italic and bold are converted to <emphasis>...</emphasis> and <emphasis role="bold">...</emphasis> elements, respectively. The default is 'true'.

omitDefaultRole

When set to 'true', suppresses writing the role attribute when its value is 'Normal' on <para> elements or 'Default Paragraph Font' on <inline> elements. The default is 'true'.

XSLTParameters String ( paramname="value" )*
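
To make the paramname="value" form concrete, the following Python sketch parses such a parameter list into a dictionary. It is only an illustration of the syntax shown above. The regular expression and the function name are assumptions, and the exact grammar upCast accepts (for example, escaping of quotes inside values) is not specified here.

```python
import re

# A parameter name followed by a double-quoted value, as in the form
# shown above. Quotes inside values are not handled (assumption).
PARAM_PATTERN = re.compile(r'([\w.\-]+)\s*=\s*"([^"]*)"')


def parse_params(text):
    """Return a dict mapping parameter names to their string values."""
    return dict(PARAM_PATTERN.findall(text))


example = '''inlines.graphical="false"
omitDefaultRole="true"'''

print(parse_params(example))
# {'inlines.graphical': 'false', 'omitDefaultRole': 'true'}
```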

Write raw source tree

When checked, the raw internal tree is serialized to fulldestfilename.dbraw. This is the tree the processing sheet is applied to.

Tip

Use this feature when customizing the processing sheet to check for available attributes on elements, their values and the general document structure of this intermediary tree.

RawTreeOutput Boolean true | false

3.9. XML (Raw)

This new export filter is a generalization of the DocBook 4.2 export filter. It lets you apply a custom XSLT processing sheet to a very rich, internal tree with resolved CSS properties as attribute-value pairs.

Important

This internal tree is not documented and is subject to change between releases, though we will try to keep changes compatible and to a minimum. There is no DTD available for this internal, intermediary document representation, and we have no plans to provide one. This allows us to react quickly to customer requirements and to implement new features without delay.

This export filter has the following configuration options:

General. This group aggregates general export parameters.

Filter name

Lets you specify a distinct name for the instance of this filter so you can identify it easily in the list of active export filters.

FilterName String name

Remove empty elements

If checked, all elements which do not contain information (i.e. essentially empty paragraphs) are removed from the generated output. However, empty table cells are not removed, of course. This is a convenient option to strip empty paragraphs from the source documents that have been inserted by the author for layout reasons only.

DeleteEmpties Boolean true | false

Include layout information

When this is checked, elements receive a style attribute containing the overriding CSS style information.

If this is off, only the class attribute is present on certain elements, and carries the original RTF style name.

IncludeVisual Boolean true | false

Include revision markup

When this is checked, document revisions are marked up using <inserted> and <deleted> elements.

If this is off, only the result of the revisions will be exported, i.e. inserted content remains in the document and deleted content is removed.

RevisionTracking Boolean true | false

Combine CLASS and STYLE attributes

When on, both a class and a style attribute may be present on the same element. Otherwise, the two are separated, and an anonymous inline element is used instead.

CombineWithLogicalStyle Boolean true | false

Include Table of Contents as text

When this is checked, the generated text of a Table of Contents is preserved on export. Normally, you will probably want to have the Table of Contents regenerated at publishing time automatically so it is consistent with possible changes you have made to the XML document. Therefore, the default for this parameter is off.

WriteTOC Boolean true | false

Validate result

When checked, the final output (i.e. after any XSLT transformations have been applied) is automatically validated and any errors will be reported.

Important

Note that the default installation of upCast includes only the upCast DTD.

It is strongly recommended to use upCast's Catalog support to keep a local copy of the destination DTD on your machine, both for speed and to reduce net traffic. Please read here for details on how to do this.

Validate Boolean true | false

Output resolution

This value will be used when calculating pixel-based attribute or property values when having to convert absolute lengths to pixel-relative ones.

The default is 96 dpi as per the CSS specification.

OutputResolution Integer 1 .. 9999

Output file extension

Specifies the extension of the output file. This will replace the original extension of the input file or, if there was none, will be simply appended to the given source file name. Output files will always be created in the destination folder.

The default extension is .xml.

Extension String .ext

Output file encoding

Lets you specify a name of a supported output file encoding, e.g. UTF-8 or iso-8859-1.

Important

This parameter currently has no effect on the exported document. The encoding is instead determined by the output encoding specified in the custom XSLT processing sheet that is applied.

OutputEncoding String xml-encoding-name

XML. This group aggregates parameters special to the XML raw export.

Table model

This parameter lets you choose which table model should be used for tables. You can choose either the HTML 4 table model or the OASIS XML Exchange Table Model (XML-EM), which is a subset of CALS.

The HTML 4 table model uses the namespace prefix html for the HTML namespace http://www.w3.org/HTML/1998/html4.

TableModel String HTML | CALS

Processing sheet

Specify the XSLT processing sheet to apply to the internal, intermediary document tree to create the result document.

Tip

When this is left empty, the internal tree is serialized. This may help you in developing your custom processing sheet to see which attributes and elements are provided for a certain input document.

Stylesheet String '' | absolute_uri

Processing params

Parameters specified here are passed to the processing sheet. Use the following form:

param1="value1"
param2="some other value"

XSLTParameters String ( paramname="value" )*

XSLT Processor

Lets you choose between Xalan and Saxon as the XSLT processor to use (if installed).

Note

Saxon 8 (supporting the current XSLT 2 draft specification) is only available on Java 1.4 or later.

XSLTProcessor String xalan | saxon

Unicode translation map

upCast has a built-in mechanism for converting any Unicode character to any other Unicode character or even entity notation on export. This is done by means of the Unicode translation map, which is a plain ASCII text file. For a description of the format, see Section 3, “Unicode translation map”.

With this parameter, you decide whether to use one of the built-in maps or to provide your own by specifying its location in the file system. The default value is upcast:xml-map.

UnicodeTranslationMap String upcast:xml-map | upcast:html-map | customUnicodeMapURI

3.10. Unicode Translator

This export filter lets you apply a Unicode Translation Map to an already existing XML document. Additionally, by way of the Output file encoding parameter, you can quickly change the character encoding used in an XML file.

Although this filter tries to preserve the formatting of the original document, there is no guarantee that the result is syntactically identical to the input. Structurally, however, the result is equivalent to the input.

The Unicode Translation Map rules are only applied to the XML document's text and attribute nodes. Comments and PIs are left unchanged. A later revision of this filter, offered by way of an upCast update, will also let you select explicitly which nodes should be processed. Additionally, this filter will not perform substitutions in attribute nodes that contain the opening angle bracket '<' as this would render the resulting XML invalid.
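
The node selection rules above can be sketched in Python using the standard library's ElementTree module. This is not how upCast implements the filter. The mapping is given as a plain Python dictionary as a stand-in for a real Unicode Translation Map file, the function name is hypothetical, and the sketch requires Python 3.8 or later so that comments and PIs are kept in the tree.

```python
import xml.etree.ElementTree as ET


def translate_document(xml_text, mapping):
    """Apply a character mapping to text and attribute nodes only."""
    table = str.maketrans(mapping)
    # Keep comments and processing instructions in the parsed tree.
    builder = ET.TreeBuilder(insert_comments=True, insert_pis=True)
    root = ET.fromstring(xml_text, parser=ET.XMLParser(target=builder))
    for node in root.iter():
        # Comments and PIs have a non-string tag in ElementTree.
        is_element = isinstance(node.tag, str)
        if is_element and node.text:
            node.text = node.text.translate(table)
        if node.tail:
            # The tail is ordinary text that follows the node.
            node.tail = node.tail.translate(table)
        if is_element:
            for name, value in list(node.attrib.items()):
                # Attribute values containing '<' are left untouched.
                if "<" not in value:
                    node.set(name, value.translate(table))
    return ET.tostring(root, encoding="unicode")


source = ('<doc range="1\u20132"><!-- 3\u20134 -->5\u20136'
          '<p note="a &lt; b \u2013 c">7\u20138</p></doc>')
# Replace the en dash (U+2013) with a plain hyphen.
print(translate_document(source, {"\u2013": "-"}))
```

In this example the en dash is replaced in the range attribute and in the text content, while the comment and the note attribute, which contains an opening angle bracket, keep their original characters.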

This export filter has the following configuration options:

Unicode Translator. This is the only group. It lets you set the following parameters:

Filter name

Lets you specify a distinct name for the instance of this filter so you can identify it easily in the list of active export filters.

FilterName String name

Input file

Specify the absolute file name for the input file, i.e. the XML file the Unicode Translation Map should be applied to. You can use the following variable substitutions provided by upCast:

${il:srcbasename}

gets replaced by the basic input file name.

If the current input file is /Data/Source/RTF/article.rtf, then ${il:srcbasename} will evaluate to article.

${il:srcfilename}

gets replaced by the full input file name.

If the current input file is /Data/Source/RTF/article.rtf, then ${il:srcfilename} will evaluate to /Data/Source/RTF/article.rtf.

${il:srcfolder}

gets replaced by the path to the input file, including path separator character at the end.

If the current input file is /Data/Source/RTF/article.rtf, then ${il:srcfolder} will evaluate to /Data/Source/RTF/.

${il:destfolder}

gets replaced by the current output folder, including path separator character at the end.

If the current output directory is /Data/Dest/html, then ${il:destfolder} will evaluate to /Data/Dest/html/.

${il:imgfolder}

gets replaced by the current image destination folder, including path separator character at the end.

If the current image destination folder is /Data/Dest/html/images, then ${il:imgfolder} will evaluate to /Data/Dest/html/images/.

${il:destbasename}

gets replaced by the expected destination file, excluding extension.

If the current input file is /Data/Source/RTF/article.rtf and the current output directory is /Data/Dest/html/, then ${il:destbasename} will evaluate to /Data/Dest/html/article.

InputFile String /path/to/file.xml
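
The variable substitutions listed above can be summarized in a short Python sketch. It reproduces the documented example values for POSIX-style paths. The function names are hypothetical, and leaving unknown variables untouched is an assumption of this example, not documented upCast behavior.

```python
import posixpath
import re


def upcast_variables(src_file, dest_folder, img_folder):
    """Compute the substitution values for one input file."""
    src_folder, src_name = posixpath.split(src_file)
    base = posixpath.splitext(src_name)[0]
    # Folder values always end with the path separator.
    dest = dest_folder.rstrip("/") + "/"
    return {
        "il:srcbasename": base,
        "il:srcfilename": src_file,
        "il:srcfolder": src_folder.rstrip("/") + "/",
        "il:destfolder": dest,
        "il:imgfolder": img_folder.rstrip("/") + "/",
        "il:destbasename": dest + base,
    }


def expand(template, variables):
    """Replace each ${name} in template with its value.
    Unknown names are left as they are (assumption)."""
    return re.sub(r"\$\{([^}]+)\}",
                  lambda m: variables.get(m.group(1), m.group(0)),
                  template)


variables = upcast_variables("/Data/Source/RTF/article.rtf",
                             "/Data/Dest/html",
                             "/Data/Dest/html/images")
print(expand("${il:destbasename}.xml", variables))
# /Data/Dest/html/article.xml
print(expand("${il:srcfolder}maps/custom.map", variables))
# /Data/Source/RTF/maps/custom.map
```

A value such as ${il:destbasename}.xml is a convenient way to point the Input file parameter at the file written by a preceding export filter.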

Unicode Translation Map

Specify the absolute file name for the Unicode Translation Map to use for Input File. You may also browse for the file by clicking the Choose… button.

When you leave this field completely empty, no Unicode translation is performed. This is useful if you only want to change the character encoding of the XML file by specifying the desired Output file encoding.

You may use the same variable substitutions as described for Input File.

UnicodeTranslationMap String /path/to/unicodetranslation.map

Output file

Specify the absolute file name for the output file, that is the file where the result of the Unicode translation should be written to.

You may use the same variable substitutions as described for Input File.

OutputFile String /path/to/result.xml

XML Version

Specify the value of the version attribute on the XML declaration at the beginning of the result XML file. If you leave this empty, no XML declaration will be written. The default value is "1.0".

XMLVersion String '' | xml-version

Output file encoding

Lets you specify a name of a supported output file encoding, e.g. UTF-8 or iso-8859-1. This encoding is also specified in the encoding attribute on the XML declaration (if written, see XML Version parameter above).

OutputEncoding String xml-encoding-name



[1] The built-in, fully Java based WMF renderer is not an industrial-strength image conversion tool. It is provided as a convenience for converting documents quickly for preview. Problems may arise when you use custom and/or symbol fonts as is the case in mathematical formulae. Also, the WMF format defines some opcodes which have no counterpart in imaging options available in the standard Java imaging API, so WMF files using these opcodes will not be rendered correctly.

If you require accurate conversion of WMF images into a bitmap representation, we highly recommend having upCast write the WMF images out to disk without processing and using a specialized external third-party image processing tool for the rendering and/or conversion.