Chapter 9. upCast API Reference

1. Concepts
2. Using the API
3. General programming steps
3.1. Setup
3.2. The UpcastEngine object
3.3. Importing a document
3.4. Exporting a document
3.5. Setting global parameters
3.6. Error Handling
3.7. Cleanup
4. Method reference
5. Parameter reference
5.1. Global parameters
5.2. Import Filter parameters
5.3. Export Filter parameters

1. Concepts

All upCast functionality is accessed through an instance of a broker object, UpcastEngine. Because creating this object is rather expensive, you should create one instance at startup and use it for many subsequent conversions. Reusing the object for subsequent conversions causes no problems (in contrast to many XML parser implementations, for example); on the contrary, it is highly recommended for performance reasons.

You may create several instances of the UpcastEngine object in order to run multiple conversion threads at the same time in your single application. Please note that the number of parallel threads may be restricted by your license.
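One way to organise several instances for parallel conversions is a small fixed-size pool from which each worker thread borrows an instance. The sketch below is deliberately generic so that it compiles without upCast on the classpath. EnginePool and its methods are hypothetical names, not part of the upCast API; in a real application the factory would create UpcastEngine objects, and the pool size would be chosen to stay within the thread limit of your license.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical helper, not part of the upCast API: a fixed-size pool of
// expensive-to-create engine objects shared by several worker threads.
final class EnginePool<E> {
    private final BlockingQueue<E> idle;

    EnginePool(int size, Supplier<E> factory) {
        idle = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            idle.add(factory.get());   // create every instance once, up front
        }
    }

    // Borrows an engine, runs the task with it, and always hands it back.
    <R> R withEngine(Function<E, R> task) {
        E engine;
        try {
            engine = idle.take();      // blocks while all engines are in use
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for an engine", e);
        }
        try {
            return task.apply(engine);
        } finally {
            idle.add(engine);
        }
    }

    // Number of engines that are currently not borrowed.
    int idleCount() {
        return idle.size();
    }
}
```

Because every instance is created in the constructor, the expensive setup cost is paid once at startup, and the blocking take() keeps the number of simultaneous conversions at or below the pool size.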

2. Using the API

We assume that you are familiar with Java programming and with concepts such as objects, interfaces and implementations. You should also be comfortable with the Java object model and with Java streams.

This section about using the upCast application programming interface is divided into two parts: the first part shows you hands-on how to get documents converted programmatically using upCast, the second is intended as a reference once you have a good grasp of the workings.

3. General programming steps

Converting a document with upCast involves the following steps:

  1. Register the JAR with a license file.

  2. Instantiate an UpcastEngine object.

  3. Set the required global parameters.

  4. Set the import filter to be used.

  5. Set import filter parameters, if required.

  6. Import the document, generating an internal, unified representation.

  7. Specify an export filter.

  8. Set export filter parameters, if required.

  9. Export the document.

  10. (optional) Repeat from step 7 if you want to export the same document to several different formats.

  11. (optional) Repeat from step 4 to convert another document.

In actual Java code, these steps translate into the following (each comment names the corresponding step):

UpcastEngine.setLicense( new FileInputStream("/path/to/upcast.license") );  // step 1

UpcastEngine uc = new UpcastEngine("instanceOne");  // step 2
uc.setGlobalParameter( "outputDir", "/Dev/RTF/html/" );  // step 3

String filterID = uc.setImportFilterType( UpcastEngine.kRTFImportType );  // step 4

uc.setImportFilterParameter( filterID, "IncludeImages", Boolean.TRUE );  // step 5
uc.importFile( "/Dev/RTF/test.rtf" );  // step 6
filterID = uc.setExportFilterType( UpcastEngine.kXHTMLExportType );  // step 7
uc.setExportFilterParameter( filterID, "AllowEmptyCells", Boolean.FALSE );  // step 8
uc.exportFile( "/Dev/RTF/html/test.html" );  // step 9

Let's look at the individual steps in more detail.

3.1. Setup

3.1.1. Setting the License

The first thing you need to do (before any conversion can take place) is to register the application with an appropriate license. This is done via the static method

public static boolean setLicense( InputStream inStream )

It takes an InputStream that supplies the XML-based license file.

If, for example, you package your license file into your application JAR at root level, you can use the following line to register your application:

boolean isRegistered = UpcastEngine.setLicense(
                          UpcastEngine.class.getResourceAsStream( "upcast.license") );

If you package the license file at the default location {JAR Resource Location Path}/licenses/upcast.license, you need not call the setLicense() method at all.
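When the license is loaded as a classpath resource, it helps to detect a missing file before the stream reaches setLicense(), because Class.getResourceAsStream() returns null rather than throwing when the resource cannot be found. The following minimal sketch shows such a check; LicenseResources is a hypothetical helper name, not part of the upCast API. Note also that Class.getResourceAsStream() resolves a name without a leading slash relative to the package of the class it is called on, so a file at the root of a JAR is normally addressed with a leading slash.

```java
import java.io.InputStream;
import java.util.Optional;

// Hypothetical helper, not part of the upCast API.
final class LicenseResources {
    // Looks up a classpath resource and makes the "not found" case explicit
    // instead of handing a null stream to the caller.
    static Optional<InputStream> find(Class<?> anchor, String name) {
        return Optional.ofNullable(anchor.getResourceAsStream(name));
    }
}
```

In an application, a stream found this way would be passed to UpcastEngine.setLicense() and the boolean result checked. Whether setLicense() closes the stream itself is not stated here, so closing it in a try-with-resources statement is the cautious choice.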

3.1.2. Connecting to WordLink (Windows only)

To use WordLink functionality through the API as well, you need to tell upCast where the WordLink component il-gw.exe is located before you instantiate an UpcastEngine object. You do this by setting the system property de.infinityloop.exe.location to the directory where il-gw.exe resides:

System.setProperty( "de.infinityloop.exe.location", "/path/to/il-gw-directory/" );

On a typical Windows installation, this is C:\Documents and Settings\<accountname>\infinity-loop\upCast\Application Support\EXEs, but you are free to move the application file il-gw.exe anywhere in your filesystem that is convenient for your deployed application.
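Since WordLink is available on Windows only, a deployed application may want to set the property conditionally. The sketch below does so only when the operating system name starts with "Windows" and il-gw.exe is actually present in the given directory. WordLinkSetup and configure() are hypothetical names, and appending a trailing separator merely mirrors the example above; it is an assumption, not a documented requirement.

```java
import java.io.File;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Locale;

// Hypothetical helper, not part of the upCast API.
final class WordLinkSetup {
    static final String PROPERTY = "de.infinityloop.exe.location";

    // Sets the system property only on Windows and only if il-gw.exe exists
    // in the given directory. Returns true if the property was set.
    static boolean configure(String osName, Path dir) {
        if (!osName.toLowerCase(Locale.ROOT).startsWith("windows")) {
            return false;
        }
        if (!Files.isRegularFile(dir.resolve("il-gw.exe"))) {
            return false;
        }
        // Assumption: a trailing separator, as in the example path above.
        System.setProperty(PROPERTY, dir.toAbsolutePath().toString() + File.separator);
        return true;
    }
}
```

A typical call would pass System.getProperty("os.name") as the first argument, and it would have to run before the first UpcastEngine object is created.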

Important

Using WordLink in a critical, server-based, unattended environment is not supported and therefore not recommended. WordLink uses an installed copy of Word in component mode. Microsoft explicitly warns against such use in server or server-like applications for technical reasons (let alone any remaining licensing issues).

3.2. The UpcastEngine object

You gain access to all functionality of upCast through objects of a single class: UpcastEngine. An instance of this class is what you use in your application to access the full range of upCast API functionality.

Before you can do anything with upCast (after having registered the API with a license), you need to instantiate an UpcastEngine object:

UpcastEngine ucInst = new UpcastEngine( "instanceOne" );

The UpcastEngine class is to be found in the de.infinityloop.upcast package.

You should keep this object stored in a variable which you can access from all places inside your program where you need to access upCast functionality.

You should strive to have only one instance of the UpcastEngine object per physical CPU at any time for performance reasons. Also make sure you only instantiate this object once during the life of your application process, as instantiating and disposing of this object is a relatively costly operation.
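A common way to guarantee that an expensive object is created only once, yet is reachable from anywhere in the program, is a lazily initialised holder. The following sketch is generic so that it stands on its own; Lazy is a hypothetical helper name, and in a real application the supplier would construct the UpcastEngine instance.

```java
import java.util.function.Supplier;

// Hypothetical helper, not part of the upCast API: creates an expensive
// object on first use and hands out the same instance afterwards.
final class Lazy<T> {
    private final Supplier<T> factory;
    private T instance;

    Lazy(Supplier<T> factory) {
        this.factory = factory;
    }

    // synchronized keeps concurrent first calls from creating two instances.
    synchronized T get() {
        if (instance == null) {
            instance = factory.get();
        }
        return instance;
    }
}
```

A holder like this suits conversions that run one after another. Conversions that run at the same time need separate instances, as described in the Concepts section.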

3.3. Importing a document

Document conversion within upCast is a two-step process: you first import a document and then export it. Separating these two steps may seem unnecessarily complicated, but it is what allows you to export one source document into several destination formats without re-reading the source document for every conversion. Since importing an RTF document is a complex and computationally intensive task, this yields a considerable speed advantage.

Importing a document is straightforward: Choose the import filter, set its parameters, and initiate reading the document.

3.3.1. Setting the import filter

To import (read) a document, upCast needs to know the format in which the document is stored, since reading a document implies understanding its language. You tell upCast the format by setting the import filter.

Currently, only the RTF 1.6 import filter is supported, which reads documents conforming to the Rich Text Format (version 1.6) specification as set forth by Microsoft.

Note

If you are running upCast on Windows 95/98/2000/NT/XP and have Microsoft Word 97 or later installed on the same machine, you may also convert Microsoft Word binary (*.doc) files directly by using the WordLink feature.

Please be aware that this feature only works for converting files that are already physically on disk, not for data coming in through streams.

To remain open to future upCast development, however, you must set the import filter explicitly. This is done via the call

String filterID = ucInst.setImportFilterType( UpcastEngine.kRTFImportType );

, where ucInst is an UpcastEngine instance.

The method returns a java.lang.String object that identifies this specific filter instance when you set parameters using the setImportFilterParameter() method.

The constant UpcastEngine.kRTFImportType indicates the RTF 1.6 import filter and is the only valid constant to be used as of now.

Setting an import filter discards any previously imported document and sets useful default parameters for the specific filter.

3.3.2. Setting import filter parameters

Once you have set an import filter, you can set parameters for that filter. To identify the filter, you use the return value you got from setImportFilterType():

ucInst.setImportFilterParameter( filterID, "IncludeImages", Boolean.TRUE );

This call sets the parameter IncludeImages to true for the filter indicated by the string filterID.

Parameter names for each filter are given in the description of the filters. The parameter value has to be passed as a Java Object. The required object class depends on the specific parameter and is documented for each available parameter.

If you set a parameter more than once, the last value set will be used.

To set several parameters, call setImportFilterParameter() once for each parameter.

You must make sure that you use the filterID returned by your most recent call to setImportFilterType(). Otherwise, the parameter will have no effect.

Note

If you try to set a parameter that is not supported by the current filter, the call has no effect and no error is reported. To track which parameters you set in your application, you should turn on debug logging.

Important

If you use a different Java Object (sub-)class for the parameter value than specified in the reference section, the behaviour is undefined. Some types may be compatible, but in general you will get a Java exception at some point later in the execution of upCast or the conversion will not work the way you intended.
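Because neither an unsupported parameter name nor a wrong value class is reported at the time of the call, an application can guard itself with a small check of its own before passing values on. The sketch below records the value class documented for each parameter and rejects anything else. ParameterTypes is a hypothetical helper name, not part of the upCast API, and the declared types have to be taken from the parameter reference by hand.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of the upCast API: remembers the value class
// documented for each parameter and rejects mismatches early.
final class ParameterTypes {
    private final Map<String, Class<?>> expected = new LinkedHashMap<>();

    ParameterTypes declare(String name, Class<?> type) {
        expected.put(name, type);
        return this;
    }

    // Returns the value unchanged if its class is exactly the declared one.
    Object check(String name, Object value) {
        Class<?> type = expected.get(name);
        if (type == null) {
            throw new IllegalArgumentException("undeclared parameter: " + name);
        }
        if (value == null || value.getClass() != type) {
            throw new IllegalArgumentException(
                name + " expects " + type.getName() + " but got "
                + (value == null ? "null" : value.getClass().getName()));
        }
        return value;
    }
}
```

The checked value would then be used as the third argument of setImportFilterParameter(). The comparison is intentionally exact (no subclasses), which is the strictest reading of the warning above.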

3.3.3. Initiate the actual import

Once you have set the import filter type and appropriate parameters, you need to tell upCast where to import the RTF document from. This is done using either the method importStream(), which imports the data from a java.io.InputStream, or the method importFile( srcFile ), which imports the data from the named file srcFile.

Note

You must specify an absolute path for srcFile.
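A relative path can be converted before it is handed to importFile(). The following minimal sketch uses only java.nio.file; AbsolutePaths is a hypothetical helper name, not part of the upCast API.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of the upCast API.
final class AbsolutePaths {
    // Resolves a possibly relative path against the current working
    // directory and removes redundant "." and ".." elements.
    static String absolute(String path) {
        Path resolved = Paths.get(path).toAbsolutePath().normalize();
        return resolved.toString();
    }
}
```

The result depends on the working directory of the Java process, so in a server environment it is usually safer to resolve against a configured base directory instead.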

Any errors during the import phase will be reported as Exceptions of the respective type.

After the import, the document is held in memory in a proprietary internal format, ready for one or more export operations into different output formats.

The importStream() method is very flexible because of the various InputStream implementations that come standard with the JDK. For example, you can turn the String myRTFString into an InputStream using a construction like

ucInst.importStream( new ByteArrayInputStream( myRTFString.getBytes() ) );

An imported source document replaces any previously read source document in memory. Only one document can be held in internal format at any time per UpcastEngine instance.
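One detail of the ByteArrayInputStream construction shown above is that String.getBytes() without an argument encodes the text with the platform default charset, so the resulting bytes can differ between machines. The sketch below fixes the charset explicitly. It assumes that the string holds plain 7-bit RTF, in which non-ASCII characters are already escaped; with US_ASCII, any other character would be replaced by a question mark. RtfStreams is a hypothetical helper name.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the upCast API.
final class RtfStreams {
    // Encodes the text with a fixed charset instead of the platform default.
    // Assumption: the RTF text is 7-bit ASCII.
    static ByteArrayInputStream fromString(String rtf) {
        return new ByteArrayInputStream(rtf.getBytes(StandardCharsets.US_ASCII));
    }
}
```

The returned stream can be passed wherever a java.io.InputStream is expected.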

3.4. Exporting a document

Once a document has been imported as explained, we are ready for setting up the export.

Exporting a document is straightforward and very similar to importing a document: Choose the export filter, set up its parameters, and initiate exporting the document.

3.4.1. Choosing the export filter

For a document to be exported, upCast must be told which format to export to. This is done by specifying one of the built-in export filters or, if you're using a custom filter, your custom filter implementation class name without the package prefix. The built-in filters are the same as found in the GUI of upCast.

Setting the export filter is done via a statement like

String filterID = ucInst.setExportFilterType( filterType );

, where ucInst is the UpcastEngine instance and filterType is one of the following constants as defined in UpcastEngine:

kCommandlineExportType

the Commandline processing filter

kCSSExportType

the External CSS2 export filter

kXHTMLExportType

the XHTML 1.0 (strict) export filter

kXMLExportType

the XML (upCast DTD) export filter

kXSLTProcessorExportType

the XSLT Processor processing filter

kXMLValidatorExportType

the XML Validator processing filter

kXMLDocBookExportType

the DocBook 4.2 processing filter

kXMLRawExportType

the XML Raw export filter

kRawTreeDumperExportType

the Raw Tree Dumper export filter

kUnicodeTranslatorExportType

the Unicode Translator export filter

The method setExportFilterType() returns a java.lang.String object that identifies this specific filter instance when you set parameters using the method setExportFilterParameter().

Setting an export filter replaces any previously set export filter and automatically sets useful default parameters for the specific export filter.

3.4.2. Setting export filter parameters

This works the same as for import filter parameters. To identify the filter, you use the return value you got from the method setExportFilterType():

ucInst.setExportFilterParameter( filterID, "AllowEmptyCells", Boolean.TRUE );

This call sets the parameter AllowEmptyCells to true for the filter indicated by the string filterID.

Available parameter names for each export filter are listed in the export filter reference section. The parameter values are passed as Java objects. The required object class depends on the specific parameter and is documented for each of the available parameters.

If you set a parameter more than once, the last value set will be used.

To set several parameters, call setExportFilterParameter() once for each parameter you wish to set.

You must make sure that you use the filterID returned by your most recent call to setExportFilterType(). Otherwise, the parameter will have no effect.

Note

If you try to set a parameter that is not supported by the current filter, the call has no effect and no error is reported. To track which parameters you set in your application, you should use the built-in debugging and logging features.

Important

If you use a different Java Object class for the parameter value than specified in the reference section, the behaviour is undefined. Some types may be compatible, but in general you will get a Java Exception at some point later in the execution of upCast or the conversion will not work the way you intended.

3.4.3. Initiate the export

Once you have set the export filter type and appropriate parameters, you need to tell upCast where the internally held document should be exported to. This is done using either the exportStream() method, which exports the data to a java.io.OutputStream, or exportFile( destFile ), which exports the data to the named file destFile.

Important

You must specify an absolute path for destFile.

Any errors during export will be reported as Exceptions of the respective type.

After an export operation, the source document will still be available unchanged in its internal representation, ready to be exported using e.g. the same filter with different parameter settings or even a different export filter.

As with importStream(), the exportStream() function is very flexible thanks to the various OutputStream sub-classes that come standard with the JDK.
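As one illustration of that flexibility, an export can be captured in memory with a ByteArrayOutputStream and turned into a String. The sketch below shows the capture step only. StreamWriter is a hypothetical stand-in for whatever call writes to the stream, since the exact signature of exportStream() and the exceptions it declares are not given here; the throws clause would have to be adapted to the real method.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;

// Hypothetical helper, not part of the upCast API.
final class ExportCapture {
    // Stand-in for a call that writes an exported document to a stream.
    interface StreamWriter {
        void writeTo(OutputStream out) throws IOException;
    }

    // Runs the writer against an in-memory buffer and decodes the bytes.
    static String capture(StreamWriter writer, Charset charset) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try {
            writer.writeTo(buffer);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(buffer.toByteArray(), charset);
    }
}
```

The charset passed to capture() has to match the encoding the export filter actually produces. Holding the whole result in memory is only appropriate for documents of moderate size.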

3.5. Setting global parameters

For certain conversions, it is necessary to set parameters that can and will be accessed by both filter types, import and export filters. To set these types of parameters, you use the method

setGlobalParameter(String parameter, Object value)

This works the same as setImportFilterParameter() or setExportFilterParameter(); it merely lacks the filter identification string parameter.

Make sure you set a global parameter before importing the chosen document; otherwise it will have no effect on the import filter.

Important

Prior to any conversion, you must set the outputDir parameter to a valid value!
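What counts as a valid value is not spelled out here. A cautious application can at least make sure that the directory exists and pass it as an absolute path. The sketch below does that with java.nio.file. OutputDirs is a hypothetical helper name, and the trailing separator merely mirrors the example paths in this chapter; it is an assumption, not a documented requirement.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of the upCast API.
final class OutputDirs {
    // Creates the directory (and missing parents) if necessary and returns
    // its absolute path, ending with the platform's separator character.
    static String prepare(Path dir) {
        try {
            Path created = Files.createDirectories(dir.toAbsolutePath().normalize());
            String text = created.toString();
            return text.endsWith(File.separator) ? text : text + File.separator;
        } catch (IOException e) {
            throw new UncheckedIOException("cannot create output directory " + dir, e);
        }
    }
}
```

The returned string would then be passed as the value of the outputDir parameter before the first import.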

3.6. Error Handling

During a single call to an API method, several problems of varying significance may occur. Whenever that happens, the method throws a single ILException (short for infinity-loop Exception). An ILException is a special subclass of java.lang.Exception that encapsulates the list of errors and/or warnings that occurred during the last call to an API method.

You can query an ILException for its individual constituents, which are objects of type LogEntry. A LogEntry comprises:

  • a numerical message code

  • a message class, one of: FATAL, ERROR, WARN, INFO, DEBUG

  • a human readable message as String

  • a (possibly null) array of parameters that were used in constructing the message

3.6.1. Coding pattern

The recommended coding style for error handling is to wrap each call to an API method in its own try{}/catch{} block and to catch ILException explicitly. This is useful if, for example, the importFile() call throws an exception whose severity is low, and you decide to continue processing because it contains only a warning that you do not care about and that does not affect document integrity. By wrapping each call separately, you get the most out of any sequence of API calls, skipping only the portions that did not work.

A typical API call including error handling would look something like:

try {
  ucInst.exportFile( filename );
} catch( ILException e ) {
  // React only to FATAL or ERROR entries, not to warnings.
  if( e.extractSignificantEntries(
      new int[] {
        NotificationCollector.FATAL,
        NotificationCollector.ERROR
      },
      null,
      null ).size() > 0 ) {

    // ... do some error handling ...
  }
}
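The decision logic behind this pattern (abort on FATAL or ERROR, continue on anything milder) can be illustrated without the upCast classes. The sketch below models it with stand-in types. Level, Entry and SeverityFilter are hypothetical names that only mirror the description of LogEntry given earlier; they are not the real upCast classes, and the message codes and texts in any usage are invented for illustration.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

// Stand-in model, not the upCast classes.
final class SeverityFilter {
    // The message classes listed in the description of LogEntry.
    enum Level { FATAL, ERROR, WARN, INFO, DEBUG }

    // Stand-in for a log entry: message code, message class, readable text.
    static final class Entry {
        final int code;
        final Level level;
        final String message;

        Entry(int code, Level level, String message) {
            this.code = code;
            this.level = level;
            this.message = message;
        }
    }

    // Keeps only the entries whose level is in the wanted set.
    static List<Entry> significant(List<Entry> entries, Set<Level> wanted) {
        List<Entry> result = new ArrayList<>();
        for (Entry entry : entries) {
            if (wanted.contains(entry.level)) {
                result.add(entry);
            }
        }
        return result;
    }

    // True if at least one entry is FATAL or ERROR.
    static boolean shouldAbort(List<Entry> entries) {
        return !significant(entries, EnumSet.of(Level.FATAL, Level.ERROR)).isEmpty();
    }
}
```

With this model, a list containing only WARN entries lets processing continue, while a single ERROR entry triggers the error-handling branch.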

3.6.2. Tidbits

Using the extractSignificantEntries() method, you can specify in great detail which messages you are interested in. For more information on how to use this method, see the javadoc API reference.

The message codes are all constants of a special class, Msg. See the javadoc API reference for a description of the currently possible message codes and the number and semantics of parameters available for a specific message.

3.7. Cleanup

As mentioned above, an imported document is kept in memory until it is replaced by a different source document or a different import filter is set. If you have finished all exports of an imported document and want to reclaim the memory it occupies, you can do so in one of the following ways:

  • Call setImportFilterType() once.

  • Explicitly discard the UpcastEngine instance by making sure that you no longer hold a reference to it. Normally, this is done by setting the variable you used to hold the object reference to null. Note that this procedure may not work reliably and therefore is deprecated.

The next run of the garbage collector will free the previously occupied memory.

4. Method reference

The API reference is provided in the form of javadoc HTML documentation. This ensures that the documentation is up to date, since it is generated directly from the underlying source code. The most recent version of the upCast API documentation is available from our website at http://www.infinity-loop.de/support/documentation/api/.

5. Parameter reference

5.1. Global parameters

This section describes parameters that can be set via the setGlobalParameter() method.

Destination Directory

Sets the destination directory for conversion results. This parameter also serves as the base for relative path calculations.

Name: outputDir
Type: String
Value: /path/to/destination/

Image Directory

Sets the destination directory for images extracted during document import.

Name: imageDir
Type: String
Value: /path/to/imagedestination/

Delete temporary files

Controls whether the intermediary .doc to .rtf conversion result, which resides in the {Log Folder}, is deleted. For debugging purposes, it can be convenient to keep it.

Note

If you set this to false, remember to occasionally delete the created temporary RTF files from the respective folder.

Name: DeleteTemporaryFile
Type: Boolean
Value: true | false

5.2. Import Filter parameters

Import filter parameters that can be set via the setImportFilterParameter() method are described in Section 2, “Import Filters”. Parameter name, value type and possible values are given there at the end of each parameter description.

5.3. Export Filter parameters

Export filter parameters that can be set via the setExportFilterParameter() method are described in Section 3, “Export Filters”. Parameter name, value type and possible values are given there at the end of each parameter description.