XML Toolkit for z/OS Usage Information

Building and Running Samples

Obtaining the GNU Make Utility

API Information

API Documentation

Encoding Information

 
  

Building and Running Samples

C++ Edition File Names and Directory Paths

Component                 Release  File Name                            Directory Path
XML Parser for z/OS       V1.10    ixmc570b.pax (or latest PTF level)   /usr/lpp/ixm/IBM/xml4c-5_7 [full path to C++ parser]
XML Parser for z/OS       V1.9     ixmc560b.pax (or latest PTF level)   /usr/lpp/ixm/IBM/xml4c-5_6 [full path to C++ parser]
XML Parser for z/OS       V1.8     ixmc550b.pax (or latest PTF level)   /usr/lpp/ixm/IBM/xml4c-5_5 [full path to C++ parser]
XSLT Processor for z/OS   V1.10    ixmcx21b.pax (or latest PTF level)   /usr/lpp/ixm/IBM/xslt4c-1_11 [full path to C++ XSLT processor]
XSLT Processor for z/OS   V1.9     ixmcx20b.pax (or latest PTF level)   /usr/lpp/ixm/IBM/xslt4c-1_10 [full path to C++ XSLT processor]
XSLT Processor for z/OS   V1.8     ixmcx19b.pax (or latest PTF level)   /usr/lpp/ixm/IBM/xslt4c-1_9 [full path to C++ XSLT processor]

  • V1.10 Samples
  • V1.9 Samples
  • V1.8 Samples
 
  

Obtaining the GNU Make Utility

The samples are built with gmake (GNU Make). gmake is not required if you determine the necessary parameters and environment variables yourself and invoke c89 directly from the command line. The Toolkit provides lib and include directories that can be included by any UNIX build mechanism, just as with any other API. However, all of the instructions and build files that we ship assume that the developer uses gmake.

For instructions on downloading and installing gmake, see the z/OS UNIX Tools and Toys web site. For information on using gmake, see the Redbook Open Source Software for OS/390 UNIX, SG24-5944, available online at the IBM Redbooks web site.

 
  

API Information

There are two categories of API :
  • Public - for general use
  • Experimental - subject to change

PUBLIC

For exported classes, any function or method that includes Doc++ style API documentation that does not contain an Experimental or Internal marking is part of our supported (Public) API.

EXPERIMENTAL

"Experimental" code is likely to be new code, may be buggy, or may be based on standards which are likely to change. Use at your own risk.

API Documentation

For API documentation for V1.10, V1.9, and V1.8 (except XSLT C++), refer to the following sources :

  • V1.8 XML Parser C++ Edition : [full path to the C++ Parser]/doc/index.html
  • V1.9 XML Parser C++ Edition : [full path to the C++ Parser]/doc/index.html
  • V1.10 XML Parser C++ Edition : [full path to the C++ Parser]/doc/index.html
  • V1.8 XSLT Processor C++ Edition : [full path to the C++ Processor]/docs/index.html
  • V1.9 XSLT Processor C++ Edition : [full path to the C++ Processor]/docs/index.html
  • V1.10 XSLT Processor C++ Edition : [full path to the C++ Processor]/docs/index.html

Note that we recommend /usr/lpp/ixm/IBM as the parent directory for each of the subdirectories above. The files can be viewed with a Web browser if a Web server is running on the system where the documentation resides.

 
  

Encoding Information

Introduction

The promise of XML is that it is portable and works on all platforms. Making this work effectively and efficiently requires program design that takes into account the specific situation pertinent to a particular application (for example, where the document originates, where it is likely to be processed, the performance requirements, the throughput requirements, where the document is stored and how it is likely to be accessed). Proper encoding of XML documents will require thought and consideration at application design time.

The following information is intended to give application programmers guidance on how to deal with encoding of XML documents.

Encoding and XML

This section presents the encoding rules in a simple and straightforward manner as background to the discussion of encoding of XML on z/OS. It is not intended to reproduce the detail of the XML 1.0 specification or to cover every possible case.

The XML standard defines encoding fairly rigorously. If the document is not in UTF-8 or UTF-16, the encoding of the document must be specified via the encoding= attribute of the XML declaration (for example, <?xml version="1.0" encoding="ISO-8859-1"?>). Also, even though it is possible for the encoding specified via the transport protocol to override the encoding declaration, it is strongly advised that the actual encoding of the document match the encoding specified on the encoding= attribute. Problems occur if the document is converted from one code page to another without the encoding= attribute being changed. There are places where conversion takes place without the knowledge of the application programmer. Examples include file transfer using ftp (File Transfer Program) without the binary option and storing files in a database using DRDA.
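A mismatch between the declared and the actual encoding can often be spotted by looking at the first bytes of the document. The following is a minimal sketch, not part of the Toolkit: the helper names hasPrefix and sniffEncodingFamily are hypothetical, and the byte patterns are the ones listed in the non-normative Appendix F of the XML 1.0 specification. UTF-16 and UTF-32 documents without a byte order mark are not covered.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper: true if the buffer starts with the given byte prefix.
static bool hasPrefix(const unsigned char* data, std::size_t size,
                      const char* prefix, std::size_t prefixSize) {
    return size >= prefixSize && std::memcmp(data, prefix, prefixSize) == 0;
}

// Hypothetical helper: guess the encoding family of an XML document from its
// first bytes. Returns a label only; it does not read the encoding= value.
std::string sniffEncodingFamily(const unsigned char* data, std::size_t size) {
    assert(data != NULL || size == 0);
    // Byte order marks. UTF-32LE must be tested before UTF-16LE because
    // both start with FF FE.
    if (hasPrefix(data, size, "\x00\x00\xFE\xFF", 4)) return "UTF-32BE";
    if (hasPrefix(data, size, "\xFF\xFE\x00\x00", 4)) return "UTF-32LE";
    if (hasPrefix(data, size, "\xEF\xBB\xBF", 3))     return "UTF-8";
    if (hasPrefix(data, size, "\xFE\xFF", 2))         return "UTF-16BE";
    if (hasPrefix(data, size, "\xFF\xFE", 2))         return "UTF-16LE";
    // "<?xm" in an ASCII-compatible encoding (UTF-8, ISO 8859, and so on).
    if (hasPrefix(data, size, "\x3C\x3F\x78\x6D", 4)) return "ASCII-compatible";
    // "<?xm" in EBCDIC.
    if (hasPrefix(data, size, "\x4C\x6F\xA7\x94", 4)) return "EBCDIC";
    return "unknown";
}
```

If, for example, a document declares an EBCDIC code page in its encoding= attribute but the helper reports "ASCII-compatible", the document was probably converted in transit without the declaration being updated.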

Whenever possible, avoid having these types of conversions so that mismatches do not occur. The XML parser converts the document to Unicode for processing and is capable of handling many different code pages. Also, converting from one code page to another can cause loss of data if there are code points in the original code page that are not present in the target code page. Avoiding conversion prior to calling the parser results in the most efficient (from a performance perspective) and least error-prone solution. Conversion is expensive and if the document is converted before the parser is invoked, two conversions actually occur - once from the original code page and once to Unicode within the parser. Therefore, use the binary option on ftp and equivalent file transfer mechanisms.

XML is intended to be a portable data format. The truly portable encoding is Unicode. Therefore, whenever possible, it is best to use Unicode as the encoding for XML documents. However, not all platforms provide easy-to-use facilities for handling Unicode. As a compromise, ASCII is another portable encoding, and one that platform facilities generally support better. It is recommended that XML documents intended for use on other platforms be encoded in US-ASCII, UTF-8, or UTF-16. This also provides performance benefits because the XML parser is optimized for these encodings.

XML and z/OS

 

XML 1.0 specification :

The XML 1.0 specification defines CR (Carriage Return), LF (Line Feed), and the combination CR-LF (Carriage Return followed by Line Feed) as acceptable white space characters. These characters are to be converted to LF by the XML processor (the specification's term for the parser). Unfortunately, the XML 1.0 specification does not define NEL (New Line or Next Line) as acceptable.

This presents a problem on z/OS because the most common end-of-line character is NL (x'15'). This is commonly associated with the Unicode NEL character (x'85'). The C '\n' string converts to NL (NEL), and editors and file I/O routines in the C runtime insert NL to indicate end-of-line in byte-oriented file systems like the HFS (Hierarchical File System). Note that this is not an issue in the native MVS environment, where file systems are record oriented. Therefore, if the XML document is created using C or C++ and the application programmer does not do any special programming to avoid it, the line-ending character will be NL. This is not recommended for XML documents because, by nature, they are intended to be portable. NL is common on z/OS but not on other platforms, and therefore is not portable.

Unfortunately, this means that the application programmer has to be cognizant of this fact and program around it. There are two options available to programmers writing code to create XML 1.0 documents.

  1. The simplest way to create portable XML 1.0 documents is to use iconv() to convert them to ASCII or Unicode before sending them out of the application program. iconv() will convert the NEL to LF in ASCII, and the problem is therefore avoided.
  2. Another option is to define a literal for LF and use it instead of the string '\n' to create line breaks. This approach works if the file will not be edited or otherwise manipulated on z/OS (remember, most mainframe editors insert NL characters!). Also, if this file is edited on z/OS, the document will appear to be a single line (since there aren't any NL characters in it) and will therefore not be very readable.
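The second option can be sketched as follows. This is an illustration under stated assumptions, not Toolkit code: the names XML_LF and appendLine are hypothetical, the check 'A' == 0xC1 is a common idiom for detecting an EBCDIC compile, and x'25' is the EBCDIC code point for LF (as opposed to x'15' for NL).

```cpp
#include <cassert>
#include <string>

// Pick the LF code point for the character set the program is compiled in.
// 'A' is 0xC1 in EBCDIC and 0x41 in ASCII.
#if 'A' == 0xC1
static const char XML_LF = '\x25';  // EBCDIC LF (not NL, which is x'15')
#else
static const char XML_LF = '\x0A';  // ASCII LF
#endif

// Hypothetical helper: append one line of markup terminated by an explicit LF
// instead of '\n'.
void appendLine(std::string& document, const std::string& text) {
    // Callers must not embed '\n' themselves; on z/OS that would be NL.
    assert(text.find('\n') == std::string::npos);
    document += text;
    document += XML_LF;
}
```

A caller would build the document with repeated calls such as appendLine(doc, "<order>") and then write doc to the output in binary mode, so that the runtime does not translate the line ends again.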

If you need to edit or view the file, it is best to convert it to ASCII and then use viascii (available at the z/OS UNIX Tools and Toys web site) to edit it.
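The conversion described in the first option, and again in the paragraph above, can be sketched with the POSIX iconv() interface. This is a minimal sketch rather than a definitive implementation: the function name convertEncoding is hypothetical, and the code page names passed to it (for example "IBM-1047" and "ISO8859-1") are assumptions that must be checked against the names your system's iconv supports.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <iconv.h>

// Hypothetical helper: convert a whole buffer from one code page to another.
std::string convertEncoding(const std::string& input,
                            const char* fromCode, const char* toCode) {
    assert(fromCode != NULL && toCode != NULL);
    iconv_t cd = iconv_open(toCode, fromCode);  // note the order: to, then from
    if (cd == (iconv_t)-1) {
        throw std::runtime_error("iconv_open: unsupported conversion");
    }
    std::string output;
    char buffer[4096];
    char* in = const_cast<char*>(input.data());
    std::size_t inLeft = input.size();
    while (inLeft > 0) {
        char* out = buffer;
        std::size_t outLeft = sizeof buffer;
        std::size_t rc = iconv(cd, &in, &inLeft, &out, &outLeft);
        int savedErrno = errno;  // save before any other call can change it
        output.append(buffer, sizeof buffer - outLeft);
        // E2BIG only means the output buffer filled up; loop and continue.
        if (rc == (std::size_t)-1 && savedErrno != E2BIG) {
            iconv_close(cd);
            throw std::runtime_error("iconv: invalid or incomplete input");
        }
    }
    // Flush any pending shift sequence for stateful target encodings.
    char* out = buffer;
    std::size_t outLeft = sizeof buffer;
    iconv(cd, NULL, NULL, &out, &outLeft);
    output.append(buffer, sizeof buffer - outLeft);
    iconv_close(cd);
    return output;
}
```

Remember that the encoding= attribute in the XML declaration must name the target encoding after such a conversion; this sketch converts bytes only and does not rewrite the declaration.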

For the other case, where the program is processing a received XML document, the situation is more complex. The fastest (and in some cases, the simplest) solution is to not convert the file into EBCDIC. If the file is in ASCII or Unicode, then it will have LF as the end-of-line indicator and there won't be any problem with the line ending. However, working with unconverted data can be more complex to program. Depending on the specific situation (for example, development/test versus production), conversion may or may not be required. The recommendation to avoid conversion if at all possible still holds, especially in a production environment where the cost of conversion can be prohibitive. For development/test situations, where the file may have to be viewed or edited for debugging purposes, conversion may be the right answer.

The parser converts all the data into Unicode, so any data that the program needs in EBCDIC must be converted after parsing. At this point, only the data that is actually required needs to be converted. Note that converting many small strings may be less efficient than converting larger strings. Also, handling Unicode or ASCII data in a z/OS program requires care in programming and isn't always simple. All these factors need to be weighed as trade-offs when designing the application.

If the file is in EBCDIC and it is known to originate on a z/OS system, then the line ending is marked with a NL. The XML Parser, C++ Edition will accept XML documents that have a NL as a line termination character and will normalize the line endings to LF. However, such documents do not comply with XML 1.0 and will not be accepted by parsers on other platforms. In general, EBCDIC is not a portable encoding, so IBM does not recommend using EBCDIC for XML documents going between platforms or on the Internet.

Avoiding Conversion

Most transport protocols have mechanisms to avoid conversions. Here are some of the more common products used for transport and the options to turn off conversion (if they exist). Detailed descriptions of these options and their uses are in the documentation associated with each product.

File Transfer Program : The binary option prevents FTP from converting the file.

MQSeries : Do not specify the MQGMO_CONVERT option on the MQGET call.

DRDA : It is not possible to turn off conversion except by using 'FOR BIT DATA', but this can have other side effects. The DB2 XML Extender has filters that convert LF to NL and vice versa to ensure that the document is correct.
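To illustrate what an LF/NL filter of this kind does, here is a minimal sketch for a single-byte EBCDIC buffer. It is not the DB2 XML Extender implementation; the names EBCDIC_NL, EBCDIC_LF, and replaceLineEnd are hypothetical, and the code assumes that the bytes x'15' and x'25' appear in the buffer only as line-end characters.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

const unsigned char EBCDIC_NL = 0x15;  // New Line, the usual z/OS line end
const unsigned char EBCDIC_LF = 0x25;  // Line Feed, the XML 1.0 line end

// Hypothetical helper: replace every `from` byte with `to` in an EBCDIC
// buffer and return the number of bytes changed.
std::size_t replaceLineEnd(std::string& ebcdicText,
                           unsigned char from, unsigned char to) {
    assert(from != to);
    std::size_t changed = 0;
    for (std::size_t i = 0; i < ebcdicText.size(); ++i) {
        if (static_cast<unsigned char>(ebcdicText[i]) == from) {
            ebcdicText[i] = static_cast<char>(to);
            ++changed;
        }
    }
    return changed;
}
```

Calling replaceLineEnd(text, EBCDIC_NL, EBCDIC_LF) before sending a document out, and the reverse before editing it on z/OS, mirrors the two directions of the filter.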

XML 1.1 Specification :

XML Parsers at V1.6 and above support the XML 1.1 Recommendation. XML 1.1 adds the EBCDIC NL (NEL) as an acceptable white space character. With this support, an NL and a CR/NL are normalized to an LF.
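The difference between the two specifications can be sketched on Unicode text, which is what the parser works with internally. This is an illustration of the end-of-line rules in the XML 1.0 and XML 1.1 specifications, not the parser's own code; the function name normalizeLineEnds is hypothetical, and std::u16string requires a C++11 compiler.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: normalize line ends in UTF-16 text to LF.
// XML 1.0 rule: CR LF and a lone CR become LF.
// XML 1.1 rule: additionally CR NEL, NEL (U+0085) and LS (U+2028) become LF.
std::u16string normalizeLineEnds(const std::u16string& text, bool xml11) {
    const char16_t LF = 0x000A, CR = 0x000D, NEL = 0x0085, LS = 0x2028;
    std::u16string result;
    result.reserve(text.size());
    for (std::size_t i = 0; i < text.size(); ++i) {
        const char16_t c = text[i];
        if (c == CR) {
            // Swallow the character that completes a two-character line end.
            if (i + 1 < text.size() &&
                (text[i + 1] == LF || (xml11 && text[i + 1] == NEL))) {
                ++i;
            }
            result.push_back(LF);
        } else if (xml11 && (c == NEL || c == LS)) {
            result.push_back(LF);
        } else {
            result.push_back(c);
        }
    }
    assert(result.size() <= text.size());  // normalization never adds characters
    return result;
}
```

With xml11 set to false, a NEL is passed through unchanged, which is why a strict XML 1.0 parser on another platform does not treat an NL-terminated document as having normal line ends.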
