NOTE: For performance information regarding the XML Toolkit's XML and XSLT products, see the XML Toolkit for z/OS Performance page.
The XML4J parser for the Java 6 SDK demonstrates significantly improved performance over the XML4J parser for Java 5 in most parsing scenarios. The results below were generated by measuring performance when parsing a set of XML documents of various sizes and complexities. The number of CPU cycles required to parse a single byte was determined by calculating the geometric mean of the per-byte costs for all benchmark documents.
Parsing and transformation are CPU-intensive activities, so it is important to make efficient use of these technologies. The hints and tips below provide suggestions for building better-performing parsing and transformation applications.
- General performance recommendations for XML Parser components
- XML4J performance recommendations
- XLXP-J performance recommendations
- XSLT4J performance recommendations
- XLTXE-J performance recommendations
General performance recommendations for XML Parser components
Use DTD and schema grammar caching to reduce the cost of validation
The processing required to prepare DTD and schema grammars for use in validation is expensive. It is possible to reduce the CPU time needed to validate an XML document by pre-parsing and caching DTD and schema grammars. These pre-parsed grammars may then be reused for subsequent validation. Performance enhancement due to reuse of pre-parsed grammars will vary depending on the size and complexity of the XML document and the schema or DTD. Sizable improvements may be expected when relatively small, simple documents are validated with cached grammars.
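One way to reuse a pre-parsed schema grammar is sketched below with the standard JAXP validation API (javax.xml.validation). This is a minimal illustration, not the toolkit's own caching mechanism: the inline schema, the class name CachedSchemaExample, and its method names are hypothetical. The schema is compiled once into a Schema object, which the JAXP specification defines as immutable and thread-safe, and each validation only creates a lightweight Validator from it.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

class CachedSchemaExample {
    // Hypothetical inline schema; a real application would load its own XSD.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='order' type='xs:string'/>"
      + "</xs:schema>";

    // Compiled once and cached; Schema objects may be shared across threads.
    static final Schema SCHEMA = compile();

    static Schema compile() {
        try {
            SchemaFactory factory =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            return factory.newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException e) {
            throw new IllegalStateException("schema did not compile", e);
        }
    }

    // Validates one document against the cached grammar.
    static boolean isValid(String xml) {
        try {
            Validator validator = SCHEMA.newValidator(); // not thread-safe, so one per call
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // the default error handler throws on validation errors
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("<order>42</order>"));
        System.out.println(isValid("<invoice/>"));
    }
}
```

The cost of compiling the schema is paid once at class initialization; how much this saves in practice depends on the schema and the documents, as noted above.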
Turn validation off when not needed and avoid DOCTYPE clauses
Turn validation off if you don't need it. Also, avoid using a DOCTYPE clause if you don't plan to validate. DOCTYPE clauses are not required and should not be used unless there is a strong functional reason to do so. The parser will, by default, read the DTD if the DOCTYPE line is specified, even when not validating.
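When a DOCTYPE clause cannot be removed from incoming documents, a Xerces-derived parser can usually be told not to read the external DTD. The sketch below assumes such a parser is in use: the load-external-dtd feature URI is a Xerces feature, not part of JAXP, and other parsers may reject it. The class name and the sample document are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class NonValidatingParse {
    // Xerces-specific feature name (assumption: a Xerces-derived parser is configured).
    static final String LOAD_EXTERNAL_DTD =
        "http://apache.org/xml/features/nonvalidating/load-external-dtd";

    static DocumentBuilder newBuilder() throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(false);                  // the JAXP default, stated explicitly
        factory.setFeature(LOAD_EXTERNAL_DTD, false);  // do not open the DTD named in DOCTYPE
        return factory.newDocumentBuilder();
    }

    static String rootName(String xml) throws Exception {
        Document doc = newBuilder().parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement().getNodeName();
    }

    public static void main(String[] args) throws Exception {
        // The DTD file does not exist; the parse still succeeds because it is never read.
        String xml = "<!DOCTYPE po SYSTEM \"no-such-file.dtd\"><po status=\"shipped\"/>";
        System.out.println(rootName(xml));
    }
}
```

Skipping the DTD also skips any default attribute values and entity declarations it contains, so this is only appropriate when the application does not depend on them.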
For best validating performance, use a DTD instead of a schema
Schemas provide extensive capability for sophisticated validation. Our tests have shown, however, that DTDs are more efficient than schemas when performing equivalent validating tasks. If your application does not require the added functionality provided by schemas, use a DTD to validate your XML documents.
Use external entities and external DTDs only when necessary
External entities and external DTDs require extra file opens, and the transcoding setup they need is expensive.
Avoid multiple parses of the same document
There may be times when it would be convenient to parse the same document more than once in the course of a single transaction. Duplicate parses are expensive. Avoid the overhead of additional parses by storing the results of the initial parse in a format which may be efficiently accessed at a later time.
Reuse the initialized parser
When parsing a small document, most of the CPU time is spent on parser initialization. You should, therefore, avoid instantiating a new parser every time you parse. Instead, create the parser once, and reuse the parser instance. A pool of reusable parser instances is a good idea if you have many threads parsing at the same time.
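A simple alternative to a full parser pool is to keep one parser per thread. The sketch below uses a ThreadLocal holding a JAXP DocumentBuilder; the class and method names are hypothetical, and a bounded pool may suit applications with many short-lived threads better.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;

class ReusableParser {
    // One DocumentBuilder per thread, created lazily on first use.
    private static final ThreadLocal<DocumentBuilder> BUILDER =
        new ThreadLocal<DocumentBuilder>() {
            @Override
            protected DocumentBuilder initialValue() {
                try {
                    return DocumentBuilderFactory.newInstance().newDocumentBuilder();
                } catch (ParserConfigurationException e) {
                    throw new IllegalStateException(e);
                }
            }
        };

    static DocumentBuilder current() {
        return BUILDER.get();
    }

    static Document parse(byte[] xml) throws Exception {
        DocumentBuilder builder = current();
        builder.reset(); // return the builder to its original configuration before reuse
        return builder.parse(new ByteArrayInputStream(xml));
    }

    public static void main(String[] args) throws Exception {
        Document first = parse("<a/>".getBytes(StandardCharsets.UTF_8));
        Document second = parse("<b/>".getBytes(StandardCharsets.UTF_8));
        System.out.println(first.getDocumentElement().getNodeName());
        System.out.println(second.getDocumentElement().getNodeName());
    }
}
```

Both parses in main run on the same thread and therefore share one builder instance, so the initialization cost is paid only once.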
Avoid unnecessary document conversions
The XML parser will convert your document to Unicode before parsing. Avoid unnecessary overhead by preventing additional document conversion prior to the parse.
Reduce character count
Try to reduce your character count; smaller documents are parsed more quickly. Avoid unnecessary use of white space as spaces, tabs, and line feed characters must also be parsed.
Pre-normalize line end characters
Compliance with the XML specification requires the parser to normalize line end characters by replacing the two character sequence x'0D0A' and any occurrence of a solitary x'0D' with the single character x'0A'. Reduce parsing costs by constructing XML documents using x'0A' to indicate the end of a line of text.
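If documents are generated by your own code, the line ends can be normalized once when the document is written. A minimal sketch, with a hypothetical helper name, that applies the same replacement the parser would otherwise perform on every parse:

```java
class LineEndNormalizer {
    // Collapse CR LF pairs first, then convert any remaining lone CR to LF.
    static String normalize(String text) {
        return text.replace("\r\n", "\n").replace('\r', '\n');
    }

    public static void main(String[] args) {
        String raw = "<po>\r\n  <item/>\r</po>\n";
        String clean = normalize(raw);
        System.out.println(clean.indexOf('\r')); // no carriage returns remain
    }
}
```

The order of the two replacements matters: converting lone CR characters first would turn each CR LF pair into two line feeds.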
Avoid using default attributes
Avoid the use of default attributes. Attributes are associated with elements within a document. A purchase order element might, for example, contain a status attribute set to "in process", "shipped", or "billed". If, in your DTD, you specify a default value for the attribute rather than explicitly assigning a value in your XML document, processing will be slower.
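If possible, use SAX instead of DOM, especially when parsing large documents
Parsing with SAX is more efficient and requires less memory, because SAX passes parsing events to the application as they occur instead of building an in-memory tree of the entire document.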
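The following sketch shows the SAX style of processing using the standard JAXP SAX API. The handler counts elements named item (a hypothetical element name) without ever holding the document in memory; only the running count is kept.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SaxItemCounter extends DefaultHandler {
    private int items;

    // Called by the parser for each start tag as the document streams past.
    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if ("item".equals(qName)) {
            items++;
        }
    }

    static int countItems(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        SaxItemCounter handler = new SaxItemCounter();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return handler.items;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(countItems("<order><item/><item/><note/></order>"));
    }
}
```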
Store your XML documents in the z/OS or OS/390 UNIX HFS
It is more efficient to store your XML documents in the z/OS or OS/390 UNIX HFS than in an MVS data set, such as SAM or PDS.
XML4J performance recommendations
For best performance, use the version of the Java XML parser packaged with JDK 6.
XML parsing performance continues to improve. To take advantage of the latest performance enhancements, install JDK 6 and use the embedded version of XML4J.
Read "Improve performance in your XML applications" Part 1, Part 2, and Part 3 from developerWorks to learn how to further increase the efficiency of your XML parser usage.
Turn off schema validation when it is not needed
Enabling schema validation increases parsing costs even when the document being parsed does not use a schema, so enable it only for documents that must be validated.
Turn off deferred DOM if you plan to traverse the entire DOM tree
Deferred DOM delays creation of the DOM tree until the tree is traversed. At that time, the tree is partially constructed to enable access to the required data. If you plan to traverse the entire document tree, it is most efficient to allow the parser to build the tree as the document is parsed.
Consider turning off Deferred DOM when parsing small documents
DOM parsing with Deferred DOM enabled may be less efficient for small documents. You may wish to disable Deferred DOM if most of the documents parsed are around 1 KB in size.
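In Xerces-derived parsers, deferred DOM is controlled by the defer-node-expansion feature. The sketch below assumes such a parser is in use; the feature URI is Xerces-specific and a different JAXP implementation may reject it. The class and method names are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class EagerDomExample {
    // Xerces-specific feature name (assumption: a Xerces-derived parser is configured).
    static final String DEFER_NODE_EXPANSION =
        "http://apache.org/xml/features/dom/defer-node-expansion";

    static Document parseEagerly(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setFeature(DEFER_NODE_EXPANSION, false); // build the full tree during the parse
        return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    // Full traversal: the case in which eager construction is expected to pay off.
    static int countElements(Node node) {
        int count = node.getNodeType() == Node.ELEMENT_NODE ? 1 : 0;
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            count += countElements(child);
        }
        return count;
    }

    public static void main(String[] args) throws Exception {
        Document doc = parseEagerly("<po><item/><item/></po>");
        System.out.println(countElements(doc));
    }
}
```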
XLXP-J performance recommendations
Java 6 introduces new parsing technology with XLXP-J.
This proprietary code offers a high-performance alternative for customers wishing to take advantage of StAX (Streaming API for XML), a pull-style parsing API. It is not intended to be a plug-compatible replacement for XML4J.
In addition to the StAX API, XLXP-J offers a nonvalidating SAX API. Performance tests of standard benchmark documents have shown a geometric mean decrease in parsing cycles of roughly 20% when using XLXP-J SAX vs. XML4J SAX.
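For readers unfamiliar with pull parsing, the sketch below shows the general shape of a StAX loop using only the standard javax.xml.stream API. It runs against whichever StAX implementation the JDK supplies and does not rely on anything specific to XLXP-J; the class name is hypothetical.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

class StaxPullExample {
    // The application asks for the next event; the parser does not call back.
    static int countStartElements(String xml) throws Exception {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
        int count = 0;
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    count++;
                }
            }
        } finally {
            reader.close(); // releases parser resources, not the underlying Reader
        }
        return count;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(countStartElements("<a><b/><c>text</c></a>"));
    }
}
```

Because the application drives the loop, it can stop reading as soon as it has the data it needs, which is the main practical difference from SAX.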
XSLT4J performance recommendations
For best performance, use the version of the Java XSLT processor packaged with JDK 6.
XSLT processing performance continues to improve. To take advantage of the performance enhancements, use JDK 6 and use the embedded version of XSLT4J.
Create the Transformer once per thread
Transformers are not threadsafe; they are, however, reusable. To improve efficiency, consider creating the Transformer object once per thread. Then reuse that Transformer for all associated transformations.
Alternatively, create the Templates object once and then reuse it
Templates objects may be reused and may also be shared by multiple threads during concurrent processing. If your application will be using the same stylesheet to transform multiple XML documents, you may create the Templates object for the stylesheet once and then use it to produce a new Transformer for processing each XML document. For more information, read about multi-threaded use of the Java XSLT processor.
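The Templates approach can be sketched with the standard JAXP transformation API as follows. The inline stylesheet, the order element, and the class name SharedTemplates are hypothetical; the point is that the stylesheet is compiled once and each transformation only obtains a new Transformer from the shared Templates object.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class SharedTemplates {
    // Hypothetical stylesheet: writes the status attribute of the root order element as text.
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='/order'><xsl:value-of select='@status'/></xsl:template>"
      + "</xsl:stylesheet>";

    // Compiled once; a Templates object may be shared by concurrent threads.
    static final Templates TEMPLATES = compile();

    static Templates compile() {
        try {
            return TransformerFactory.newInstance()
                .newTemplates(new StreamSource(new StringReader(XSL)));
        } catch (TransformerConfigurationException e) {
            throw new IllegalStateException("stylesheet did not compile", e);
        }
    }

    static String transform(String xml) throws Exception {
        Transformer transformer = TEMPLATES.newTransformer(); // one per transformation
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(transform("<order status='shipped'/>"));
    }
}
```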
Create one TransformerFactory per thread and reuse it
While transformer factories are not threadsafe, they are serially reusable. In order to improve efficiency, create one factory per thread and use that factory to generate the required transformers.
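One way to keep a factory per thread is a ThreadLocal, sketched below with a hypothetical class name. Each thread that calls get() receives its own TransformerFactory, created on first use and reused for that thread's later calls.

```java
import javax.xml.transform.TransformerFactory;

class FactoryPerThread {
    // Each thread lazily creates and then keeps its own factory instance.
    private static final ThreadLocal<TransformerFactory> FACTORY =
        new ThreadLocal<TransformerFactory>() {
            @Override
            protected TransformerFactory initialValue() {
                return TransformerFactory.newInstance();
            }
        };

    static TransformerFactory get() {
        return FACTORY.get();
    }

    public static void main(String[] args) throws Exception {
        final TransformerFactory[] seenByWorker = new TransformerFactory[1];
        Thread worker = new Thread(new Runnable() {
            @Override
            public void run() {
                seenByWorker[0] = get();
            }
        });
        worker.start();
        worker.join();
        System.out.println(get() == get());           // same thread: same factory
        System.out.println(get() == seenByWorker[0]); // other thread: different factory
    }
}
```

In thread-pool environments the factory lives as long as the pooled thread does, which is usually the desired behavior here but is worth keeping in mind for memory usage.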
If frequent factory instantiation is unavoidable, improve performance by specifying factory names in either the Java system properties or the jaxp.properties file
Use of the Java API for XML Processing (JAXP) makes it possible for developers to change XML parser and XSLT processor implementations without having to rewrite application code. The names of the desired XML parser and XSLT transformer factories may be specified in several different locations which are searched in a prescribed order when an application instantiates a factory. Once a factory name is found the search terminates and an instance of the specified factory is constructed. The search order, as defined by the JAXP specification, is described below.
- Determine whether the factory name is specified as a Java system property.
- Look for jaxp.properties in the JAVA_HOME/lib directory.
- Check the META-INF/services directory of the appropriate jar for a file containing the factory name.
- Use the default factory name.
The META-INF/services directory of xalan.jar shipped with the Java XSLT processors in V1R6 of the XML Toolkit contains a javax.xml.transform.TransformerFactory file which specifies the default transformer factory implementation name. Factory names for both the javax.xml.parsers.SAXParserFactory and the javax.xml.parsers.DocumentBuilderFactory implementations are contained in the META-INF/services directory of xercesImpl.jar, also shipped with Java XSLT. During factory instantiation, the search for the factory names will proceed as described above until the request is satisfied by the third item in the list, the META-INF/services lookup. The search may be shortened by specifying the names as Java system properties or by using the jaxp.properties file.
Use of jaxp.properties was problematic in older versions of the Java XSLT processor because the file was physically opened and read each time a factory was instantiated. The V1R6 XSLT processors cache the contents of jaxp.properties in memory, so physical file I/O is dramatically reduced. Therefore, use of jaxp.properties in the newer versions of the code both shortens the search path and improves performance during instantiation of new transformer and parser factories.
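Setting the factory name as a Java system property can be sketched as follows. The property key javax.xml.transform.TransformerFactory is defined by JAXP; the class and method names are hypothetical. To stay independent of any particular implementation class name, the sketch discovers the name of the factory that the normal search already selects and then records it as a system property so that later lookups stop at the first step.

```java
import javax.xml.transform.TransformerFactory;

class FactoryLookupShortcut {
    // JAXP-defined system property key for the transformer factory class name.
    static final String KEY = "javax.xml.transform.TransformerFactory";

    // Runs the full search once, then pins the result as a system property.
    static String pinDefaultFactory() {
        String className = TransformerFactory.newInstance().getClass().getName();
        System.setProperty(KEY, className);
        return className;
    }

    public static void main(String[] args) {
        String pinned = pinDefaultFactory();
        // This lookup is now satisfied by the system property.
        TransformerFactory factory = TransformerFactory.newInstance();
        System.out.println(pinned.equals(factory.getClass().getName()));
    }
}
```

In a deployed application the same effect is more commonly achieved by passing the property on the command line with -D, using the factory class name documented for the installed processor.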
You'll find a sample of jaxp.properties in the main directories of the V1R5 and V1R6 Java XSLT processors. To use the file, place a copy in the $JAVA_HOME/lib directory of your SDK.
Warning: jaxp.properties may adversely affect the performance of older parsers and XSL transformers on your system. Use this file with care.
XLTXE-J performance recommendations
Java 6 introduces new transformation technology with XLTXE-J.
This proprietary code offers customers a high-performance alternative to XSLT4J and represents the strategic direction for transformation technology.
Performance tests of standard benchmark documents have shown a geometric mean decrease in parsing cycles of roughly 20% when using XLTXE-J vs. XSLT4J. Both test cases used a pre-compiled stylesheet to repeatedly transform the same document, as recommended for best performance.