Introduction
The effect of compiler and preprocessor options on an application's overall performance is generally not well understood. This paper examines the problem in the context of the new POWER2 and PowerPC 601 implementations in the RS/6000 family of machines, using the new XL compilers that exploit them. Experimentation shows that most applications can achieve a significant portion of their potential performance improvement using just the basic optimization options. With additional architecture-specific and application-specific options, the compilers and preprocessors can provide further performance improvements. This discussion illustrates the effect that tuning can have and provides a framework for judging how much tuning effort is likely to be worthwhile.
Methodology
This paper uses the CINT92 and CFP92 benchmark suites from SPEC [1] as a set of sample applications to illustrate the improvements made possible by adding various compiler and preprocessor options. The results are reported as SPECratios, each calculated by dividing a benchmark's standard reference time by its actual execution time. The SPEC aggregates are the geometric means of the SPECratios of the integer (SPECint92) and floating-point (SPECfp92) application sets. These aggregates should be used only as a general guide. If one of the benchmarks is similar to your own application, its result should be used in preference to the aggregates.
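The SPECratio and aggregate calculations described above can be sketched as follows. This is a minimal illustration, not SPEC's own tooling: the function names are hypothetical, and the timings are made-up values rather than measured results.

```python
import math

def spec_ratio(reference_seconds, measured_seconds):
    """SPECratio: standard reference time divided by actual execution time."""
    if reference_seconds <= 0 or measured_seconds <= 0:
        raise ValueError("times must be positive")
    return reference_seconds / measured_seconds

def geometric_mean(ratios):
    """Geometric mean, computed through logarithms."""
    ratios = list(ratios)
    if not ratios:
        raise ValueError("need at least one ratio")
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))

# Hypothetical (reference, measured) times in seconds, NOT real SPEC data.
timings = [(100.0, 50.0), (200.0, 25.0)]
ratios = [spec_ratio(ref, run) for ref, run in timings]
aggregate = geometric_mean(ratios)
print(ratios, aggregate)
```

Because the aggregate is a geometric mean, one benchmark with a very large SPECratio raises it less than it would raise an arithmetic mean, which is one reason the individual benchmark closest to your own application is the better guide.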
All measurements were made with early versions of the compilers (IBM C Set ++ for AIX/6000 version 2.1 and XL Fortran version 3.1) on an RS/6000 Model 250 (PowerPC 601 processor), an RS/6000 Model 590 (POWER2 processor), and, for comparison purposes, an RS/6000 Model 580 (POWER processor). The KAP for IBM C 1.3 preprocessor was used with the C applications. The preprocessors used for Fortran applications were early versions of the new release of KAP for IBM Fortran from Kuck and Associates, Inc. and VAST-2 for XL Fortran from Pacific-Sierra Research. Although the experiments were done with early versions of the compilers, the general conclusions will hold when the production-level versions of the compilers become generally available. Figures 1 through 9, found at the end of the paper, report results as SPECratios to provide a normalized basis for performance comparison. Figures 1 through 3 show the benchmarks compiled for the POWER processor, Figures 4 through 6 show the benchmarks compiled for the POWER2 processor, and Figures 7 through 9 show the benchmarks compiled for the PowerPC 601 processor.
Using Basic Optimization
Adding the basic optimization flags substantially improves the performance of most applications. The primary optimization flag, -O, is equivalent to -O2. It performs only safe transformations, using very little contextual knowledge of the code. As a rough guide, applications compiled with -O will run between two and three times as fast as those compiled with no optimization. Table 1 gives the increase in performance for the SPEC aggregates when the -O option is used. Figures 1, 4, and 7 graphically depict the data.

The amount of improvement will vary, depending on the implementation and the specific application. Overall, the compiler produces more dramatic increases on POWER and POWER2 systems than on PowerPC 601 systems when the -O option is added. On a benchmark-by-benchmark basis, adding -O yields the range of improvements shown in Table 2 for the six CINT92 and 14 CFP92 benchmarks.

Benchmark reports often use the best known set of options for the compiler and preprocessors. Controversy over which flags are acceptable when measuring benchmarks has developed as vendors have become more aggressive in their search for higher SPEC numbers [2]. Table 3 and Figures 1 through 9 at the end of the paper demonstrate that the -O option on its own can achieve a significant portion of this final performance.

In this release, the -O3 level of optimization has been significantly enhanced. This option allows the compiler to use more time, more memory-intensive transformations, and more aggressive optimization assumptions. The result is often an improvement in the performance of an application. Suitable floating-point applications should benefit the most from this more aggressive optimization level. An application is suitable if it does not raise exceptions, does not depend on the sign of zero, and is not sensitive to changes in precision caused by reassociating floating-point expressions. Applications dominated by memory accesses may also see some improvement from the more aggressive optimization assumptions.
The -O3 optimization level does have the potential to change the semantics of a program when the -qstrict option is not specified. When -O3 is used on its own, the compiler may move code that might trigger an exception. Specifically, the compiler can move loads and floating-point calculations into areas where they are determined to be profitable, even though the operations might not have been executed under a stricter interpretation of the original program. The compiler also ignores the sign of (floating-point) zero when this option is enabled. The compiler may reassociate floating-point expressions, which provides more opportunities to detect common subexpressions in the calculations and may also enhance the generation of floating-point multiply-add instructions. This may affect the rounding of results: the results are mathematically equivalent, but the answers may not be identical because the calculations are performed with finite precision. If an application is sensitive in any of these areas, some of the benefit can still be obtained by using -O3 with the -qstrict option. The -qstrict option directs the compiler not to change the semantics of the program in the manner outlined previously. The -O option always has the -qstrict option implied.
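The two floating-point sensitivities named above can be seen in a few lines of arithmetic. The sketch below uses Python only because its floats are IEEE double-precision values; it does not invoke the XL compilers. The operand values are hypothetical, and the "x + 0.0" simplification is an assumed example of a rewrite that ignores the sign of zero, not a documented XL transformation.

```python
import math

# Reassociation: the same three terms summed in two different orders.
# The values are hypothetical, chosen so that the rounding is visible.
a, b, c = 1.0e20, -1.0e20, 1.0

as_written = (a + b) + c    # a + b cancels exactly, then 1.0 is added
reassociated = a + (b + c)  # c is lost when b + c is rounded

# Sign of zero: the two zeros compare equal but remain distinguishable.
neg_zero = -0.0
zeros_compare_equal = (neg_zero == 0.0)
sign_of_neg_zero = math.copysign(1.0, neg_zero)

# Replacing "x + 0.0" with "x" is one rewrite that ignores the sign of zero
# (an illustrative assumption, not a documented XL transformation).
sum_with_zero = neg_zero + 0.0  # round-to-nearest gives +0.0
sign_after_add = math.copysign(1.0, sum_with_zero)
sign_if_simplified = math.copysign(1.0, neg_zero)

print(as_written, reassociated)
print(zeros_compare_equal, sign_of_neg_zero)
print(sign_after_add, sign_if_simplified)
```

An application whose results or control flow depend on differences of this kind is a candidate for combining -O3 with -qstrict.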
When -O3 is specified, the default settings of some related options change as well; specifically, -qfloat=fltint:rsqrt is selected and -qmaxmem is set to use unlimited memory during compilation. The -qfloat=fltint option removes range checking on conversion of floating-point values to integer values. The -qfloat=rsqrt option means that when a square root operation appears as the divisor in a calculation, the optimizer can substitute a multiply that uses the result from a reciprocal square-root library call. This eliminates the need for an expensive divide operation. The -qmaxmem option allows the user to specify the amount of memory to be used during compilation. When -O3 is specified, this option's default settings allow the compiler to use as much memory as it needs during the optimization of an application. The use of this option generally increases compile time for complex applications. The -qmaxmem option defaults to a lower setting when the -O option is used.
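The effect of the reciprocal square-root substitution on results can be sketched numerically. In the sketch below, `1.0 / math.sqrt(y)` is only a stand-in for the reciprocal square-root library call, whose actual implementation and accuracy are not described here; the function names and the sample grid are hypothetical.

```python
import math

def divide_by_sqrt(x, y):
    """The expression as written: a square root followed by a divide."""
    return x / math.sqrt(y)

def multiply_by_rsqrt(x, y):
    """The substituted form: multiply by a reciprocal square root.

    1.0 / math.sqrt(y) is a stand-in for the library call (an assumption).
    """
    return x * (1.0 / math.sqrt(y))

def compare(samples):
    """Return (number of differing results, worst relative difference)."""
    mismatches = 0
    worst = 0.0
    for x, y in samples:
        d = divide_by_sqrt(x, y)
        m = multiply_by_rsqrt(x, y)
        if d != m:
            mismatches += 1
            worst = max(worst, abs(d - m) / abs(d))
    return mismatches, worst

# Hypothetical sample grid of positive operands.
samples = [(float(i), float(j)) for i in range(1, 40) for j in range(2, 40)]
mismatches, worst = compare(samples)
print(mismatches, "of", len(samples), "results differ; worst relative difference:", worst)
```

With this stand-in, any difference between the two forms is confined to the last bits of the result, which is the kind of rounding change that a precision-sensitive application would need to tolerate.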

Table 4 quantifies the improvements in the SPEC aggregates realized by increasing the optimization level from -O to -O3. The aggregate numbers are slightly misleading for the PowerPC 601 processor: with the early compilers, certain individual benchmarks improved when -O3 was added while others became slower. A user should always verify that -O3 actually improves the performance of an application before using it by default. Further performance improvements may result when the user provides more information about the application and the target machine on which it will run.
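One simple way to carry out this check is to time several runs of each build and compare the fastest of each. The sketch below assumes the timings have already been collected; the function names, the 2% threshold, and the sample timings are all hypothetical.

```python
def best_time(samples):
    """Use the fastest run to reduce noise from other system activity."""
    samples = list(samples)
    if not samples:
        raise ValueError("need at least one timing")
    return min(samples)

def speedup(baseline_times, candidate_times):
    """A ratio above 1.0 means the candidate build is faster."""
    return best_time(baseline_times) / best_time(candidate_times)

def worth_adopting(baseline_times, candidate_times, threshold=1.02):
    """Adopt the candidate only if it wins by a margin (threshold is an assumption)."""
    return speedup(baseline_times, candidate_times) >= threshold

# Hypothetical wall-clock times in seconds for three runs of each build.
o_times = [41.8, 40.0, 40.9]        # built with -O
o3_times = [36.1, 35.5, 32.0]       # built with -O3
o3_regressed = [44.0, 43.2, 50.0]   # a case where -O3 made things slower

print(speedup(o_times, o3_times), worth_adopting(o_times, o3_times))
print(speedup(o_times, o3_regressed), worth_adopting(o_times, o3_regressed))
```

The output of the two builds should also be compared, since -O3 without -qstrict may change results as well as run time.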