Skip to main content

IBM DB2 for i: Articles

Database Statistics


Introduction

The DB2 for i Query Optimizer (known as The Optimizer) is dependent on many factors when attempting to make the "best" decision when implementing a query. These factors include the number of processors, processor speed, number of disk arms, amount of memory, PTF level, availability of SMP, etc.
The list goes on and on.

One of the most vital pieces of information available to the Optimizer is database statistics, the information contained in the database file descriptions associated with the tables, views and indexes which exist on the system. These statistics include record counts, record size, index entries, unique key values, etc. Unlike other platforms that require manual statistic generation, this information is dynamic on the IBM i and has always been instantly available to the IBM i Optimizer (and it's father the AS/400 and grandmother System/38).

The importance of this data cannot be stressed enough. An abundance of statistics, or lack thereof, can make a magnitude of difference when implementing and executing a query. The gist of this article is to highlight this point through the experiences of actual customers. As usual, the names of the customers have been changed to protect the authors from frivolous and annoying lawsuits.

The Case of Yes Meaning Maybe

The Optimizer has the capability of implementing queries by making copies of the data . These copies are materialized as temporary files and/or bitmaps. The ability to make a copy of the data is determined by the setting of the ALWCPYDTA parameter. This parameter can be found on the following commands: all CRTSQLxxx compiler commands, OPNQRYF, STRSQL and RUNSQLSTM.

When the Optimizer is restricted from making copies of the data then performance can become adversely affected as seen in the Case of Yes Meaning Maybe.

Background

The Sam and Dave Financial Group (SDFG) has developed an n-tier application with the IBM i as the database server. SDFG is utilizing stored procedures written in RPG returning result sets to the application server. The initial implementation response times were unacceptable. Initial analysis of Database Monitor (DbMon) data, by SDFG, revealed a a large number of index builds occurring (DbMon record type 3002) and a substantial number of index advisories (DbMon record types 3000, 3001 and 3002).

The Data Base Analyst (DBA) at SDFG built permanent indexes for the longest running index builds Experiencing a noticeable improvement in overall response time, the DBA subsequently built all indexes advised by the Optimizer. Overall response times improved again, however the CPU utilization increased substantially reducing throughput. In addition, DbMon data reported an increase in temporary indexes being built, however these index builds were very short in duration.

Gathering the evidence

Since the customer was collecting DbMon data on a regular basis I chose to begin my investigation by analyzing the data. I used the following SQL statement to extract a list of index builds in descending sequence by elapsed time of build (a base set of queries used to analyze DbMon data can be found at:
www.ibm.com/systems/i/software/db2/dbmonqrys.htm

SELECT J1.QQETIM-J1.QQSTIM as Time, J2.QQC103 as Program, J2.QQC104 as Lib, J1.QQPTFN as Table, J1.QQTOTR as TableRows, J1.QQRIDX as IdxEnt, J1.QQRCOD as Why, SUBSTR(J1.QQ1000, 1, 100) as KeyColumns
FROM QTEMP.DBMONDATA J1 INNER JOIN QTEMP.DBMONDATA J2 ON (J1.QQJFLD = J2.QQJFLD)
WHERE J1.QQRID = 3002 AND J2.QQRID = 1000 AND J2.QQC21 = 'OP'
ORDER BY 1 desc

The following table contains an abbreviated result of the query:

Time Program Lib Table Rows IdxEnt Why Index columns
0.62 SQL_SERVER SQL_PROC ITEM_FACT 600,572 19,305 I2 QUANTITY DESCEND

The results indicate that a temporary index was built over table ITEM_FACT as result of calling Stored Procedure SQL_SERVER in library SQL_PROC. This took approximately .6 seconds This table contained 600572 rows at the time of the query. The index itself only contained 19305 entries (this is known as a sparse index). The index was built to satisfy the ordering specified in the query (reason code I2). The index was built on the QUANTITY column in descending sequence.

With slight modifications to the above SQL statement I could quickly summarize by table or program. The results of these summations indicated that this program was executed frequently and resulted in a temporary index build on each execution.

Understanding the query embedded in a program

I could choose to continue pulling information from DbMon, but because I am dealing with embedded SQL I prefer to go right to the source. My next step was to run the Print SQL Information (PRTSQLINF) command against the SQL_SERVER program in library SQL_PROC. The following is a excerpt from the spool file created:

CRTSQLRPGI
OBJ(SQL_PROC/SQL_SERVER)
SRCFILE(SQL_SOURCE/QRPGLESRC)
SRCMBR(SQL_ORDER3)
COMMIT(*NONE)
OPTION(*SYS *PERIOD)
TGTRLS(*PRV)
ALWCPYDTA(*YES)
CLOSQLCSR(*ENDACTGRP)
RDB(*LOCAL)
DATFMT(*ISO)
TIMFMT(*ISO)
DFTRDBCOL(*NONE)
DYNDFTCOL(*NO)
SQLPKG(SQL_PROC/SQL_ORDER3)
ALWBLK(*ALLREAD)
DLYPRP(*YES)
DYNUSRPRF(*USER)
SRTSEQ(*HEX)
LANGID(ENU)
RDBCNNMTH(*DUW)
TEXT('Index ordering
')
STATEMENT TEXT CCSID(65535)
SQLPATH(*LIBL)
DECLARE SQL_ORDER3 CURSOR FOR SELECT * FROM TEAM02 / ITEMFACTV1 WHERE YEAR = SMALLINT ( 1998 ) AND MONTH = SMALLINT ( 6 ) ORDER BY QUANTITY DESC SQL4021 Access plan last saved on 06/09/01 at 16:23:31.
SQL4020 Estimated query run time is 18 seconds.
SQL4027 Access plan was saved with DB2 UDB Symmetric Multiprocessing installed on the system.
SQL4006 All indexes considered for table 1.
SQL4012 Index created from index SQLRDIX001 for table 1.
SQL4011 Index scan-key row positioning used on table 1.
OPEN SQL_ORDER3

ETCH NEXT FROM SQL_ORDER3 FOR : H ROWS INTO : H
SET RESULT SETS ARRAY : H FOR : H ROWS
CLOSE SQL_ORDER3

The first part of the report shows the compile options used to create the ILE module SQL_ORDER3. The first parameter to catch my eye is the Allow Copy Data (ALWCPYDTA) setting. Here we see that it is set to *YES.

The second part of the report contains the SQL statements embedded in the program and the access plan generated by the Optimizer. Notice that the DECLARE CURSOR statement is the only statement requiring an access plan. Also note that SMP has been installed on the system.

The program is using a concept of result sets and blocked FETCH. I will discuss these in more detail in a follow on article.

Understanding the impact of the ALWCPYDTA setting

The default for the ALWCPYDTA parameter on the CRTSQLRPGI is *OPTIMIZE. Someone went out of their way to change the parameter to *YES when creating this module. When confronted, the culprit confessed and stated that it was OK for the Optimizer to make a copy of the data as it was primarily used for analysis and not real time access. Thus, they changed the parameter to ensure this would happen, assuming *YES means yes.

The ALWCPYDTA parameter allows three values: *YES, *NO and *OPTIMIZE. The setting *NO is pretty straightforward, *NO means no. The optimizer will not make a copy of the data under any circumstances. It will die first before it makes a copy.

The *OPTIMIZE setting means yes, full permission is granted to make copies of the data. The Optimizer can choose to make a copy of the data if it believes that this is the most efficient way to implement the query.

The *YES setting actually means maybe. The Optimizer has the power to create a copy of the data if and only if this is the only way the query could possibly execute. This setting virtually biases the Optimizer to use or create indexes to implement the query, copies of the data are considered only as a last resort.

To implement the ordering method required by the query in the SQL_SERVER program the Optimizer is forced to make a decision. It can use index ordering or it can sort a temporary result. Since an index is not a copy of the data, and the Optimizer has been handcuffed by the *YES setting it chooses to create the temporary index.

Understanding temporary index builds

The Optimizer will only build a temporary index to satisfy the Join, Grouping and/or Ordering specified within the query. An index will never be built for data selection alone. To build an index DB2 must process all the records in the table. It would be less expensive to simply select the records as the table is being scanned.

DB2 has the ability to create an index from an existing index. This is much less expensive then processing the entire physical table. However, in this case, temporary index builds are not desirable for transactions that are executed frequently.

Keep in mind that index builds are parallel enabled. If SMP is installed on the system and turned on, the Optimizer may lean towards index builds to take advantage of multiple tasks. However, this can result in excessive CPU resource and reduced throughput as was the case here.

Understanding the index advisor

During the optimization process the optimizer establishes a default cost for implementing the query and then attempts to find a cheaper method by analyzing available indexes. The Optimizer chooses indexes which have columns that pertain to the local selection, joining, grouping or ordering specified within the query. Local selection refers to the selection predicates contained in the WHERE clause which are not used for join purposes.

If the Optimizer cannot find an index which satisfies the selection criteria then it may choose to suggest an index for performance reasons. The columns suggested for the index are based on selection only and do not satisfy the joining, grouping or ordering requirements. For example, the SQL SELECT statement within the DECLARE CURSOR statement above may result in a suggested index for columns YEAR and MONTH only. The QUANTITY column would not be included in the list of suggested key fields.

Creating the advised index helps the Optimizer in two ways: 1) the index provides statistics which are valuable for cost estimating and 2) the index may be used to implement the query. Today there is no indication that an index was used to provide statistics alone. This is important to consider before you delete an index based on last use date.

To implement the query in the SQL_SERVER program, the Optimizer used an existing index to select the records which satisfy the local selection (YEAR = 1998 and MONTH = 6) and then used those selected records to build a temporary index which satisfies the ordering (QUANTITY DESC).

Solving the mystery

The DBA at SDFG made a great decision in deciding to build permanent indexes to eliminate temporary index builds. However, the DBA went one step too far when deciding to build permanent indexes for all of the advised indexes. This, coupled with the ALWCPYDTA(*YES) parameter gave the Optimizer little choice in how to implement the query (index from index).

There were two options available for fixing this problem. The first would be to create the "perfect" index. We define this as a radix index which contains the columns necessary to provide statistics (local selection) and satisfy the joining, grouping and/or ordering requested. In addition, this index may contain those columns which are part of the result set in order to achieve index only access.

The second option is to give the Optimizer permission to sort the result set. This is accomplished by specifying *OPTIMIZE on the ALWCPYDTA parameter. SDFG chose this option rather than building more indexes. The following is the output from the PRTSQLINF command after recompiling the SQL_ORDER3 module and updating and running the SQL_SERVER program:

CRTSQLRPGI
OBJ(SQL_PROC/SQL_SERVER)
SRCFILE(SQL_SOURCE/QRPGLESRC)
SRCMBR(SQL_ORDER3)
COMMIT(*NONE)
OPTION(*SYS *PERIOD)
TGTRLS(*PRV)
ALWCPYDTA(*OPTIMIZE)
CLOSQLCSR(*ENDACTGRP)
RDB(*LOCAL)
DATFMT(*ISO)
TIMFMT(*ISO)
DFTRDBCOL(*NONE)
DYNDFTCOL(*NO)
SQLPKG(SQL_PROC/SQL_ORDER3)
ALWBLK(*ALLREAD)
DLYPRP(*YES)
DYNUSRPRF(*USER)
SRTSEQ(*HEX)
LANGID(ENU)
RDBCNNMTH(*DUW)
TEXT('Index ordering compiled with ALWCPYDTA *OPTIMIZE ')
STATEMENT TEXT CCSID(65535)
SQLPATH(*LIBL)
DECLARE SQL_ORDER3 CURSOR FOR SELECT * FROM TEAM02 / ITEMFACTV1 WHERE YEAR = SMALLINT ( 1998 ) AND MONTH = SMALLINT ( 6 ) ORDER BY QUANTITY DESC
SQL4021 Access plan last saved on 06/09/01 at 18:29:26.
SQL4020 Estimated query run time is 1 seconds.
SQL4027 Access plan was saved with DB2 UDB Symmetric Multiprocessing installed on the system.

SQL4002 Reusable ODP sort used.
SQL4006 All indexes considered for table 1.
SQL4010 Table scan access for table 1.
SQL4011 Index scan-key row positioning used on table 1.

Simply recompiling the program with the new parameter resulted in the estimated run time drop from 18 seconds to 1 second. Collected Performance Explorer (PEX) data showed the following results:

Program/Compile Option Cumulative CPU (ms) Percent Change
SQL_ORDER3 ALWCPYDTA(*YES) 3,067.22  
SQL_ORDER3 ALWCPYDTA(*OPTIMIZE) 142.98 95%

The CPU consumption went from approximately 3 seconds to about 143 milliseconds or a 95% reduction in CPU burn. SDFG saw an overall 50% reduction in system CPU utilization after implementing the change. Although the elapsed time of the index build was only .6 seconds DB2 use on average 5 processors to accomplish the task

Conclusion

This case demonstrated two important factors . First, the Optimizer is very complex and is influenced by a number of factors (indexes and compiler options are but two). Second, the Optimizer has several choices available for implementing a single SQL statement.

Without some understanding of the Optimizer and the data access methods available it is nearly impossible to identify and resolve the optimization and performance related issues that occur when implementing query based applications.

We're here to help

Easy ways to get the answers you need.


or call us at
1-866-883-89011-866-883-8901
Priority code:
101AR13W


DB2 Web Query

Web based query and reporting


IBM i for BI Solution

IBM i for Business Intelligence is a packaged solution that is easy to order and easy to implement, and easy to maintain. Everything you need for an out of box analytical solution based on IBM i and DB2 Web Query.