Skip to main content

Business Intelligence

Indexing and statistics strategies for DB2 UDB for iSeries

Part 3: Indexing Considerations

Once you understand what can be done with indexes which are useful for the application, you also need to consider the more pragmatic aspects of indexes, such as: creation techniques, maintenance strategies, and capacity planning issues.

SQL Indexes versus Keyed Logical Files
As mentioned in the beginning of this paper, there are two interfaces available for creating database objects. Both the SQL CREATE INDEX statement and the OS/400 command CRTLF can be used to create a radix index. Both interfaces create the same index object, but with different attributes. These attributes can have an effect on query optimization and performance. Regardless of how the index was created, the optimizer will recognize and consider the indexes during optimization.

SQL indexes are created with a 64K logical page size. Keyed logical files are primarily created with a logical page size of 8K. This attribute cannot be specified during object creation or subsequently changed. The logical page size is determined by the interface used to create the index. Regardless of the logical page size, the overall object sizes of a SQL index and a keyed logical file tend to be equivalent. The larger logical page size can result in more efficient index scans and index maintenance. These are the key benefits in a query environment. Indexes with larger logical page sizes can have an impact on the I/O performance within environments that have smaller, less than optimal memory pools.

SQL indexes that are journaled (index logging) will cause the journal receivers to fill up more rapidly. To avoid filling up and regenerating new journal receivers more often, consider increasing the journal receiver size to at least 6.5GB, and set the receiver size attributes using CRTJRN or CHGJRN... RCVSIZOPT(*RMVINTENT *MAXOPT2)

Indexes will be created with the 64K logical page size when:
  · Creating SQL indexes on SQL created tables
  · Creating SQL indexes on CRTPF / DDS created physical files
  · Creating SQL constraints on SQL created tables
  · Creating constraints with ADDPFCST on SQL created tables
  · Creating temporary indexes during query execution

In some cases, the optimizer will choose to create a temporary index with a logical page size of 64K instead of using a permanent keyed logical file with a logical page size of 8K. This occurs because the I/O cost of scanning the 8K index is higher than the cost of using the 64K index, even considering the time to create the temporary index. To overcome this behavior, the developer should consider creating a SQL index.

Based on the application development philosophy and database serving environment, use the interface that suits you. For example, if a developer is using SQL for object creation and database access, then use SQL to create indexes.

Estimating the Number of Indexes to Create
It would be nice to provide a rule of thumb for an appropriate number of indexes to build for different kinds of schemas and databases. But like many things in the application development world, there is not a simple formula for the appropriate number of indexes. It depends on the size of the tables and the relative size of the updates. It depends on the amount of system resources available for the load and update process. It depends on the maintenance window available.

Keep in mind that business intelligence environments are typically read-only, so index maintenance is only an issue during the load and update processes. In addition, business intelligence applications are more likely to allow ad-hoc queries which, especially when radix indexes are the only option, may require more than 10 or 15 indexes to satisfy all the possible query combinations.

On-line transaction processing (OLTP) environments must support adds, changes, and deletes to the data — usually at high velocities. Index maintenance must be considered when tuning queries with an extensive indexing strategy.

Index Creation
With DB2 UDB for iSeries, indexes can be created with SQL or iSeries Navigator. It is recommended that some sort of documentation process be used to keep track of the index creation source. Some ideas for maintaining the source are:

· Place the SQL in an OS/400 source file and use the Run SQL Statements (RUNSQLSTM) command to execute the SQL.
· Place the SQL in a PC client text file and use iSeries Navigator — run SQL Scripts to execute.

Creating indexes can be a very time-consuming process, especially if the underlying tables are large. On iSeries servers with multiple processors, consider using SMP to create the indexes in parallel. iSeries servers enjoys linear scalability when creating large indexes in parallel. In other words, with two processors, the index will be created in half the time. With four processors, the index will be created in one fourth the time, and on a 24-processor server, the index will be created approximately 24 times faster than on a single processor system. Both binary radix and EVI indexes are eligible to be created this way. The optional SMP feature is required to be installed and enabled to create indexes in parallel.

fig 7

DB2 UDB for iSeries is the only database system that can create indexes in parallel with:

· Database "on-line"
· Non-partitioned data sets
· Bottoms-up process (to keep the tree balanced and flat)
· High degree of key compression
· Linear scalability

Another technique when creating multiple indexes on a server with multiple processors is to create indexes simultaneously, one per processor. In other words, submit index creations, one to a processor without using SMP. In this way, each index will consume one processor and multiple indexes can be created at the same time. This works particularly well when the indexes are relatively small.

Index Maintenance
Index maintenance may occur anytime data is added, changed, or deleted. If a new row is added to a table, any indexes over that table will have to be updated to reflect the new row. This is also true if the row is deleted, or a value of a key column is changed.

DB2 UDB for iSeries supports three maintenance options for binary radix indexes: immediate, delay, and rebuild. Encoded vector indexes are always maintained immediately. The immediate maintenance option is the default when creating an index, and is typically the only option that should be used in a query environment. The optimizer cannot use an index that has a maintenance option of rebuild. In other words, the index with a maintenance option of rebuild is not of any value to the query and will not help with statistics or implementation. The optimizer does consider indexes with a maintenance option of delay, but must do extra optimization for these indexes. That is, it looks at how many changes are waiting to be performed on the index by looking at the delayed maintenance log and predicts how these changes will affect the index. Besides making the optimizer guess at how the pending changes will affect the index, using an index with a maintenance option of delay causes additional query execution time — simply because the pending changes must be performed on the index when it is opened. This index maintenance will increase the query response time.

The simplest and most straightforward recommendation is to balance the number of indexes required for acceptable query performance with the total number of indexes that must be maintained during I/O operations.

Another recommendation is to take advantage of parallel index maintenance using the optional SMP feature of OS/400. Parallel index maintenance has been available since Version 4 Release 3, and supports greater I/O velocities by using multiple database tasks to maintain indexes in parallel during blocked insert operations. For example, if there are eight indexes over a given table and applications are inserting data into the table, the database tasks can maintain each index in parallel. Otherwise, the application would wait for each of the eight indexes to be maintained serially. The overhead for this parallel maintenance is the use of more CPU resources within a given unit of time.

Depending on the number of rows being inserted, the system configuration, and the number of indexes that are over the database table; the recommendation may be to drop all indexes, perform the updates, and then rebuild the indexes upon completion of the process. This is due to the fact that maintenance of indexes serially will slow down the bulk insertion or update process. With SMP and parallel index maintenance, it may not be necessary to drop the indexes before beginning the update process.

In general, if the percentage of rows being added in an update process is more than 20% of the total size of a table, it is probably better to drop the indexes and rebuild them after the update completes. The threshold for when to use parallel index maintenance versus dropping indexes will vary widely based on the size, number, and complexity of indexes in your database. Even if the update delta is not relatively large, both methods should be tested to determine which one is better.

EVI Maintenance
Although EVIs may drastically simplify the indexing strategy, it is important to understand how the system maintains EVIs so that maintenance costs can be minimized while maximizing the benefits.

In general, you need to be aware of the issues that affect EVI maintenance:

  1. The maximum number of distinct values
    When an EVI is defined, you can optionally include a WITH n DISTINCT VALUES clause. If this clause is not included, the database engine will determine the bytecode size (currently one byte, two bytes, four bytes) based on the actual number of distinct key values at index creation time. This may not reflect how many distinct key values will actually be represented in the data, once the table is fully populated. Since the EVI symbol table "compresses" the key values into one, two, or four bytes; the maximum number of distinct values for each bytecode size is as follows:
  2. If FOR n DISTINCT VALUES is between:Then the width of the vector table is:
    1 and 2551 byte
    256 and 65,5352 bytes
    65,536 and 4.2 billion4 bytes

    If an EVI is defined for 255 distinct values and then 256 distinct values are inserted, DB2 UDB for iSeries will automatically rebuild the index with a 2-byte vector table. However, a performance degradation will be experienced while the index is rebuilt, since the EVI will not be available to the optimizer or database engine. Therefore, it is important to have a good idea of how many distinct key values will be represented before creating the EVI.

    The consequence of defining more distinct values than are needed is simply that the index will take up additional disk space. For example, if an index with 300 distinct values is defined and only 200 distinct values are ever inserted, the vector will have one extra byte for every row in the database table. Except for organizations with tightly constrained disk availability, the extra byte should be insignificant.

  3. Insertion order
    When new values are inserted into a table with indexes present, DB2 UDB for iSeries will automatically maintain the currency of the indexes. When a new row is inserted into the table, DB2 UDB for iSeries scans the EVI symbol, finds the matching value, and updates the statistics in the symbol table. If a new distinct value is introduced to an existing EVI, one of two things happens:
    · If the new value is logically ordered after the last distinct value in the symbol table, the value is added to the end of the table. For example, if there is an index over the column MONTHNUM in the table TIME_DIM and TIME_DIM currently contains values for the first quarter only, when values are inserted such as MONTHNUM = 4, the value 4 and its associated statistics will be added to the end of the EVI symbol table.
    · If the distinct value is out of sequence from the indexes order, then the value is placed in an overflow area of the symbol table, where it remains until the index is rebuilt or refreshed. For example, if the table TIME_DIM contains values for months 1, 2, and 4 and MONTHNUM = 3 is inserted into rows, the distinct value 3 and its associated statistics are placed in an overflow area of the symbol table until the index is dropped and rebuilt, or the EVI is refreshed.

Prior to V5R2, the efficiency of the EVI can decrease as more and more distinct key values are placed out of order. Because of this, there is a limited number of distinct key values that will be placed in the overflow area of the symbol table. If the threshold is reached during insertion of a new key value, the EVI will automatically be refreshed.

For V5R1 and earlier OS/400 releases, the threshold limits are:
        100 values for 1 byte code
        1,000 values for 2 byte code
        10,000 values for 4 byte code

In V5R2, the access to the EVI's overflow area was enhanced with some additional indexing technology. This improvement significantly increases the efficiency of the maintenance process and the use of the EVI when distinct key values are present in the overflow area of the symbol table. In V5R2, the threshold limit is increased to 500,000 values, regardless of bytecode size. This should allow greater latitude in regard to where EVIs can be used and maintained.

During the refresh process, the EVI is placed in delayed maintenance mode and is not available for use by the optimizer or database engine. A significant performance degradation may be experienced during this process. For this reason, EVIs are usually not advised for OLTP environments. Also, it is recommended that whenever several new distinct values are being inserted, you should consider dropping the EVI and recreating the index(es) after the inserts. For example, when loading data that represents many new distinct keys into data warehouse tables, it is a good idea to drop the EVIs prior to the loading process, then recreate the EVIs after the loading process.

Remember, too, that the values in the overflow area can be checked by issuing a Display Field Description (DSPFD) command or using iSeries Navigator, and looking at the overflow area parameter.

There are two ways to "refresh" the EVI and incorporate the overflow values back into the symbol table:

  1. Drop all the indexes and recreate them using standard SQL statements or iSeries Navigator.
  2. Use the Change Logical File (CHGLF) command with the attribute Rebuild Access Plan set to *Yes (FRCRBDAP(*YES)). This command will accomplish the same thing as dropping and recreating the index, but it does not require any knowledge about how the index was built. This command is especially effective for applications where the original index definitions are not available.

EVI Maintenance Overview
When using EVIs, there are unique challenges to index maintenance. The following table shows a progression of how EVIs are maintained and the conditions under which EVIs are most and least effective, based on the EVI maintenance idiosyncrasies.


Most effective























Least effective
ConditionCharacteristics
When inserting an existing distinct key value- Minimum overhead
- Symbol table key value looked up and statistics updated
- Vector element added for new row, with existing bytecode
When inserting a new distinct key value — in order, within bytecode range- Minimum overhead
- Symbol table key value added, bytecode assigned, statistics assigned
- Vector element added for new row, with new bytecode
When inserting a new distinct key value — out of order, within bytecode range- Minimum overhead if contained within overflow area threshold
- Symbol table key value added to overflow area, bytecode assigned, statistics assigned
- Vector element added for new row, with new bytecode
- Considerable overhead if overflow area threshold reached
- Access path invalidated — not available
- EVI refreshed, overflow area keys incorporated, new bytecodes assigned (symbol table and vector elements updated)
When inserting a new distinct key value — out of bytecode range- Considerable overhead
- Access path invalidated — not available
- EVI refreshed, next bytecode size used, new bytecodes assigned (symbol table and vector elements updated).

General Index Maintenance Recommendations
Remember, whenever indexes are created and used, there is a potential for a decrease in I/O velocity due to maintenance. Therefore, it is essential that the maintenance cost of creating and using additional indexes is considered. For radix indexes with MAINT(*IMMED) and EVIs, maintenance occurs when inserting, updating, or deleting rows.

To reduce the maintenance of the indexes, consider:

· Minimizing the number of indexes over a given table
· Dropping indexes during batch inserts, updates, and deletes
· Creating indexes, one at a time, in parallel using SMP
· Creating multiple indexes simultaneously with multiple batch jobs using multiple CPUs
· Maintaining indexes in parallel using SMP

The goal of creating indexes for performance is to balance the maximum number of indexes for statistics and implementation while minimizing the number of indexes to maintain.

Recommendations for EVI Use
Encoded vector indexes are a powerful tool for providing fast data access in decision support and query reporting environments. However, to ensure the effective use of EVIs, they should be implemented with the following guidelines:

Create EVIs on:

· Read-only tables or tables with a minimum of INSERT, UPDATE, and DELETE activity
· Key columns that are used in the WHERE clause — local selection predicates of SQL requests, and fact table join columns when using star schema join support
· Single-key columns that have a relatively small set of distinct values
· Multiple-key columns that result in a relatively small set of distinct values
· Key columns that have a static or relatively static set of distinct values
· Non-unique key columns, with many duplicates

Create EVIs with the maximum bytecode size expected:

· Use the WITH n DISTINCT VALUES clause on the CREATE ENCODED VECTOR INDEX statement
· If unsure, consider using a number greater than 65,535 to create a 4-byte code, thus avoiding the EVI maintenance overhead of switching bytecode sizes as additional new distinct key values are inserted

When loading data:

· Drop EVIs, load data, create EVIs
· EVI bytecode size will be assigned automatically based on the number of actual distinct key values found in the table
· Symbol table will contain all key values, in order, no keys in overflow area

[Back | Next]