Got those Query Blues?
Separating FACT from FICTION: Hints and Tips for Improving QUERY Performance
Have you ever experienced the "run-away" query? Do you try to run queries against production database files on your IBM i or AS/400, only to have all CPU resources eaten up by your job? Are you frustrated with building and maintaining extracts or expensive replication processes to download to PC databases for reporting purposes? Does use of ODBC imply poor performance?
If any of those are symptoms of your world, then the following may be of interest to you! This document is intended to provide an educational overview and set of recommendations to address the above issues - and separate FACT from FICTION. One of the first facts is that DB2 for i contains a sophisticated, high performance query engine. This query engine, including the "optimizer," handles all query-type requests. Query type requests come from programs such as Query, Query Manager, PC based tools using an ODBC or JDBC driver, programs issuing OPNQRYF (Open Query File) commands, or any other application using SQL as its mechanism to access data stored in your IBM i or AS/400 database.
Several factors impact query performance, but with a little education and pointers to additional resources, we're hopeful you can be on your way to SCREAMING query performance!
Fact #1: ODBC is not slow.
If you are having query performance issues from PC-based programs using ODBC, the bottlenecks may actually have little to do with ODBC. Many factors affect the performance of query/SQL operations. The association of ODBC drivers with poor performance comes from the implementation of ODBC on other database platforms. The IBM Client Access Express ODBC driver, however, is a native interface to DB2 for i and does not suffer from the "layered" approach found in other ODBC implementations.
Recommendation: Ensure you are using the IBM Client Access Express ODBC driver and begin assessing other elements of query performance documented in the following pages!
Fact #2: Squeezing blood out of a turnip is a tough way to go - know where your query jobs are running!
Many query jobs are suffocating from a lack of resources allocated to them. If a query job, or any job for that matter, is not given enough resources to run effectively, it can become a "thrasher." A thrashing job not only performs poorly itself, but by overstaying its welcome it can consume CPU and I/O resources and degrade other jobs on the system.
A fundamental knowledge of OS/400 Work Management concepts is an advantage in addressing this issue. ODBC jobs run by default in a subsystem called QSERVER at V4R5, and QUSRWRK at V5R1. Be wary of two key Work Management conditions that determine how resources are allocated to these jobs: first, QSERVER or QUSRWRK may be running with a limited amount of memory available to it; second, the "activity level" setting on that memory pool may be too high. Those conditions, when combined, are a red flag for poor query performance!
To use an example, suppose ACTIVITY LEVEL is set to 10. This setting normally allows a maximum of 10 jobs to run in this memory pool at any particular time. But the optimizer uses a "fair share" algorithm and will try to use only 1/10th, or 10%, of the memory in that pool for that particular query task. The result is that either the system pages excessively and the query job thrashes, or the optimizer builds a less than optimal access plan.
Recommendation: Ensure your query jobs are running in a subsystem with sufficient memory allocation and low ACTIVITY LEVEL settings. Performance experts recommend a minimum of 2GB of memory for every CPU as a general rule of thumb for systems running heavy query/SQL workloads.
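As a sketch of how to check and adjust these settings, the memory pools and their activity levels can be displayed with WRKSYSSTS and changed with CHGSHRPOOL. The pool identifier and sizes below are illustrative assumptions, not values from this document; verify which shared pool your server subsystem actually uses before changing it:

```
/* Display memory pools, their sizes (KB), and activity levels */
WRKSYSSTS

/* Illustrative: give shared pool 2 (assumed here to serve the
   query subsystem) 4 GB of memory and a modest activity level,
   so each query's "fair share" (SIZE / ACTLVL) stays large    */
CHGSHRPOOL POOL(*SHRPOOL2) SIZE(4000000) ACTLVL(10)
```

With these assumed values, each query's fair share would be roughly 4 GB / 10 = 400 MB, rather than the small slice that results from a high activity level on a small pool.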
Fact #3: Don't underestimate the importance of INDEXING
Indexing is critical to good performance in a query/SQL environment. Indexing is used by the optimizer to build the most cost-effective method of accessing the data in order to return the result set in the least amount of time. Even if the optimizer doesn't use a specific index to execute a query, vital statistics are stored in all indexes that can assist the optimizer in understanding elements about the database, such as cardinality (number of unique values) or distribution of values within the database, that can significantly help the optimizer make better decisions.
Without indexes, not only are you potentially omitting key access path information that the optimizer could leverage to minimize unnecessary I/O, but vital statistics will also be unavailable, leaving the optimizer with much less information to work with.
Recommendation: Learn more about indexing and how it affects query performance. For detailed information on indexing strategies, refer to the white paper developed by the .
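For illustration, a traditional binary radix index over the columns a query selects and orders on might be created like this (the library, table, and column names are hypothetical):

```sql
-- Hypothetical: orders frequently queried by customer number
-- and order date
CREATE INDEX MYLIB.ORDERS_IX1
    ON MYLIB.ORDERS (CUSTNO, ORDDATE)
```

Even when the optimizer does not choose this index for a particular access plan, it can still draw on the cardinality and distribution statistics the index maintains.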
Fact #4: Parallelism and Encoded Vector Indexing can make a BIG difference!
If you have a multiple-processor IBM i or AS/400, you may not be leveraging all of those available CPU cycles unless you have the DB2 Symmetric Multiprocessing (SMP) feature of OS/400. This feature (recently re-priced at V5R1) enables query/SQL jobs to be broken up into multiple tasks, which can then run in parallel across multiple CPUs. Using more of the available resources to process your query request can significantly improve performance.
Recommendation: Install the SMP feature. You can use GO LICPGM to check whether this feature is installed. To set usage parameters, refer to the QQRYDEGREE system value, the CHGQRYA CL command, or Operations Navigator functions.
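As a sketch, once the SMP feature is installed, parallel processing can be enabled for the current job or system-wide (the values shown are common choices, not recommendations specific to any workload):

```
/* Let the optimizer choose a parallel degree for queries run
   in the current job (requires the DB2 SMP feature)          */
CHGQRYA DEGREE(*OPTIMIZE)

/* Or set the system-wide default via the system value        */
CHGSYSVAL SYSVAL(QQRYDEGREE) VALUE('*OPTIMIZE')
```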
In addition, a new indexing technology was introduced to DB2 for i at V4R3. Encoded Vector Indexing (EVI) adds another access path option for the optimizer to use to achieve better performance. In real-world query environments EVIs have shown impressive results while utilizing much less space than traditional binary radix tree indexes. For more information about creating and using EVIs, refer to the .
Recommendation: Develop an indexing strategy that includes usage of Encoded Vector Indexes.
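For illustration, an EVI is created with SQL DDL much like a traditional index; low-cardinality columns are classic candidates. The names and the distinct-value estimate below are hypothetical:

```sql
-- Hypothetical: REGION has few distinct values, making it a
-- good Encoded Vector Index candidate; the WITH clause gives
-- the system an estimate of the expected cardinality
CREATE ENCODED VECTOR INDEX MYLIB.ORDERS_EVI1
    ON MYLIB.ORDERS (REGION)
    WITH 50 DISTINCT VALUES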
Fact #5: Replicating data across disparate databases/architectures is HARD
Many customers choose to pull data out of their production databases and create reporting repositories. Terms like Operational Data Store (ODS), Data Warehousing and Data Marts relate to different strategies for these reporting repositories.
But in order to create these repositories, extraction and data replication processes are required. Moving data between different system architectures is not easy. Different operating systems mean multiple skill sets to support varying utilities and operating environments (security, communications, data loads). Different database management systems mean duplication of SQL optimization knowledge, duplication of database administration functions, and requirements to transform data due to differing data type and SQL implementations. Extraction and replication tools that support heterogeneous environments can be complex and expensive.
The bottom line is that if you are creating reporting repositories because of environmental factors (e.g., maintaining security of production data, optimizing data for querying, minimizing the impact of query workloads on production systems), it is much easier and less costly to leverage common architectures for both the production environment and the reporting environment.
With DB2 for i there are some fundamental approaches that can make your life easier. With V4R2, IBM introduced Remote Journaling. With Remote Journaling, you can capture changed production data and have the system route the changed data logs (called journal receivers) to another IBM i partition or system. By using Remote Journaling, you can virtually eliminate any concerns about impact of queries against the production system.
To read the database changes from the journal receivers, you could use a low cost utility called Data Propagator to add the changed records to the reporting repository (on this 2nd partition/system). This gives you the ability to use all the techniques noted above to optimize your query environment without impacting the production system.
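As a rough sketch, setting up Remote Journaling involves journaling the production file locally, then registering and activating a remote journal that targets the reporting partition/system. All object names below are hypothetical, and the relational database entry (REPORTRDB) is an assumption; verify the exact parameters with the command prompts (F4) on your release:

```
/* On the production system: journal the production file       */
CRTJRNRCV JRNRCV(PRODLIB/RCV0001)
CRTJRN    JRN(PRODLIB/PRODJRN) JRNRCV(PRODLIB/RCV0001)
STRJRNPF  FILE(PRODLIB/ORDERS) JRN(PRODLIB/PRODJRN)

/* Register the remote journal on the reporting system and
   activate replication of the journal receivers              */
ADDRMTJRN RDB(REPORTRDB) SRCJRN(PRODLIB/PRODJRN)
CHGRMTJRN RDB(REPORTRDB) RMTJRN(PRODLIB/PRODJRN) JRNSTATE(*ACTIVE)
```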
A side benefit of this is the ability to use the 2nd partition/system for other purposes as well, such as backing up the databases regularly without impacting production, or creation of a 24x7 High Availability scenario.
Recommendation: Assess the effort required to manage movement of data between different database systems for reporting purposes. Consider the impact of supporting multiple operating system and database systems, cost of data movement tools and/or cost of maintenance of custom programs. Consider leveraging Remote Journaling and Data Propagator to simplify the environment while minimizing impact to the production system(s).
Fact #6: DB2 OLAP will WOW your query users
When it comes to creation of a reporting repository, using a utility to help you design data models, build extraction rules/programs, create the necessary load programs, and schedule the whole process for you can eliminate much of the custom work and on-going maintenance required.
DB2 OLAP is such a utility. The database that it creates for you (based on modeling your business dimensions through a graphical interface) is a high performance reporting repository stored either in DB2 relational format or in a patented structure that resides in the IFS (Integrated File System) on the IBM i server. Once built, the database can be accessed through over thirty different end-user interfaces (including spreadsheets and highly intuitive graphical tools) with sub-second response times and minimal CPU impact on the server.
Fact #7: IBM i systems are NOT too expensive for query reporting
A common perception is that an additional partition or IBM i system for reporting purposes is too expensive, compared with other platforms, such as WinTel. However, several factors need to be articulated to tell the real story.
DB2 for i contains a very robust query/SQL engine. But as with any technology, a little "know-how" can go a long way to ensuring you are letting that engine run at its fullest potential. Don't let query blues get you down. If you need assistance, many options are available to assess your environment and provide recommendations.
The IBM i SQL & Query Performance Workshop (IBM Course #S6140) provides detailed design techniques and tips for optimal IBM i data access, as well as in-depth looks at the tools available for identifying and tuning performance problems.
Performance experts that specialize in analyzing query/SQL performance provide services to assist customers in gathering performance data and analyzing the results. One customer who utilized these experts said, "this is the best money I have ever spent with IBM!"
Contact your local IBM Marketing Representative or Business Partner or visit the DB2 for i home page.
Please send any feedback regarding this document to: firstname.lastname@example.org.