One aspect of the IBM i (and its’ predecessors: AS/400, iSeries, and System i) that users have come to take for granted is that it doesn't require a traditional database administrator (DBA). Most other relational database systems requires at least one dedicated person who spends most of their time performing complex, low-level system tasks (e.g., allocating disk space for database objects, partitioning and balancing disk drives for optimal performance, re-balancing system index structures). This person must also run utilities to check the integrity of the database and keep database statistics up to date for the query optimizer. The IBM i system performs most of these types of functions automatically.
In recent years, however, IBM i database features and functions have evolved at a dizzying pace. The increased functionality of DB2 for i requires a reconsideration of historical assumptions about the database. Although it's true that the IBM i doesn't require a traditional DBA, its new database capabilities present IBM i shops with a need for some type of dedicated database position Maybe this person is called a Database Analyst or Database Engineer to differentiate from a traditional DBA, but regardless of the title there is a need for this position in IBM i shops that are heavily utilizing SQL.
The Good Ol' Days
Traditionally, an application programmer played the DBA's role in many IBM i shops. When simple data models were built into an application's design, programmers could easily create database objects as they implemented their applications (instead of passing off requests to a DBA). Due to the IBM i database's ease of use, scalability, and basic set of features, one person could easily keep track of all the issues and new enhancements associated with the AS/400 database and application development.
All this changed way back in V3R1, when IBM re-branded the database as DB2 for AS/400 and introduced database constraints (e.g., referential integrity), triggers, and stored procedures. Once implemented, some of these new features — especially triggers and constraints — could have a system-wide impact. Database triggers, for example, give you flexibility for solving business problems by letting you automatically have a program called every time a program or user inserts, deletes, or updates a row in one of your database files. But who determines whether you can afford the overhead of an external call to that trigger program every time the database file is touched? If some overnight database batch jobs can't afford this extra overhead, who will remove the triggers before the batch runs and reestablish them after the batch runs finish?
Many IBM i programmers can understand and keep track of new database features, but can they do so and still keep up with IBM i application development tools and technologies? Since V3R1, when IBM introduced ILE to help programmers better modularize their applications, IBM i application developers have had more choices than simply RPG, COBOL, and DDS. Today, a typical developer must Web-enable existing solutions, investigate newer IBM i technologies such as PHP, Java, and XML, and keep up with the latest tools and methodologies. With so much to do, few IBM i programmers have time to master DB2 for i. As the database becomes more independent of applications, IT shops must concentrate database knowledge in one or two people instead of scattering it across a host of application developers and system administrators.
The DB2 for i DBA skill set
As a DBA, your required database skills vary depending on the types of solutions you use. But, in general, an IBM i DBA needs some mix of the following abilities:
I explain these skills from an SQL perspective because SQL is the industry standard for database access and the IBM i’s strategic interface for DB2. However, most of my points also apply to the IBM i's native database interface.
- Logical and physical data modeling and database design
Logical and physical data modeling and database design means creating an efficient physical database design from a logical database model and application specifications. Before you code the database definitions, you must design the database properly so applications and solutions can operate on it. Usually, you start by forming a logical data model that covers the entire enterprise. Then, you use this model to determine where new business data fits and to help modelers design a database where different applications can work together. A fundamental part of the data model is the identification of domains (data types); but in the logical model, domains shouldn't be concerned with physical storage properties such as length. Instead, range, precision, and units are more appropriate.
After you normalize the logical database design to tighten it and eliminate redundant data, you must transform it into the physical database design. Here, a DBA must be familiar with how data will be accessed and understand the architecture of the database on which the data will reside. Without getting down to bits and bytes, it's important to understand (at least conceptually) how different data types are implemented on the DB2 or i. For example, the implementation of variable-length character columns (e.g., VARCHAR) varies widely across databases, and it has a direct impact on storage and performance. Many new IBM i application providers use VARCHAR columns widely on other databases, but to achieve real space savings with VARCHAR columns in DB2 for i, you should use them only for long descriptions and memo fields. Using VARCHAR for every character-based column adds unnecessary overhead to your database.
A DBA with data-modeling expertise can improve the basis (i.e., the schema) upon which all applications ultimately rest. Application developers can then simply be "consumers" of subschemas for the data they need to store and access.
Graphical data modeling tools can make the modeling process more efficient. The IBM Infosphere Data Architect product supports DB2 for i. In addition, DB2 for i is also supported by several third-party modeling tools such as XCase for i.
- Database creation, maintenance, and management
Database creation, maintenance, and management means transforming the physical database design into the physical database objects and then managing those objects. As I mentioned earlier, IBM i DBAs don't need to perform many mundane, low-level tasks because DB2 and the operating systems takes care of them automatically. However, someone must manage the database definition source and verify that all the database objects are created successfully with the proper attributes.
Database objects are no longer limited to tables and indexes; many applications now rely on other database objects such as triggers, constraints, stored procedures, sequences, and user-defined functions. All these objects are related, so there are dependencies on the order in which you create them (e.g., you need to create a table before creating a constraint or trigger for it) and the library (or collection) where you create them. You must keep these dependencies and relationships intact as you move applications from development into production. For example, does your production-level stored procedure access the production-level database tables, or is it still working (erroneously) on the development version? Someone must control and verify this process.
Because application designs and requirements constantly change, you should dedicate someone to managing changes to database objects after they're created. This maintenance might involve adding, removing, or changing a column for a table and then determining which application programs (if any) must be changed to accommodate the database change. You also need someone to verify that related database objects such as triggers, constraints, and indexes are still intact after database modifications.
- Metadata management
Metadata management means collecting, storing, managing, and querying data about data in the database. The core metadata for DB2 is stored in the system-maintained database catalogs, which keep information about all database objects on the IBM i. This metadata is essential for helping someone manage dependencies between database objects by running reports to determine what constraints or indexes are defined over a table. You can also reference the catalogs to determine the data type or length for a particular column. A data warehouse environment requires even more metadata management because information about business data is key to making data easier for end users to find and understand.
- Database performance monitoring, tuning, and management
Database performance monitoring, tuning, and management means monitoring database performance and improving it by making changes to the database, application logic, or system configuration. To manage performance, a DB2 for i DBA must understand the database engine and query optimizer and be a skilled user of the available tools.
The database engine's internal workings and nuances are critical to performance. For example, creating open data paths (ODPs) with DB2 for i is very expensive, so an IBM i DBA must constantly look out for non-reusable ODPs that result in the unnecessary and repetitive creation of an ODP. There are also a wide variety of application and systems settings that influence the behavior of DB2 for i query optimizer. DB2 for i does a pretty good job of optimization, but the more you learn about the query optimizer, the better you can tune performance. A DBA must understand all the access methods (e.g., full table scan) the query optimizer can employ and the factors (e.g., memory size, data skew, parallel processing degree) that cause the optimizer to choose one method over another.
IBM i Navigator (formerly known as System i Navigator) provides several useful graphical performance tools. The SQL Performance Monitor and SQL Plan Cache tools are two tools that make it easy to manage and analyze DB2 for i performance. More advanced database performance tools are avaiable from third parties such as the Centerfield Technology Homerun suite. All these tools provide information about how a database request was implemented and why, including information about temporary indexes and join order.
The DB2 for i SQL Performance Workshop is the best resource available for deepening your knowledge of performance tuning and helping you become proficient with the DB2 for i performance tools.
- Indexing strategy and management
Indexing strategy and management means creating and managing indexes for high performance. If your database doesn't have the proper indexes created, query performance can suffer because the DB2 for i query optimizer is dependent on indexes providing access methods and statistics. Indexes can, for example, tell the optimizer how many distinct values a column contains (i.e., data skew or cardinality) and approximately how many rows a certain search value will return. You want the query optimizer to have enough information to make intelligent decisions about query implementations on your system.
Due to the resources needed to maintain indexes, creating too many of them can cause performance problems. Index maintenance slows down inserts and updates for an index's underlying table, so you must consider maintenance each time you consider creating an index (rather than using an existing one) for query tuning. A DBA must control and track index creation and know when and how to use parallel processing when creating indexes over large tables or maintaining indexes during bulk data loads.
Furthermore, a DBA may now choose between multiple types of database indexing technology. DB2 for i provides encoded vector indexes (EVIs) to complement the traditional DB2 for i indexing technology, which is based on a binary radix tree. EVIs can improve query and report performance, so you should create EVIs over columns with a relatively low number of distinct values. However, EVIs are meant to be complementary; it's not a good idea to simply replace all your existing indexes (and keyed logical files) with EVIs. Instead, a DBA who understands how the query optimizer can use EVIs should determine when to use them rather than traditional IBM i indexes.
- SQL code reviews
SQL code reviews means reviewing SQL access and host-language programs and recommending changes for optimization before those programs go live. By tuning poorly written SQL immediately, a DBA can, for example, identify application code that will cause ODPs to be non-reusable or queries that may require new indexes to be created. SQL code reviews are proactive performance tuning; they let you transfer a DBA's knowledge of the database engine and optimizer into the development process to nip performance problems in the bud.
- Database security and integrity
Database security and integrity means designing and implementing a database so that only accurate data is added and only authorized users have access to the data. The DBA should work with the security officer or administrator to develop a security architecture. Then, the security administrator grants and revokes authorities and performs other tasks to control access to database objects.
The increasing popularity of interfaces such as JDBC, ADO.NET and ODBC, which make it easy for PC clients to access your AS/400 databases, underscores the need for object-level database security. Someone in your shop should be dedicated to managing the security requirements of new database objects such as triggers, stored procedures, and user-defined functions. Although many legacy IBM i applications use menu-based controls (instead of object security) to control database access, the DBA must specify stored procedures (and, where necessary, appropriate adopted authority) to implement secure access not definable by views or column-level security.
Once data is secure, someone must keep it accurate and ensure that new data being added is valid. With commitment control, you can determine whether users can see business transactions in process or completed transactions only. For example, if you transfer $100 from savings to checking, your bank's update program may use commitment control levels to lock the $100 until the transaction is completed. If the bank loses power before your checking account is credited, commitment control could automatically "roll back" your $100 savings withdrawal. This is a simple example, but be aware that a DBA must often determine a level of commitment control that satisfies both the concurrent data-access and performance requirements of an application.
Another database feature you can use to improve your data's integrity is database constraints, which let you integrate some of your business rules (e.g., each order must have a unique order number) and enforce them on every system interface that can change your data. Application-enforced integrity, by contrast, is bypassed any time that the application isn't used to change the data (e.g., STRDFU, ODBC). DB2 for i offers four types of constraints: primary key, unique, referential integrity, and check constraints. The DBA must implement the constraints, clean up any data-integrity issues they detect, and ensure that they're always in a valid state.
Many IBM i databases were designed to be updated by a single set of application programs. For such databases, the application performed all integrity checks (e.g., Is a state abbreviation valid?). As systems evolve, however, databases are being accessed through many new mechanisms: ODBC, Java programs, Web pages, and more. Thus, it's important to migrate as many RPG integrity checks as possible into the database itself via constraints and referential integrity. Note that in cases where multiple applications can update the same data, all the applications must explicitly enforce integrity checks - a high-maintenance, error-prone approach that still may not cover all interfaces.
- Database backup and recovery
Database backup and recovery means implementing a database backup and recovery strategy based on data volatility and application availability requirements to protect your business from unexpected hardware and software failures. The AS/400 database has extensive backup and recovery facilities, including object-level backups, online backups, journaling (logging), and point-in-time recovery. The DBA must design a process that uses the right combination of tools to back up and recover your database.
Businesses that can live with a couple of hours of downtime can simply use the normal save/restore process. But companies with mission-critical applications require a clustered, high-availability solution that relies on journaling of database objects. Once the design is in place, the DBA helps implement and maintain the backup process, ensuring that all database objects are backed up. This verification process used to be straightforward because you only had to be concerned about the tables (physical and logical files), indexes (access paths), journals, and journal receivers. But now, a DBA must also make sure database objects such as triggers and stored procedures are backed up.
Newer DB2 for i technologies such as the integrated XML support and the IBM OmniFind Text Search Server make use of files in the IBM i integrated file system (IFS) which complicates backup and recovery. When journaling is needed for a database object, the DBA must verify that the object is being journaled and manage the associated journal receivers. The DBA also must periodically test the database recovery process to verify that the proper controls and backups are in place.
- General systems management and networking skills
General systems management and networking skills means controlling and configuring system resources that affect database access and database performance. Usually, the system administrator is responsible for systems and network management configurations. The DBA, however, must understand how these configurations can affect the database engine and be able to discuss these issues with the system administrator.
Memory pools, for example, affect database performance directly, so a DBA must be knowledgeable about IBM i work management to set up these pools properly. One IBM i client tried to solve its ODBC performance problems by adding 4 GB of memory. This additional memory should have paid performance dividends immediately, but because the system manager didn't understand IBM i work management, he allocated the new memory to the wrong pool.
Database interfaces such as Distributed Relational Database Architecture (DRDA) and ADO.NET require a DBA who can perform basic networking tasks. What if a user's PC database access stops functioning? The DBA must at least be able to check the state of the database-related host communication servers, and they might be responsible for starting those servers. In addition, these new interfaces require configuration of data sources and relational database directory entries to function. Some configuration options tie directly to database performance, so a DBA is the right person to handle these tasks.
Finding the right mix
It's easy to see that a wide variety of DB2 for i interfaces and functions require attention. Yes, companies that have an IBM i running their business from a backroom can still say, "No DBA required." Such companies don't need to hire someone to make sure their database objects have enough storage or run tools to check the integrity of database structures.
However, IBM i shops that are developing their own SQL-based applications, deploying a data warehouse or business intelligence solution, or closely supporting a data-centric packaged solution (e.g., an ERP solution) must address the DBA role in some way. Each company's mix of applications and data is unique, so each company will require a different assortment of the skills that are outlined here. IBM offers training and services from DB2 for i Center of Excellence to help your team grow its’ DBA skillset. Having a DBA with the right mix of these skills for your organization is the key to getting a maximum return from your DB2 for i databases.
This information first appeared in the May 2000 issue of NEWS/400.
Kent Milligan works in the DB2 for i Center of Excellence as a senior consultant focusing on ISV enablement for the IBM i platform. After graduating from the University of Iowa in 1989, Kent spent the first eight years of his IBM career as a member of the DB2 development team in Rochester. He speaks and writes regularly on various DB2 for i relational database topics.