RMF PM has two formats for presenting performance data:
- Single-Value Metrics, for example:
  - % utilization (of a processor, of a channel, ...)
  - i/o activity rate (of a logical control unit, ...)
- Value-List Metrics, for example:
  - % utilization by job
  - # delayed jobs for i/o by mvs image
The distinguishing feature in the name of a Value-List Metric is the keyword "by".
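As a minimal sketch of this naming rule, the following Python function tests whether a metric name contains "by" as a separate word. The function name `is_value_list_metric` is hypothetical and is not part of RMF PM; it only illustrates the convention described above.

```python
def is_value_list_metric(name: str) -> bool:
    """Return True if the metric name contains the standalone keyword 'by'."""
    # Split on whitespace so that words such as "busy" do not match.
    return "by" in name.lower().split()


print(is_value_list_metric("% utilization by job"))  # True
print(is_value_list_metric("% control unit busy"))   # False
```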
% channel path partition utilization
The channel path utilization percentage for an individual logical partition. RMF uses the values provided by CPMF (Channel Path Measurement Facility).
In LPAR mode, the calculation is:
% partition utilization = (CBT / CET) * 100
- CBT = Cumulative channel path busy time
- CET = Cumulative channel path elapsed time
In BASIC mode, blanks are shown.
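The LPAR-mode calculation can be sketched in Python as follows. The function name and the sample values are illustrative assumptions; the sketch also assumes that CBT and CET are expressed in the same time unit.

```python
def partition_utilization(cbt: float, cet: float) -> float:
    """Return (CBT / CET) * 100 for one logical partition."""
    if cet <= 0:
        raise ValueError("cumulative elapsed time (CET) must be positive")
    return cbt / cet * 100.0


# Hypothetical interval: channel path busy for 225 s out of 900 s elapsed.
print(partition_utilization(225.0, 900.0))  # 25.0
```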
% channel path total utilization
The channel path utilization percentage for the entire system during an interval.
For shared channels in LPAR mode, or for all channels in BASIC mode with CPMF not available, the calculation is:
% total utilization = (SCB / N) * 100
- SCB = Number of SRM observations of channel path busy
- N = Number of SRM samples
For unshared channels in LPAR mode, the value for total utilization is the same as partition utilization.
For all channels in BASIC mode with CPMF available, the calculation is:
% total utilization = (CBT / CET) * 100
- CBT = Cumulative channel path busy time
- CET = Cumulative channel path elapsed time
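The choice between the two formulas for total utilization can be sketched as a single function. This is an illustration only: the function and parameter names are hypothetical, and combinations of mode and CPMF availability that the text does not describe simply fall through to the CBT/CET formula.

```python
def total_utilization(lpar_mode: bool, shared: bool, cpmf_available: bool,
                      *, cbt: float = 0.0, cet: float = 0.0,
                      scb: int = 0, n: int = 0) -> float:
    """Return % total utilization for one channel path."""
    # Sampled formula: shared channels in LPAR mode, or any channel in
    # BASIC mode when CPMF is not available.
    if (lpar_mode and shared) or (not lpar_mode and not cpmf_available):
        if n <= 0:
            raise ValueError("number of SRM samples (N) must be positive")
        return scb / n * 100.0
    # Measured formula: unshared channels in LPAR mode (same value as
    # partition utilization) or any channel in BASIC mode with CPMF.
    if cet <= 0:
        raise ValueError("cumulative elapsed time (CET) must be positive")
    return cbt / cet * 100.0


# Shared channel in LPAR mode: 75 busy observations in 300 SRM samples.
print(total_utilization(True, True, True, scb=75, n=300))  # 25.0
```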
% enqueue delay
The percentage of time during the report interval that the system or job was waiting to use a serially reusable resource that another system or job was using.
% HSM delay
The percentage of time during the report interval that the system or job was waiting for services from the Hierarchical Storage Manager (HSM).
A high HSM delay value might be caused by one or more of the following:
- HSM address spaces delayed (Check HSM address spaces on the Job report)
- Delay on HSM volumes (Check HSM device volumes on the DEVR report)
- HSM doing its housekeeping during prime time
- Not enough primary or level one space
- HSM dispatching priority too low.
% JES delay
The percentage of time during the report interval that the system or job was waiting for services from the Job Entry Subsystem (JES).
A high JES delay value might be caused by one or more of the following:
- JES address spaces delayed (Check JES address spaces on the Job report)
- Delay on JES volumes (Check JES device volumes on the DEVR report)
- JES dispatching priority too low.
% operator delay
The percentage of time during the report interval that the system or job was waiting for the operator to reply to a message or mount a tape, or that the address space was quiesced by the operator.
% processor delay
The percentage of time during the report interval that the system or job or enclave was waiting for a processor.
A high processor using value might be caused by one or more of the following:
- looping user
- high dispatching priority for a processor-bound job (in compatibility mode) or high importance for the service class of a processor-bound job (in goal mode)
- small block size I/O
- excessive use of expensive supervisor service
A high processor delay value might be caused by one or more of the following:
- ineffective choice of dispatching priorities in the SRM IPS (compatibility mode) or of importances in the active service policy (goal mode)
- high priority work using an excessive amount of CPU
- ineffective mean-time-to-wait usage
% storage delay
The percentage of time during the report interval that the system or job was waiting for a COMM, LOCL (both include shared pages), SWAP, or VIO page, was on the out/ready queue, or was delayed by paging for a cross-memory address space or standard hiperspace.
For enclaves, only COMM, cross-memory, and shared page delays apply.
A high storage delay value can be associated with common storage paging (COMM), local storage paging (LOCL), swap-in delay (SWAP), swapped out and ready delay (OUTR), and other delays (OTHR), which include virtual I/O paging and paging delays from cross-memory address spaces and standard hiperspaces.
A high storage delay associated with common storage paging might be caused by one or more of the following:
- insufficient page data sets
- not enough central storage
- poorly tuned paging configuration
- too many address spaces in storage
- too many "logical swap" address spaces in storage
- excessive storage isolation of address spaces
- too many extremely large address spaces resident
- paging data set on shared device
- high use of user I/O on paging volume
- "common I/O" contends with "swap I/O"
- common data set on wrong device
A high storage delay associated with local storage paging might be caused by one or more of the following:
- insufficient page data sets
- not enough central storage
- address space is under isolated (causing trim) or over isolated (causing others to page/swap)
- poorly tuned paging configuration
- too many address spaces in storage
- too few (artificially low) address spaces in storage
- too many "logical swap" address spaces in storage
- paging data set on shared device
- high use of user I/O on paging volume
- too much swapping
- page-ins are from trimming at swap-out
- "local I/O" contends with "swap I/O"
- program pages in each address space rather than in PLPA
- too many extremely large address spaces resident
A high storage delay associated with virtual I/O might be caused by one or more of the following:
- insufficient page data sets
- poorly tuned paging configuration
- paging data set on shared device
- high use of user I/O on paging volume
- virtual I/O contending with swap I/O
A high storage delay associated with swap-in activity might be caused by one or more of the following:
- too much swapping
- workload too heavy
- insufficient page/swap data sets
- misplaced page/swap data sets
- swap data sets on slow devices
- too few (artificially low) address spaces in storage
- paging data set on shared device
- high use of user I/O on paging volume
- swapped pages moved to backing store on cached device
- not enough central storage
A high delay value for address spaces that are swapped out and ready might be caused by one or more of the following:
- too few (artificially low) address spaces in storage
- workload too heavy
- unbalanced workload
- not enough central storage
- poorly tuned paging configuration
- insufficient page/swap data sets
- too many address spaces in storage
- too many or too few logical swap address spaces
- paging/swapping too slow
- exchange swap rate too high
- too many detected wait swaps
- improper use of storage isolation
Other storage delays might be caused by one or more of the following:
- paging delays from cross-memory address spaces
- paging delays from standard hiperspaces (but not ESO hiperspaces)
% subsystem delay
The percentage of time during the report interval that the system or job was waiting for services from a subsystem (JES, HSM, or XCF).
% XCF delay
The percentage of time during the report interval that the system or job was waiting for services from the Cross-System Coupling Facility (XCF).
A high XCF delay value might be caused by one or more of the following:
- Path capacity exceeded.
- Other applications are tying up the path.
- XCF delays on the receiving system.
- Some data paths are unavailable or offline.
% total delay
The percentage of time during the report interval that the job was not using any resources and was delayed for at least one of the following resources:
Note: If a job with several tasks is simultaneously delayed for more than one resource, RMF counts this job only once as delayed when it calculates delay percentage.
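The counting rule in the note can be illustrated with a small sample-based sketch. The `Sample` type and the idea of one record per sample are assumptions made for illustration; they are not an RMF data structure. A sample counts as delayed at most once, however many resources the job is waiting for.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """Hypothetical state of one job at one sampling point."""
    using: bool                           # job was using at least one resource
    delayed_for: frozenset = frozenset()  # resources the job was waiting for


def total_delay_percent(samples) -> float:
    """Percentage of samples in which the job was delayed and not using."""
    if not samples:
        raise ValueError("at least one sample is required")
    # Each sample contributes at most 1, even with several delay reasons.
    delayed = sum(1 for s in samples if not s.using and s.delayed_for)
    return delayed / len(samples) * 100.0


samples = [
    Sample(False, frozenset({"processor", "device"})),  # counted once
    Sample(True),                                       # using, not delayed
    Sample(False, frozenset({"storage"})),              # counted once
    Sample(False),                                      # neither
]
print(total_delay_percent(samples))  # 50.0
```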
% idle
The percentage of time during the report interval that the system or job was idle.
RMF considers a job idle if it is in terminal wait, timer wait, or is waiting to be selected by JES, and it is not using or waiting for any resource that RMF monitors.
% using
The percentage of time during the report interval that the system or job was using one or more processors or devices.
Note: If a job with more than one task is simultaneously using and delayed for the same resource, RMF counts the job once as using and once as delayed (regardless of how many times it is found using and delayed). If a job is delayed for more than one resource, it is counted once for the overall delay and once for each resource causing a delay.
% workflow
Workflow percentage is the speed at which a job is moving through the system in relation to the maximum speed at which it could move through the system.
A low workflow percentage indicates that the job has few of the resources it needs and is contending with other jobs for system resources. A high workflow percentage indicates that the job has the resources it needs to execute and is moving through the system at a relatively high speed.
For example, a job that could execute in one minute if all the resources it needed were available would have a workflow of 25 percent if it took four minutes to execute.
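The example can be reproduced with a short calculation. The function below is a sketch that expresses workflow as the ratio of the ideal execution time to the actual execution time; the function name is hypothetical, and RMF itself derives workflow from using and delay measurements rather than from execution times.

```python
def workflow_from_times(ideal_seconds: float, actual_seconds: float) -> float:
    """Return workflow in percent as ideal time relative to actual time."""
    if ideal_seconds <= 0 or actual_seconds <= 0:
        raise ValueError("times must be positive")
    return ideal_seconds / actual_seconds * 100.0


# One minute of ideal execution time that actually took four minutes.
print(workflow_from_times(60, 240))  # 25.0
```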
% unknown
RMF considers the system or jobs that are not delayed for a monitored resource, not using a monitored resource, or not in a monitored idle state to be in an unknown state.
The value represents the percentage of time during the report interval that the job was in the system, but not in any monitored state.
Examples of address spaces in an unknown state include those waiting for devices other than DASD or tape and those that are waiting for work (idle) using a method that RMF does not recognize. Started tasks (STCs) are usually found in this category.
% connect time
The sum of the percentages of time during the report interval that devices used by the job were connected to channel path(s) to transfer data between the devices and central storage.
Because a job can be connected to more than one device at a time, the value in connect time percentage can be greater than 100 %.
Note: This can include devices other than DASD and tape; for example, graphic displays.
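A brief sketch shows why the value can exceed 100 %. It assumes, for illustration, that the connect time of each device used by the job is known in seconds; the function name and the figures are hypothetical.

```python
def job_connect_percent(device_connect_seconds, interval_seconds: float) -> float:
    """Sum the connect time of all devices used by a job, as % of the interval."""
    if interval_seconds <= 0:
        raise ValueError("interval length must be positive")
    return sum(device_connect_seconds) / interval_seconds * 100.0


# Two devices connected for 450 s and 675 s within a 900 s interval
# (50 % and 75 %), giving more than 100 % in total.
print(job_connect_percent([450.0, 675.0], 900.0))  # 125.0
```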
% using
The percentage of time during the report interval that one job or all jobs in a group or in the system were using one or more devices.
RMF considers a job to be using a device as soon as the job's I/O request is queued in the channel for the device. Therefore, the using percentage for a device includes both active time on the device and queuing delay in the channel.
i/o activity rate
The rate per second that I/O instructions (SSCH, RSCH, and HSCH) to a device completed successfully.
IOS queue time
The average number of milliseconds an I/O request must wait on an IOS queue before an SSCH instruction can be issued. A delay occurs when a previous request to the same subchannel is in progress.
response time
The average response time (in milliseconds) that the device required to complete an I/O request.
i/o intensity
The product of the number of users and the average time spent waiting for a DASD device for one of the following reasons:
- The path and device are busy
- The SIO is pending
- The device is busy
- The SIO is queued
There is no common name for I/O intensity in the literature. Other programs might use different names. The following terms are equivalent to I/O Intensity: DASD MPL, Response Time Volume.
% active time
The percentage of time during the report interval that the device was active.
active time = connect time + disconnect time + pending time
% connect time
The percentage of time during the report interval that the device was connected to a channel path.
% disconnect time
The percentage of time during the report interval that the device had an active channel program, but was not connected to the channel.
Disconnect time includes seek time, normal rotational delay time, and extra rotational delay time because the channel was busy.
% pending time
The percentage of time during the range period that I/O requests were waiting in a channel queue before a path was available.
Pending time includes the time spent waiting for a device, a control unit, a head of string, or a channel.
% I/O delay
The percentage of time during the report interval that the job was waiting for any DASD or tape, or had an I/O request queued in the channel for a device but was not transmitting data (for example, was disconnected to seek).
A high device delay value for a job usually means that another job has a high using value for the same device. Use the Device Delay report to determine what volume a job is waiting for; then use the Device Resource Delay report to determine how the job using that volume is spending its time.
General reasons for a high device using value might include:
- unnecessary I/O (such as using DASD instead of VIO for temporary data sets)
- data sets on a slow device
Using time for a volume will approximately equal connect time (time that the device was connected to a channel path). Using time does not include disconnect time (time that the device had an active channel program but was not connected to the channel) and pending time (time that I/O requests were waiting in a channel queue before a path was available).
A high connect percentage (CON %) might be caused by one or more of the following:
- programs not resident
- inappropriate application parameters
- inefficient use of device by application(s)
- not enough in-storage buffering
- heavy BLDL activity
- high VTOC activity
A high disconnect percentage (DSC %) might be caused by one or more of the following:
- small block size I/O
- multiple revolutions per I/O due to missing channel connects or reconnects
- long seeks because of data set placement or multiple extents on high use data sets
- heavy BLDL activity
- high miss ratio for cached device
- misplaced VTOC or CATALOG or both
- channel, control unit, or head of string contention
A high pending percentage (PND %) might be caused by one or more of the following:
- shared DASD contention
- device not responding
- channel, control unit, or head of string contention
- poorly balanced I/O
- PND time of 100 % usually means another system had the device reserved
% delay device busy
The percentage of time during the range period when there was an I/O request delay because the device was busy. Device busy might mean that another system is using the volume, another system reserved the volume, or a head of string busy condition caused the contention.
% control unit busy
The percentage of time during the range period when there is an I/O request delay because the control unit was busy. If the device is shared at the control unit level, a sharing system might be using the device. If the device is not shared at the control unit level, the contention is the result of other activity to different devices over an alternate path serviced by this control unit.
% director port busy
The percentage of time during the range period when there is an I/O request delay because the ES Connection Director port was busy.
% using
The percentage of time during the report interval that the job was using the volume.
RMF considers a job to be using a device as soon as the job's I/O request is queued in the channel for the device. Therefore, the using percentage for a device includes both active time on the device and queuing delay in the channel.
% all channel paths busy
The percentage of time during the measurement interval when all channel paths belonging to the LCU were busy at the same time.
Only channel paths that are both online to the system and connected to a device are included in the calculation:
% all channel paths busy = CHPID0 * CHPID1 * CHPID2 * CHPID3
where CHPIDn = Percentage busy of each channel path involved
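The product can be sketched as follows. Because multiplying raw percentages would not yield a percentage, this sketch assumes that each CHPIDn value is first converted to a fraction and the product is scaled back to percent; that conversion and the function name are assumptions, not something the formula above states.

```python
import math


def all_paths_busy_percent(path_busy_percents) -> float:
    """Return the % of time all listed channel paths were busy together."""
    if not path_busy_percents:
        raise ValueError("at least one online, connected channel path is required")
    # Convert each percentage to a fraction before multiplying (assumption).
    fractions = [p / 100.0 for p in path_busy_percents]
    return math.prod(fractions) * 100.0


# Two channel paths, each busy 50 % of the time.
print(all_paths_busy_percent([50.0, 50.0]))  # 25.0
```

The product treats the busy states of the paths as independent, which is what the formula implies.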
% control unit busy
This value shows for each channel path of the LCU the relationship between requests deferred due to control unit busy and total successful requests serviced by that path.
Each CHPID of the LCU measures the distribution of control unit contention.
The calculation is:
% control unit busy = (CUB / (DPB + CUB + SUC)) * 100
- DPB = Number of deferred I/O requests due to director port busy
- CUB = Number of deferred I/O requests due to control unit busy
- SUC = Number of successful I/O requests on that path
% director port busy
This field indicates director port contention.
It is the number of times an I/O request was deferred because the director port was busy during the measurement interval.
The calculation is:
% director port busy = (DPB / (DPB + CUB + SUC)) * 100
- DPB = Number of deferred I/O requests due to director port busy
- CUB = Number of deferred I/O requests due to control unit busy
- SUC = Number of successful I/O requests on that path
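The control unit busy and director port busy calculations share the same denominator, so they can be sketched together. The function name and the sample counts are hypothetical, and returning zero when no requests were counted on the path is an assumption.

```python
def path_busy_percentages(dpb: int, cub: int, suc: int) -> tuple:
    """Return (% control unit busy, % director port busy) for one channel path."""
    total = dpb + cub + suc
    if total == 0:
        # Assumption: report zero when the path saw no requests at all.
        return 0.0, 0.0
    cu_busy = cub / total * 100.0
    dp_busy = dpb / total * 100.0
    return cu_busy, dp_busy


# Hypothetical counts: 25 deferred for director port busy, 50 deferred for
# control unit busy, 125 successful requests on the path.
print(path_busy_percentages(dpb=25, cub=50, suc=125))  # (25.0, 12.5)
```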
% CHPID taken
The rate at which I/O requests to devices of this LCU are satisfied by each CHPID during the interval.
By reviewing the rate at which each channel path of the LCU satisfies I/O requests, you can see how evenly the work requests are distributed among the available paths and how effectively those paths are arranged for the LCU.
The calculation is:
% CHPID taken = (TO / SI) * 100
- TO = Total number of I/O operations accepted on that path
- SI = Number of seconds in the interval
# delayed i/o requests
The average number of delayed requests on the control unit header (CU-HDR).
Each time a request is enqueued from the CU-HDR, RMF counts the number of requests remaining on the queue and adds that number to the accumulator.
The calculation is:
# delayed i/o requests = (AL - ER) / ER
- AL = Accumulated queue length
- ER = Total number of enqueued requests
delayed i/o request rate
The rate per second at which the IOP places delayed I/O requests on the CU-HDR for this LCU. This is done when all paths to the subchannel are busy and at least one path to the control unit is busy.
For devices with only one path, or for devices where multiple paths exist and the busy condition is immediately resolved, the IOP does not count the condition.
The calculation is:
delayed i/o request rate = ER / SI
- ER = Total number of enqueued requests
- SI = Number of seconds in the interval
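Both CU-HDR queue metrics depend on the enqueued request count ER, so a single sketch can compute them together. The function name and the sample values are hypothetical, and returning zero for the average when nothing was enqueued is an assumption.

```python
def delayed_request_metrics(al: int, er: int, si: float) -> tuple:
    """Return (# delayed i/o requests, delayed i/o request rate) for an LCU."""
    if si <= 0:
        raise ValueError("interval length (SI) must be positive")
    if er == 0:
        # Assumption: no enqueued requests means no queue and no rate.
        return 0.0, 0.0
    average_delayed = (al - er) / er   # (AL - ER) / ER
    request_rate = er / si             # ER / SI
    return average_delayed, request_rate


# Hypothetical 900 s interval: 1800 enqueued requests with an accumulated
# queue length of 2700.
print(delayed_request_metrics(al=2700, er=1800, si=900.0))  # (0.5, 2.0)
```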
% delay by volume
The percentage of delay caused because the job was waiting to use the named volume.
% using
The percentage of time during the report interval that one job or all jobs in a group or in the system were using one or more processors.
% TCB+SRB
The percentage of total processor time used by the job during the report interval.
working set
The working set represents the (central or expanded) storage the user has when a job is actually running. Shared page counts are not included in the working set.
% delay for SWAP
The percentage that swap-in delays contributed to the delay of a job.
% delay for COMM
The percentage that common storage (common service area (CSA) or link pack area (LPA)), including shared pages, contributed to the delay of a job.
% delay for LOCL
The percentage that local (private) storage paging, including shared pages, contributed to the delay of a job.
% delay for OTHR
The percentage that various types of delays contributed to the delay of a job.
This is the sum of:
- VIO (virtual I/O)
- Paging delays from cross-memory address spaces
  For example, if the DB2 address space does not have sufficient central/expanded storage, CICS could be delayed by cross-memory page-in in the DB2 address space. This would show up as a cross-memory delay for CICS.
- Paging delays from standard hiperspaces (but not ESO hiperspaces)
  This delay could be caused by a job running DFSORT with hipersorting if the DFSORT hiperspace's pages were migrated from expanded to auxiliary storage.
% delay for OUTR
The percentage that swapped-out-and-ready delays contributed to the delay of a job.
% available
The percentage of common storage (CSA, ECSA, SQA, or ESQA) available for allocation at the end of the specified range period.
% not released
The percentage of allocated common storage (CSA, ECSA, SQA, or ESQA) that was not released when a job ended.
% utilization
The percentage of common storage (CSA, ECSA, SQA, or ESQA) used during the specified range period.
# frames not released
The amount of allocated common storage (CSA, ECSA, SQA, or ESQA) that was not released when a job ended.
# frames used
The amount of common storage (CSA, ECSA, SQA, or ESQA) used during the specified range period.
# frames defined
The amount of common storage (CSA, ECSA, SQA, or ESQA) defined to the system at IPL.
# frames idle
The average number of frames held by a job while it was idle.
# frames total
The sum of the active and idle frames.
Note: The shared page counts are not included in this value.
# frames active
The average number of frames held by a job while it was active.
# frames fixed
The average number of fixed frames a job was using during the report interval including frames both above and below the 16 megabyte line.
A fixed frame is a frame that cannot be paged out of central storage.
# frames DIV
The DIV frame count represents the number of Data-in-virtual frames in relation to the number of Data-in-virtual samples.
# slots
The total number of auxiliary storage slots a job used, averaged over the report interval.
es rate per residency time
The value is the rate of page-moves from expanded storage to central storage per active second. This count is the total page-move count divided by the time the user was swapped-in.
It includes single and blocked pages, but does not include shared, hiperspace or VIO pages.
pgin rate
The rate at which pages are being read into central storage.
It is calculated by dividing the total page-in count (for the group) by the resident time.
The address-space related shared storage page-ins are included in the value.
migration age
Migration age is the average number of seconds a page resides on expanded storage before it migrates to auxiliary storage.
unreferenced interval count
The average high unreferenced interval count (UIC) is an indicator of central storage contention. A high UIC indicates that storage contention is low and that you are not experiencing any storage problems.
% frames active
The percentage of storage allocated to jobs that are active.
% frames available
The percentage of available storage.
% frames idle
The percentage of storage allocated to jobs that are idle.
% frames CSA
The percentage of storage allocated to the common storage area (CSA).
% frames LPA
The percentage of storage allocated to the link pack area (LPA).
% frames NUC
The percentage of storage allocated to the nucleus (NUC).
% frames SQA
The percentage of storage allocated to the system queue area (SQA).
# delayed jobs for COMM
The average number of jobs in each group that are delayed for common storage (common service area (CSA) or link pack area (LPA)), including shared pages.
# delayed jobs
The average number of jobs in each group that are delayed for any of the storage reasons COMM, LOCL, SWAP, OUTR, or OTHR.
# delayed jobs for OTHR
The average number of jobs in each group that are delayed for various types of delays.
This is the sum of:
- VIO (virtual I/O)
- Paging delays from cross-memory address spaces
  For example, if the DB2 address space does not have sufficient central/expanded storage, CICS could be delayed by cross-memory page-in in the DB2 address space. This would show up as a cross-memory delay for CICS.
- Paging delays from standard hiperspaces (but not ESO hiperspaces)
  This delay could be caused by a job running DFSORT with hipersorting if the DFSORT hiperspace's pages were migrated from expanded to auxiliary storage.
# delayed jobs for OUTR
The average number of jobs in each group with swapped-out-and-ready delays.
# delayed jobs for LOCL
The average number of jobs in each group that are delayed for local (private) storage paging, including shared pages.
# frames online
- Central storage = Number of central storage frames, excluding read-only frames. Nucleus frames are included in this Metric.
- Expanded storage = Number of usable expanded storage frames.
# delayed jobs for SWAP
The average number of jobs in each group with swap-in delays.
pgin rate per residency time
The average number of page-ins per second for an address space.
The calculation is the total number of non-swap page-ins (including VIO page-ins, hiperspace page-ins, page-ins caused by page faults, and shared storage page-ins) during the range period divided by the total time an address space was swapped-in (residency time).
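The division can be sketched as follows. The function name and figures are hypothetical; the caller is assumed to have already summed the non-swap page-in types listed above.

```python
def pgin_rate_per_residency(non_swap_page_ins: int, residency_seconds: float) -> float:
    """Return page-ins per second of residency time for an address space."""
    if residency_seconds <= 0:
        raise ValueError("residency time must be positive")
    return non_swap_page_ins / residency_seconds


# 1200 non-swap page-ins while swapped in for 600 s of the range period.
print(pgin_rate_per_residency(1200, 600.0))  # 2.0
```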
execution velocity
The execution velocity of the MVS system, workload group, service class or service class period being reported on. This value is calculated independent of a specified goal.
The value for execution velocity is calculated as CPU using, divided by the sum of CPU using and total delays gathered by WLM.
A high value indicates little workload contention while a low value indicates that the requests for system resources are delayed.
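The calculation described above can be written as a short sketch. It returns the plain ratio exactly as the text states it; whether the reported value is additionally scaled to a percentage is not stated here, so any such scaling would be an assumption. The function name and the sample counts are hypothetical.

```python
def execution_velocity(cpu_using: float, total_delay: float) -> float:
    """Return CPU using divided by the sum of CPU using and total delays."""
    total = cpu_using + total_delay
    if cpu_using < 0 or total_delay < 0 or total == 0:
        raise ValueError("need non-negative inputs with a positive sum")
    return cpu_using / total


# Hypothetical WLM samples: 30 CPU using, 90 delay.
print(execution_velocity(30, 90))  # 0.25
```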
response time
The average response time (in seconds) for all transactions of a job class (*SYSTEM, *TSO, *BATCH, *STC, *ASCH or *OMVS), a WLM workload, or WLM service or report class that ended during the range period.
The response time value is the sum of the queued time and the active time for an average ended transaction.
transaction rate
The number of transactions per second for a job class (*SYSTEM, *TSO, *BATCH, *STC, *ASCH or *OMVS), a WLM workload, or WLM service or report class during the range period.
% partition utilization
MVS view of CPU utilization.
For example, if an MVS partition has 5% of the processor capacity and the physical CPU utilization reported by RMF for the partition is 5%, this indicates an MVS view of 100% CPU utilization.
This Metric is available in LPAR mode only, because in Basic mode (non-LPAR mode) this value is shown in the % total utilization Metric.
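The example suggests that the MVS view relates the physical utilization of the partition to its share of processor capacity. The sketch below generalizes from that single example, so the formula, the function name, and the parameter names should be read as assumptions rather than as the documented calculation.

```python
def mvs_view_utilization(physical_percent: float, partition_share_percent: float) -> float:
    """Return the MVS view of CPU utilization in percent (inferred formula)."""
    if partition_share_percent <= 0:
        raise ValueError("partition share must be positive")
    return physical_percent / partition_share_percent * 100.0


# The example from the text: 5 % share, 5 % physical utilization.
print(mvs_view_utilization(5.0, 5.0))  # 100.0
```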
% workflow
The average speed at which the jobs in the group are moving through the system in relation to the maximum speed at which they could move through the system.
A low workflow percentage indicates that jobs in the group have few of the resources they need and are contending with other jobs for system resources. A high workflow percentage indicates that jobs in the group have the resources they need and are moving through the system at a relatively high speed.
For example, jobs in a group that could process in four minutes if all the resources that they needed were available would have a workflow of 25% if they took sixteen minutes to process.
% average CPU utilization
The average utilization percentage for all processors during the report interval.
# active users
The average number of active users in the system or in a group of address spaces.
Active users include those using a monitored resource, those delayed for a monitored resource, and those doing activities that RMF does not measure.
Each system user is either active, idle or unknown during a report interval.
% SRB
The average percentage of SRB time used by the system.
% TCB
The average percentage of TCB time used by the system.
% TCB+SRB
The average percentage of processor time used by all address spaces per processor.
# users
The average number of total users in the system or in a group of address spaces.
# using jobs
Average number of users using devices.
# using jobs
Average number of users using the processor.
# processor online
The number of processors online during the range period.
% workflow
Workflow percentage with respect to the processor is the speed at which one job or all jobs in a group or in the system are using the processor(s) in relation to the maximum speed at which they could do this.
The calculation for this value is:
%workflow = (%using / (%using + %delay)) * 100
In this formula, the values of %using and %delay refer to the processor.
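A minimal Python sketch of this formula follows. The function name is hypothetical, and raising an error when both inputs are zero is an assumption, since the text does not say what is reported in that case.

```python
def processor_workflow_percent(using_percent: float, delay_percent: float) -> float:
    """Return %workflow = (%using / (%using + %delay)) * 100."""
    total = using_percent + delay_percent
    if using_percent < 0 or delay_percent < 0 or total == 0:
        raise ValueError("need non-negative inputs with a positive sum")
    return using_percent / total * 100.0


# A job using the processor 30 % of the time and delayed for it 90 %.
print(processor_workflow_percent(30.0, 90.0))  # 25.0
```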
% workflow
Workflow percentage with respect to devices is the speed at which one job or all jobs in a group or in the system are using the devices in relation to the maximum speed at which they could do this.
The calculation for this value is:
%workflow = (%using / (%using + %delay)) * 100
In this formula, the values of %using and %delay refer to devices.
# using jobs
The average number of jobs using either the processor or devices during the report interval.
# delayed jobs
The average number of jobs that are delayed during the report interval because of at least one of the following reasons:
- Waiting for a processor
- Waiting for a device
- Waiting for storage
- Waiting for a subsystem (JES, HSM, XCF)
- Waiting for the operator
- Waiting for serially reusable resource (enqueue)
# delayed jobs for enqueue
The average number of jobs for each group that are waiting to use a serially reusable resource that another system or job is using.
# delayed jobs for HSM
The average number of jobs for each group that are waiting for services from the Hierarchical Storage Manager (HSM).
A high HSM delay value might be caused by one or more of the following:
- HSM address spaces delayed (Check HSM address spaces on the Job report)
- Delay on HSM volumes (Check HSM device volumes on the DEVR report)
- HSM doing its housekeeping during prime time
- Not enough primary or level one space
- HSM dispatching priority too low.
# delayed jobs for JES
The average number of jobs for each group that are waiting for services from the Job Entry Subsystem (JES).
A high JES delay value might be caused by one or more of the following:
- JES address spaces delayed (Check JES address spaces on the Job report)
- Delay on JES volumes (Check JES device volumes on the DEVR report)
- JES dispatching priority too low.
# delayed jobs for operator
The average number of jobs for each group that are waiting for the operator to reply to a message or mount a tape, or whose address space was quiesced by the operator.
# delayed jobs for subsystem
The average number of jobs for each group that are waiting for services from a subsystem (JES, HSM, or XCF).
# delayed jobs for XCF
The average number of jobs for each group that are waiting for services from the Cross-System Coupling Facility (XCF).
A high XCF delay value might be caused by one or more of the following:
- Path capacity exceeded.
- Other applications are tying up the path.
- XCF delays on the receiving system.
- Some data paths are unavailable or offline.
# delayed jobs for I/O
The average number of jobs for each group that are waiting for any DASD or tape, or have an I/O request queued in the channel for a device but are not transmitting data (for example, are disconnected to seek).
A high device delay value for a job usually means that another job has a high using value for the same device. Use the Device Delay report to determine what volume a job is waiting for; then use the Device Resource Delay report to determine how the job using that volume is spending its time.
General reasons for a high device using value might include:
- unnecessary I/O (such as using DASD instead of VIO for temporary data sets)
- data sets on a slow device
Using time for a volume will approximately equal connect time (time that the device was connected to a channel path). Using time does not include disconnect time (time that the device had an active channel program but was not connected to the channel) and pending time (time that I/O requests were waiting in a channel queue before a path was available).
A high connect percentage (CON %) might be caused by one or more of the following:
- programs not resident
- inappropriate application parameters
- inefficient use of device by application(s)
- not enough in-storage buffering
- heavy BLDL activity
- high VTOC activity
A high disconnect percentage (DSC %) might be caused by one or more of the following:
- small block size I/O
- multiple revolutions per I/O due to missing channel connects or reconnects
- long seeks because of data set placement or multiple extents on high use data sets
- heavy BLDL activity
- high miss ratio for cached device
- misplaced VTOC or CATALOG or both
- channel, control unit, or head of string contention
A high pending percentage (PND %) might be caused by one or more of the following:
- shared DASD contention
- device not responding
- channel, control unit, or head of string contention
- poorly balanced I/O
- PND time of 100 % usually means another system had the device reserved
# delayed jobs for processor
The average number of jobs for each group that are waiting for a processor.
A high processor using value might be caused by one or more of the following:
- looping user
- high dispatching priority for a processor-bound job (in compatibility mode) or high importance for the service class of a processor-bound job (in goal mode)
- small block size I/O
- excessive use of expensive supervisor service
A high processor delay value might be caused by one or more of the following:
- ineffective choice of either dispatching priorities in the SRM IPS (compatibility mode) or importances in the active service policy (goal mode)
- high priority work using an excessive amount of CPU
- ineffective mean-time-to-wait usage
# delayed jobs for storage
The average number of jobs for each group that are waiting for a COMM, LOCL (both include shared pages), SWAP, or VIO page, that are on the out/ready queue, or that are delayed by cross-memory address space or standard hiperspace paging.
For enclaves, only COMM, cross-memory, and shared page delays apply.
A high storage delay value can be associated with common storage paging (COMM), local storage paging (LOCL), swap-in delay (SWAP), swapped out and ready delay (OUTR), and other delays (OTHR), which include virtual I/O paging and paging delays from cross-memory address spaces and standard hiperspaces.
A high storage delay associated with common storage paging might be caused by one or more of the following:
- insufficient page data sets
- not enough central storage
- poorly tuned paging configuration
- too many address spaces in storage
- too many "logical swap" address spaces in storage
- excessive storage isolation of address spaces
- too many extremely large address spaces resident
- paging data set on shared device
- high use of user I/O on paging volume
- "common I/O" contends with "swap I/O"
- common data set on wrong device
A high storage delay associated with local storage paging might be caused by one or more of the following:
- insufficient page data sets
- not enough central storage
- address space is under-isolated (causing trim) or over-isolated (causing others to page/swap)
- poorly tuned paging configuration
- too many address spaces in storage
- too few (artificially low) address spaces in storage
- too many "logical swap" address spaces in storage
- paging data set on shared device
- high use of user I/O on paging volume
- too much swapping
- page-ins are from trimming at swap-out
- "local I/O" contends with "swap I/O"
- program pages in each address space rather than in PLPA
- too many extremely large address spaces resident
A high storage delay associated with virtual I/O might be caused by one or more of the following:
- insufficient page data sets
- poorly tuned paging configuration
- paging data set on shared device
- high use of user I/O on paging volume
- virtual I/O contending with swap I/O
A high storage delay associated with swap-in activity might be caused by one or more of the following:
- too much swapping
- workload too heavy
- insufficient page/swap data sets
- misplaced page/swap data sets
- swap data sets on slow devices
- too few (artificially low) address spaces in storage
- paging data set on shared device
- high use of user I/O on paging volume
- swapped pages moved to backing store on cached device
- not enough central storage
A high delay value for address spaces that are swapped out and ready might be caused by one or more of the following:
- too few (artificially low) address spaces in storage
- workload too heavy
- unbalanced workload
- not enough central storage
- poorly tuned paging configuration
- insufficient page/swap data sets
- too many address spaces in storage
- too many or too few logical swap address spaces
- paging/swapping too slow
- exchange swap rate too high
- too many detected wait swaps
- improper use of storage isolation
Other storage delays might be caused by one or more of the following:
- paging delays from cross-memory address spaces
- paging delays from standard hiperspaces (but not ESO hiperspaces)
Note: On the STOR and STORS reports, the OTHR column includes all other storage delays that are not shown in a separate column under % Delayed For (for example VIO).
execution velocity goal
The target execution velocity for ended transactions that has been in effect for the service class period during the reported range.
performance index
This index helps to compare goals. If, for example, several execution velocity goals with the same importance are not met, this index helps you decide which group was impacted the most.
RMF calculates the performance index depending on the type of goal:
- Execution velocity goal:
perf index = goal% / actual%
- Average response time goal:
perf index = actual(sec) / goal(sec)
- Response time goal with percentile:
perf index = actual(sec) / goal(sec)
In this context "actual" means the maximal response time that actually was reached for the percentage of the goal. To calculate this, perform the following 3 steps:
- Calculate the number of transactions N that correspond to the goal:
N = (sum of all transactions * goal% ) / 100
- Add up all transactions until a bucket M is reached where the sum is greater than N.
- The "actual" response time in the formula for the performance index shown above is the response time value belonging to the bucket M.
Note: Due to this methodology, the maximal value of the performance index for this goal type is 4.
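The three formulas and the three-step percentile procedure above can be sketched in Python as follows. This is a minimal illustration, not RMF's implementation: the function names, the representation of buckets as (response time, ended transactions) pairs, and the fallback used when no bucket sum exceeds N are all assumptions made for this example.

```python
def velocity_perf_index(goal_pct, actual_pct):
    """Execution velocity goal: perf index = goal% / actual%."""
    return goal_pct / actual_pct


def avg_response_perf_index(actual_sec, goal_sec):
    """Average response time goal: perf index = actual(sec) / goal(sec)."""
    return actual_sec / goal_sec


def percentile_perf_index(buckets, goal_pct, goal_sec):
    """Response time goal with percentile.

    buckets: hypothetical list of (response_time_sec, ended_transactions)
    pairs, sorted by ascending response time.
    """
    if not buckets:
        raise ValueError("at least one bucket is required")
    # Step 1: number of transactions N that correspond to the goal.
    total = sum(count for _, count in buckets)
    n = total * goal_pct / 100
    # Step 2: add up transactions until the sum is greater than N.
    running = 0
    for response_time, count in buckets:
        running += count
        if running > n:
            # Step 3: "actual" is the response time of bucket M.
            return response_time / goal_sec
    # Assumption: if no sum exceeds N (for example goal% = 100),
    # fall back to the last bucket.
    return buckets[-1][0] / goal_sec


# Hypothetical data: goal is 80% of transactions within 1.0 second.
sample_buckets = [(0.5, 40), (1.0, 30), (2.0, 20), (4.0, 10)]
percentile_pi = percentile_perf_index(sample_buckets, goal_pct=80, goal_sec=1.0)
```

With the sample data, N is 80, and the running sums are 40, 70, and 90, so the third bucket (2.0 seconds) is bucket M and the index is 2.0.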
important service units (capacity) / transaction
Actual service rate (in unweighted CPU service units per second) as consumed per transaction in a resource group with a high importance (1 or 2).
percentile achieving response time goal
The percentage of transactions that actually ended within the time specified in the goal.
response time
Average response time for all transactions as reported by the CICS TOR or IMS CTL region. However, for subsystem data, it is possible that active time is greater than total time.
Note: All of these response times are for ended transactions only. Thus, if there is a problem where transactions are completely locked out, either while queued or running, the problem will not be seen until the locked-out transactions end.
queue time
Queue time is the difference between total and active time.
For CICS, this may be the queue time for transactions within the TOR, AOR, and other regions, and also processing time within the TOR.
For IMS, this may be the queue time for transactions within the MPR and also processing time within the CTL region.
In all other cases, this is the average time that transactions spent waiting on a JES or APPC queue.
Note: Queue time may not always be meaningful, depending on how you schedule work. For example, if jobs are submitted in hold status and left until they are ready to be run, all of the held time counts as queue time. This time may or may not represent a delay to the job.
transaction ended rate
The number of transactions ended per second.
active time
For CICS transactions, active time is the execution time in AOR, only for routed transactions.
For IMS transactions, active time is the execution time within the MPR.
For Batch, TSO, etc., active time is the average time that transactions spent in execution.
service units (capacity) / transaction
Actual service rate (in unweighted CPU service units per second) as consumed per transaction.
response time goal
The goal that has been in effect for the service class period during the reported range:
- The average target response time for all ended transactions
response time goal percentile
The goal that has been in effect for the service class period during the reported range:
- The percentage of transactions that should complete within the time specified in the goal.
service rate
The actual service rate, in unweighted CPU service units per second, as consumed by that resource group.
processor utilization
Average value of processor utilizations within the coupling facility.
In case of a stand-alone coupling facility, the utilization of the individual CPs should be approximately the same. In a PR/SM environment where this CP is shared with other partitions, the utilization is the logical utilization of the CP (that is, only the utilization by the coupling facility).
If the average utilization is high, you can take the following actions:
- In a PR/SM environment, you can dedicate the CP to the integrated coupling facility or assign additional CPs to the partition.
- Move structures to a coupling facility with lower utilization.
- Consider additional or larger coupling facilities.
# effective logical processors
Number of effective available logical processors in a shared environment. This value is only useful in a CFCC environment. CFCC measures the time of real command execution as well as the time waiting for work. The reported value shows the ratio between the LPAR dispatch time (CFCC execute and wait time) and the RMF Mintime length.
For example, if a CFCC CEC contains 6 LPs, and the measured CF LPAR has two logical processors and is limited to 5%, the number of effective LPs is 0.3.
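The ratio described above can be illustrated with a short Python sketch. The function name and the input values are hypothetical. The sketch also assumes that dispatch time is summed over the partition's logical processors, and that the 0.3 in the example can be read as 5% of 6 processors.

```python
def effective_logical_processors(lpar_dispatch_seconds, mintime_seconds):
    """Ratio of LPAR dispatch time (CFCC execute plus wait time) to Mintime length."""
    return lpar_dispatch_seconds / mintime_seconds


# Assumed reading of the example: a partition limited to 5% of 6 processors
# can be dispatched for at most 6 * 0.05 * Mintime seconds per Mintime.
mintime = 100.0                      # hypothetical Mintime length in seconds
max_dispatch = 6 * 0.05 * mintime    # about 30 seconds of dispatch time
effective_lps = effective_logical_processors(max_dispatch, mintime)
```

Under these assumptions, `effective_lps` comes out at approximately 0.3, matching the example.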
total request rate
The sum of synchronous and asynchronous requests completed against any structure within this coupling facility per second. This includes requests that changed from synchronous to asynchronous.
# frames installed
The total amount of storage in the coupling facility, including both allocated and available space.
# frames available
The amount of coupling facility space that is not allocated to any structure and not allocated as dump space.
sync request rate (CF structure)
Number of hardware operations per second that started and completed synchronously to the coupling facility on behalf of connectors to the structure.
async request rate (CF structure)
Number of hardware operations per second that started and completed asynchronously to the coupling facility on behalf of connectors to the structure.
sync service time (CF structure)
Average time in microseconds required to satisfy a synchronous coupling facility request for this structure.
async service time (CF structure)
Average time in microseconds required to satisfy an asynchronous coupling facility request for this structure. This value also includes operations that started synchronously but completed asynchronously.
% subchannel delay
The percentage of all coupling facility requests MVS had to delay because it found all coupling facility subchannels busy.
If this percentage is high, you should first ensure that sufficient subchannels are defined (see MAX field below).
If there are sufficient subchannels and this percentage is still high, it indicates either a coupling facility path constraint or internal coupling facility contention.
% path delay
The percentage of all coupling facility requests that were rejected because all paths to the coupling facility were busy.
A high percentage results in elongated service times, which reduces the capacity of the sending processor. If coupling facility channels are being shared among PR/SM partitions, the contention could be coming from a remote partition.
Identifying Path Contention: There can be path contention even when this count is low. In fact, in a non-PR/SM environment where the subchannels are properly configured, Subchannel Busy, not Path Busy, is the indicator for path contention. If Path Busy is low but Subchannel Busy is high, it means MVS is delaying the coupling facility requests and in effect gating the workload before it reaches the physical paths. Before concluding you have a capacity problem, however, be sure to check that the correct number of subchannels is defined in the I/O gen (see Subchannel Max).
PR/SM Environment: If coupling facility channels are being shared among PR/SM partitions, Path Busy behaves differently. Potentially, you have many MVS subchannels mapped to only a few coupling facility command buffers. You could have a case where the subchannels were properly configured (or even under-configured), Subchannel Busy is low, but Path Busy is high. This means the contention is due to activity from a remote partition.
Possible actions: Dedicate the coupling facility links on the sending processor or add additional links.
CF sync request rate (view from MVS image)
Number of hardware operations per second that started and completed synchronously to the coupling facility on behalf of connectors from this system.
CF async request rate (view from MVS image)
Number of hardware operations per second that started and completed asynchronously to the coupling facility on behalf of connectors from this system.
CF sync service time (view from MVS image)
Average time in microseconds required to satisfy a synchronous coupling facility request.
CF async service time (view from MVS image)
Average time in microseconds required to satisfy an asynchronous coupling facility request. This value also includes operations that started synchronously but completed asynchronously.
% using for a dataset
Percentage of time when a job has had an I/O request accepted by the channel for the volume on which the data set resides, but the request is not yet complete.
% delay for a dataset
Percentage of time when a job was waiting to use the data set because of contention for the volume where the data set resides.
i/o rate
Rate of I/O requests. The i/o rate is measured at the hardware level and is the sum of the i/o activity of all systems attached to the volume or ssid.
% cache read hits
Percentage of I/Os that were processed within the cache (cache hits) based on the total number of I/Os.
- % cache READ hits is the percentage for READ operations
- % cache WRITE hits is the percentage for WRITE operations
- % cache DFW hits is the percentage for DASD FAST WRITE operations
- % cache CFW hits is the percentage for WRITE and READ-AFTER-WRITE operations.
% cache read misses
Percentage of I/Os that were NOT processed within the cache based on the total number of I/Os.
Definition: % cache read misses = 100 - % cache read hits
- % cache READ misses is the percentage for READ operations
- % cache WRITE misses is the percentage for WRITE operations
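As a small worked example of the hit and miss definitions above, the following Python sketch derives both percentages from raw counts. The function name and the counts are hypothetical; they are not RMF field names.

```python
def cache_hit_and_miss_pct(cache_hits, total_ios):
    """Return (% hits, % misses), where % misses = 100 - % hits."""
    if total_ios <= 0:
        raise ValueError("total_ios must be positive")
    hit_pct = 100.0 * cache_hits / total_ios
    return hit_pct, 100.0 - hit_pct


# Hypothetical counts: 850 of 1000 read I/Os were satisfied from the cache.
read_hit_pct, read_miss_pct = cache_hit_and_miss_pct(850, 1000)
```

With these counts, the read hit percentage is 85 and the read miss percentage is 15.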
% of read operations
Percentage of READ requests based on all READ and WRITE requests.
non-cache dasd i/o rate
I/O rate of all requests that accessed DASD. This is the sum of Stage rates (normal or sequential I/O requests that accessed DASD) and other request rates (inhibit cache load, DFW BYPASS, CFW BYPASS, DFW INHIBIT).
CPC capacity (MSU/h)
Processor capacity available to the Central Processor Complex (CPC). The data is in millions of unweighted CPU service units per hour.
image capacity (MSU/h)
Defined MSU capacity limit for the partition. No data is available if the partition is not under control of the License Manager. The data is in millions of unweighted CPU service units per hour.
% weight of max
Average weighting factor in relation to the maximum defined weighting factor for this partition.
% WLM capping
Percentage of time when WLM capped the partition because the four-hour average MSU value exceeds the defined capacity limit.
four hour MSU average
The average CPU consumption of the partition over the last four hours measured in millions of unweighted CPU service units per hour (MSU/h).
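The relationship between the four-hour average and the defined capacity limit can be sketched in Python as follows. This is only an illustration: it assumes evenly spaced MSU/h samples that together cover the last four hours, and the function names and sample values are hypothetical. It does not reproduce how WLM or RMF actually collect or average the data.

```python
def four_hour_msu_average(msu_samples):
    """Average of evenly spaced MSU/h samples covering the last four hours."""
    if not msu_samples:
        raise ValueError("at least one sample is required")
    return sum(msu_samples) / len(msu_samples)


def average_exceeds_capacity(msu_samples, defined_capacity_msu):
    """True if the four-hour average is above the defined capacity limit."""
    return four_hour_msu_average(msu_samples) > defined_capacity_msu


# Hypothetical hourly samples for the last four hours, in MSU/h.
samples = [40, 60, 80, 100]
average = four_hour_msu_average(samples)
over_limit = average_exceeds_capacity(samples, defined_capacity_msu=65)
```

Here the average is 70 MSU/h, which is above the assumed limit of 65 MSU/h, even though two of the four samples are below it.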
four hour MSU maximum
The maximum CPU consumption of the partition over the last four hours measured in millions of unweighted CPU service units per hour (MSU/h). This value can be greater than the defined capacity.
actual MSU
Actual MSU consumption of the image running in the specified partition. Data is in millions of unweighted CPU service units per hour.
average number of logical processors
The average number of logical processors assigned to this partition.
% effective logical dispatch time
Average effective dispatch time as percentage of the total online time.
% total logical dispatch time
Average total dispatch time as percentage of the total online time.
% effective physical utilization (CPC)
The effective utilization of the physical processors by all partitions running in the CPC.
This data is based on the total interval time of all physical processors and does not include LPAR management time.
RMF PM gives you the ability to select between the sum of all CP partitions and the sum of all ICF (Integrated Coupling Facility) or IFL (Integrated Facility for Linux) partitions.
% effective physical utilization (partition)
The effective utilization of the physical processors by the partition.
This data is based on the total interval time of all physical processors and does not include LPAR management time.
RMF PM gives you the ability to differentiate between CP partitions and ICF (Integrated Coupling Facility) or IFL (Integrated Facility for Linux) partitions.
% total physical utilization (CPC)
The total utilization of the physical processors by all partitions running in the CPC.
This data is based on the total interval time of all physical processors and includes LPAR management time.
RMF PM gives you the ability to select between the sum of all CP partitions and the sum of all ICF (Integrated Coupling Facility) or IFL (Integrated Facility for Linux) partitions.
% total physical utilization (partition)
The total utilization of the physical processors by the partition.
This data is based on the total interval time of all physical processors and includes LPAR management time.
RMF PM gives you the ability to differentiate between CP partitions and ICF (Integrated Coupling Facility) or IFL (Integrated Facility for Linux) partitions.
% total LPAR management
The average LPAR management time percentage.
remaining time until capping in seconds (by partition)
The projected time until WLM soft capping will start. WLM soft capping takes place to prevent you from using more than the defined capacity over a long period of time. This is under the assumption you continue to use your system as you have done in the immediate past. The maximum number RMF reports is 14400 seconds or 4 hours. If RMF reports 14K, it means the remaining time until capping is at least 14K seconds.
5694-A01 (C) Copyright IBM Corporation 1998, 2001
- Java and all Java-based marks are trademarks or registered trademarks of Sun Microsystems, Inc. in the U.S. and other countries.
- z/OS, zSeries, OS/2, OS/390, and RMF are trademarks of the IBM Corporation
- Windows NT, Windows 95, Windows 98, Windows ME, Windows 2000 and Internet Explorer are trademarks of the Microsoft Corporation
- UNIX is a registered trademark licensed exclusively through The Open Group.
- Linux is a registered trademark of Linus Torvalds.