z/OS V1R8 LSPR FAQ
Question 1:
What are the major changes to the z/OS V1R8 LSPR for the IBM System z10 EC?
Answer:
The LSPR ratios reflect the range of performance between prior zSeries servers and the z10 EC as measured using a wide variety of application benchmarks. The latest release of LSPR contains a number of updates to reflect the continuing evolution of zSeries customer applications and configurations. First, the workload suite has changed: the Java-based batch workload (CB-J) is replaced with a more customer-like Java-based batch workload (ODE-B). Second, all workloads were moved to z/OS 1.8 and more recent levels of subsystem and compiler software. Third, the new HiperDispatch feature (described below) is turned on for all z10 EC z/OS LSPR measurements. The LSPR continues to include both single-image z/OS and multi-image z/OS in separate tables.
Question 2:
Why are there two tables in LSPR?
Answer:
The LSPR was enhanced to include performance ratios reflecting both "single-image" z/OS and "multi-image" z/OS environments when z9 was introduced. Typically, zSeries processors are configured with multiple images of z/OS. Thus, the LSPR continues to include a table of performance ratios based on average multi-image z/OS configurations for each processor model as determined from the profiling data. Since the multi-image z/OS table is much more representative of the vast majority of customer configurations, it is used as the basis for setting MIPS and MSUs for the z10 EC.
Question 3:
What multi-image configurations are used to produce the LSPR multi-image table?
Answer:
A wide variety of multi-image configurations exist. The main variables in a configuration typically are: 1) number of images, 2) size of each image (number of logical engines), 3) relative weight of each image, 4) overall ratio of logical engines to physical engines, 5) the number of books and 6) the number of ICFs/IFLs. The configurations used for the LSPR multi-image table are based on the average values for these variables as observed across a processor family. It was found that the average number of images ranged from 5 at low-end models to 8 at the high end. Most systems were configured with 2 major images (those defined with >10% relative weight). On low- to midrange models, at least one of the major images tended to be configured with a number of logical engines close to the number of physical engines. On high-end boxes, the major images were generally configured with a number of logical engines well below the count of physical engines reflecting the more common use of these processors for consolidation. The overall ratio of logical to physical engines (often referred to as "the level of over-commitment" in a virtualized environment) averaged as high as 5:1 on the smallest models, hovered around 2:1 across the majority of models, and dropped to 1.3:1 on the largest models. The majority of models were configured with one book more than necessary to hold the enabled processing engines, and an average of 2 ICFs/IFLs were installed.
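As an illustration of two of the variables above, the following minimal Python sketch computes the overall logical:physical ratio and the number of "major" images (those with more than 10% relative weight) for one configuration. The `Image` class, the function names and the sample configuration are hypothetical and are not taken from the LSPR profiling data.

```python
from dataclasses import dataclass


@dataclass
class Image:
    """One z/OS image (LPAR) in a hypothetical configuration."""
    name: str
    logical_engines: int
    weight: float  # relative weight, in arbitrary units


def logical_to_physical_ratio(images, physical_engines):
    """Overall ratio of logical engines to physical engines."""
    return sum(img.logical_engines for img in images) / physical_engines


def major_images(images, threshold=0.10):
    """Images whose share of the total weight exceeds the threshold."""
    total_weight = sum(img.weight for img in images)
    return [img for img in images if img.weight / total_weight > threshold]


# Hypothetical 4-engine machine with 5 images (illustrative values only).
config = [
    Image("PROD1", 4, 50),
    Image("PROD2", 3, 35),
    Image("TEST1", 1, 5),
    Image("TEST2", 1, 5),
    Image("DEV1", 1, 5),
]

print(logical_to_physical_ratio(config, physical_engines=4))
print([img.name for img in major_images(config)])
```

In this made-up example the ratio is 2.5:1 and two images qualify as major. The sketch only shows how the quantities are defined; it does not estimate capacity.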
Question 4:
Which LSPR table should I use for capacity sizing?
Answer:
For high-level sizing, most users will find the multi-image table to reflect configurations closest to their own. This is simply due to the fact that most systems are run with multiple z/OS images. However, the most accurate sizings require zPCR's LPAR Configuration Capacity Planning function, which can be customized to exactly match a specific multi-image configuration rather than the average configurations reflected in the multi-image LSPR table.
Question 5:
If I compare the two tables, why are the capacity ratios for some models higher in the single-image table while other models have higher ratios in the multi-image table?
Answer:
Just as capacity ratios are sensitive to workload characteristics (note the varying capacity ratios within a table associated with different workloads), capacity ratios will also be sensitive to the configuration of z/OS images on a processor. If one compares a processor configured only with a single, large z/OS image to the same processor configured with multiple z/OS images, there are both pluses and minuses that come into play. There is a cost incurred to manage multiple z/OS images and their associated logical processors. There is also a cost incurred as the size of a z/OS image increases. Thus, if one compares a configuration of a single large z/OS image to a configuration of multiple but smaller z/OS images, the net result can vary as the magnitude of the pluses and minuses will vary. The sensitivity of the multi-image configurations to the number of images, the size of each image, the relative weights and the overall logical:physical ratio will cause a fair amount of variability in the capacity ratios of these configurations. The multi-image table provides a representative view of these ratios based on average configurations. However, "your mileage will vary" as configurations deviate from average. zPCR's LPAR Configuration Capacity Planning function can provide capacity ratios customized to specific configurations.
Question 6:
What model is used as the "base" or "reference" processor in the z/OS V1R8 LSPR tables?
Answer:
The 2094-701 processor running a single copy of z/OS is used as the base in the z/OS V1R8 tables. Thus, the ITRR for the 2094-701 appears as 1.00 in the single-image table. The 2094-701 in the multi-image table was configured based on the average client configuration, thus incurring a cost to run this complex LPAR configuration. Therefore, in the multi-image table, the 2094-701 appears with an ITRR of 0.94 (note that the actual ITRR is 0.944 but the LSPR tables show only two decimal digits).
Question 7:
What "capacity scaling factors" are commonly used for the z10 EC?
Answer:
The LSPR provides capacity ratios among various processor families. It has become common practice to assign a capacity scaling value to processors as a high-level, gross approximation of their capacities. The commonly used capacity scaling values associated with the z10 EC may be approximated by multiplying the ITRRs in the LSPR z/OS V1R8 multi-image table by 602. For example, the 2097-701 has an ITRR of 1.53 in the multi-image table; thus, the "capacity scaling value" of the 2097-701 would be approximated by 1.53 x 602 = 921. The multi-image table is used as the basis for this calculation as it is considered to best represent the majority of production systems.
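The arithmetic in this answer can be sketched in a few lines of Python. The function and constant names are illustrative assumptions. Only the factor 602 and the 2097-701 ITRR of 1.53 come from the text, and rounding to the nearest integer is an assumption that matches the quoted result.

```python
# Multiplier quoted above for the z/OS V1R8 multi-image table.
SCALING_FACTOR = 602


def capacity_scaling_value(multi_image_itrr):
    """Approximate capacity scaling value from a multi-image ITRR.

    Rounding to the nearest integer is an assumption; the text only
    shows that 1.53 x 602 is reported as 921.
    """
    return round(multi_image_itrr * SCALING_FACTOR)


# 2097-701: multi-image ITRR of 1.53, as quoted in the answer above.
print(capacity_scaling_value(1.53))  # 921
```

Because the published ITRRs are shown to only two decimal digits, values computed this way are gross approximations, as the answer itself notes.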
Question 8:
How much variability in performance should I expect when moving a workload to a z10 EC?
Answer:
As with the introduction of any new server, workloads with differing characteristics will see some variation in performance when moved to the z10 EC. The performance ratings for a server are determined by averaging the performance of a variety of workloads that represent what we understand to be the major components of our customers' production environments. While the ratings provide good "middle-of-the-road" values, they do represent an average, and by definition some workloads fall above the average and some fall below. The z10 EC has been specifically designed to focus on new and emerging workloads where the speed of the processor is a dominant factor in performance. The result is a quantum jump in clock speed: the z10 EC runs at 4.4 GHz, compared to the z9 EC, which ran at 1.7 GHz. The storage hierarchy design of the z10 EC is also improved over the z9 EC; however, the improvement is somewhat limited by the laws of physics, so the latencies have increased relative to the clock speed. Thus, workloads that are CPU-intensive will tend to run above average while workloads that are storage-intensive will tend to run below average, and the spread around the average will likely be larger than seen in recent processors. Additionally, newer applications, such as those with compiler optimizations for the z10 EC, may see even higher benefits, particularly those that may be enhanced over time to exploit some of the new instructions provided with the z10 EC. The LSPR measurements can provide an indication of the potential variability when moving z/OS workloads to a z10 EC. For example, using the single-image z/OS measurements on a 2097-716 versus a 2094-716, we saw performance ratios of: a) 1.51x for the average workload mix, b) 1.62x for the highest workload, ODE-B (CPU-intensive), and c) 1.42x for the lowest workload, OLTP-W (storage-intensive).
The variation of individual jobs or transactions can be even larger. For example, the average job in our CB-L workload improved 1.58x, but the improvement for individual jobs ranged from 1.2x to 2.1x.
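To put the spread quoted above in perspective, the following Python sketch expresses the highest and lowest workload ratios as a percentage deviation from the average mix. The three ratios are the 2097-716 versus 2094-716 single-image figures from this answer. The function and variable names are illustrative assumptions.

```python
def deviation_from_average(ratio, average):
    """Percentage by which a workload's ratio differs from the average."""
    return (ratio / average - 1.0) * 100.0


# Single-image ratios, 2097-716 versus 2094-716, as quoted above.
average_mix = 1.51
workloads = {
    "ODE-B (CPU-intensive)": 1.62,
    "OLTP-W (storage-intensive)": 1.42,
}

for name, ratio in workloads.items():
    pct = deviation_from_average(ratio, average_mix)
    print(f"{name}: {pct:+.1f}% relative to the average mix")
```

With these figures, ODE-B comes out about 7% above the average mix and OLTP-W about 6% below it.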
Question 9:
Once my workload is up and running on a z10 EC, how much variability in performance will I see?
Answer:
Minute-to-minute, hour-to-hour and day-to-day performance variability generally grows with the size (capacity) of the server and the complexity of the LPAR configuration. With its improved processor speed and the capability to be configured with up to 64 engines, the z10 EC can deliver nearly 1.7 times the capacity of the largest previous server. Significant enhancements to the z/OS dispatcher and the PR/SM management algorithms (see HiperDispatch discussion below) have been made to help reduce the potential for increased performance variability. In the spirit of autonomic computing, PR/SM and the z/OS dispatcher cooperate to automatically place and dispatch logical partitions to help optimize the performance of the hardware and minimize the interference of one partition with another. However, while the average performance of workloads is expected to remain reasonably consistent, performance viewed at small increments of time or by individual jobs or transactions could potentially see more variation than in the past, simply due to the larger and more complex LPAR configurations that the z10 EC can support.
Question 10:
What is HiperDispatch and how does it impact performance?
Answer:
HiperDispatch is the z/OS exploitation of PR/SM's new Vertical CPU Management (VCM) capabilities and is exclusive to the z10 EC. Rather than dispatch tasks randomly across all logical processors in a partition, z/OS will tie tasks to small queues of logical processors, and dispatch work to a "high priority" subset of the logicals. PR/SM provides processor topology information and updates to z/OS, and ties the high priority logical processors to physical processors. HiperDispatch can lead to improved efficiencies in both the hardware and software in two ways: 1) work may be dispatched across fewer logical processors, therefore reducing the "multi-processor (MP) effects" and lowering the interference among multiple partitions; 2) specific z/OS tasks may be dispatched to a small subset of logical processors which PR/SM will tie to the same physical processors, thus improving hardware cache re-use and locality of reference, for example by reducing the rate of cross-book communication.
Question 11:
What kind of performance improvement can I expect to see from HiperDispatch?
Answer:
The magnitude of the potential improvement from HiperDispatch is related to: a) the number of physical processors, b) the size of the z/OS images in the configuration, c) the logical:physical overcommit ratio and d) the memory reference pattern or storage hierarchy characteristics of the workload. Generally, a configuration where the largest z/OS image fits within a book will see minimal improvement. Workloads that are fairly CPU-intensive (like batch applications) will see only small improvements even for configurations with larger z/OS images, since they typically have long-running tasks that tend to stick on a logical engine anyway. Workloads that tend to have common tasks and high dispatch rates, as often seen in transactional applications, may see larger improvements, again depending on the size of the z/OS images involved. LPAR configurations that are overcommitted, i.e. have higher logical:physical ratios, may see some improvement, although the benefit of dispatching to a reduced number of logicals overlaps with benefits already available with IRD and various automation techniques that tend to reduce the number of online logical processors to match capacity needs. The range in benefit is expected to be from 0% to 10%, following the sensitivities described above. Specifically, configurations with z/OS images small enough to fit in a book, or running batch-like workloads, will tend to fall at the low end of the range; multi-book configurations with z/OS images in the 16-way to 32-way range and running transactional workloads will tend to fall toward the middle of the range; and very large multi-book configurations with very large z/OS images and running workloads with intense memory reference patterns will tend to fall toward the high end of the range.