(Originally posted 2013-12-13.)
Increasingly people are going to want to understand their zIIP usage and do capacity planning for zIIPs. Previously I’ve written about zIIP CPU numbers from the RMF perspective, namely at the WLM Workload and Service Class levels. This post is about taking it down a layer – to the address space level – using SMF Type 30 records.
(I’ve always thought it a pity there isn’t a standard reporting program for SMF 30, analogous to the RMF Postprocessor. But I digress.)
As you’ve probably gathered, SMF 30 is one of my favourite record types. This post describes a few more ways you can get value from it. And some of those ways are, as you’d expect from me, more about nosiness about what customers are doing than about performance or capacity.
What We Have
First, a brief review of the zIIP-related numbers. (And just about everything I say in this post relates to zAAP as well.)
For a very long time we had TCB and (Non-Preemptible) SRB time.
A long time ago – when these were introduced – we gained Preemptible SRB times, for Dependent and Independent Enclaves.
When zIIPs were introduced we had zIIP-eligible and zIIP-eligible-but-on-a-GCP sets of times.
The latter is the case where work was eligible to run on a zIIP but instead ran on a General Purpose processor (or GCP for short).
Both sets of zIIP times incorporate two buckets of time: for Dependent Enclaves and Independent Enclaves. I subtract these two numbers from the headline zIIP time to get a third bucket, which I call “Other zIIP”:
Other_zIIP = Overall_zIIP - Dependent_Enclave_zIIP - Independent_Enclave_zIIP
This is a useful thing to do – as we shall see.
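Here is a minimal sketch of that subtraction, applied to both sets of times (on a zIIP, and eligible but on a GCP). The class and attribute names are made up for illustration and are not SMF 30 field names; I'm assuming the times have already been extracted from the record and converted to seconds.

```python
from dataclasses import dataclass


@dataclass
class ZiipTimes:
    """One set of zIIP CPU times, in seconds (hypothetical names, not SMF fields)."""
    overall: float              # headline zIIP time for the address space
    dependent_enclave: float    # Dependent Enclave portion of the headline
    independent_enclave: float  # Independent Enclave portion of the headline

    def other(self) -> float:
        # What is left once both enclave buckets are taken out of the headline.
        return self.overall - self.dependent_enclave - self.independent_enclave


# Illustrative numbers only, not taken from any real system.
on_ziip = ZiipTimes(overall=120.0, dependent_enclave=15.0, independent_enclave=45.0)
on_gcp = ZiipTimes(overall=10.0, dependent_enclave=2.0, independent_enclave=3.0)

print(on_ziip.other())  # 60.0
print(on_gcp.other())   # 5.0
```

Keeping the two sets in the same shape means the same calculation serves for both the "ran on a zIIP" and the "eligible but ran on a GCP" numbers.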
zAAP On zIIP
zAAP On zIIP allows you to treat zAAP-eligible work as if it were zIIP-eligible. The benefit of this is that it gives you additional ways to fill up a zIIP.
From an instrumentation point of view, with zAAP On zIIP all the zAAP-related numbers become 0.
Address Space Characterisation
(My standard claim applies here: It helps a lot if you can get information from widely-available instrumentation without either asking someone or going to more specific data. In this instance it’s provided by SMF 30.)
Suppose you have an address space in mind – and it could be any Full Function address space:
- It could be a batch job, cutting SMF 30 Subtypes 4 and 5 when steps and the job end.
- It could be a long-running address space, cutting SMF 30 Subtypes 2 and 3 on an interval basis.
- It could be a terminating address space, in which case all four subtypes should get cut at the appropriate points.
The point is it doesn’t matter which of the above applies. The fields I just described are always present.
Let’s pick on one example: The job name is immaterial but the program name is CTGBATCH (and zAAP on zIIP is in play). You might know this program name denotes the address space is running CICS Transaction Gateway (CTG). You might not know that much of CTG’s work is executing Java. (Non-JNI) Java work is zAAP-eligible but in a zIIP-but-not-zAAP environment it becomes zIIP-eligible. But it’s not work that runs in either a Dependent Enclave or an Independent Enclave: Its CPU falls into the “Other zIIP” category I just calculated. This would also be true of System XML processing (which is not Java).
Another example is DDF. The DIST address space is obvious to spot: Its job name ends in “DIST”, just as the corresponding DBM1 address space’s job name ends in “DBM1” (and the subsystem name is whatever precedes these two in the job name). When some DDF work enters the system it is assigned to an Independent Enclave – but not until work such as authorisation has already taken place under TCBs. You can see the TCB time, the Independent Enclave time and the zIIP-eligible Independent Enclave time in SMF 30. The “Authorisation etc” time is the TCB time minus the Independent Enclave time.
Commonly we talk of the eligible percentage for DDF. That is the zIIP-eligible Independent Enclave time divided by the Independent Enclave time, converted to a percentage.
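The two DDF calculations above can be sketched as follows. The function and parameter names are hypothetical, and the inputs are assumed to be CPU seconds already pulled out of the DIST address space's SMF 30 record. What you feed in as the zIIP-eligible time (for instance, whether it includes eligible work that ran on a GCP) is your decision; the sketch just does the arithmetic.

```python
def ddf_breakdown(tcb, independent_enclave, ziip_eligible_independent_enclave):
    """Return (authorisation_etc_seconds, eligible_percent) for a DIST address space.

    All three inputs are CPU seconds; the names are illustrative, not SMF fields.
    """
    # TCB time that was not spent in Independent Enclaves.
    authorisation_etc = tcb - independent_enclave

    # Guard against an interval with no Independent Enclave time at all.
    if independent_enclave > 0:
        eligible_pct = 100.0 * ziip_eligible_independent_enclave / independent_enclave
    else:
        eligible_pct = 0.0

    return authorisation_etc, eligible_pct


# Illustrative numbers only.
auth, pct = ddf_breakdown(
    tcb=200.0,
    independent_enclave=180.0,
    ziip_eligible_independent_enclave=108.0,
)
print(auth)  # 20.0
print(pct)   # 60.0
```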
If you want to go deeper on this you do, of course, need to work with DB2 Accounting Trace (SMF 101).
An Important Case – DB2 Version 10
In DB2 Version 10 some performance-critical categories of work – Deferred Write Engines and Prefetch Engines – became zIIP-eligible. Note the words “performance-critical categories of work”. This is the first time that phrase could be used with reference to zIIP-eligibility. And it’s right: If these engines don’t run in a timely fashion, bad things happen.
The implication of this is we can’t fill zIIPs to the brim with this kind of work, and especially not if the LPAR has only one or two zIIPs.
If we do then either the work will get delayed because it can’t cross over to the General-Purpose Processors (GCPs) – and that will be a problem – or else it does cross over and we might get an unacceptable loss of zIIP benefit.
DB2 Lab recommends that, averaged over the peak 15 minutes (happily usually an RMF interval), you don’t run DBM1 zIIP utilisation for this work above 30 to 50% busy. The 30% number is for a single zIIP and the 50% number is for numerous zIIPs.
But what about “multitenant” zIIP usage? For example DBM1 plus Java work?
You can infill with less performance-critical work such as Java so long as you:
- Classify DBM1 properly so WLM and SRM can protect it.
- Classify this infill work appropriately.
- Don’t load the zIIPs as heavily as you would GCPs.
So how do you identify this performance-critical DBM1 work? It’s actually not difficult as the work shows up in the Dependent Enclave zIIP Eligible CPU time. (And the amount that crosses over to GCPs is in an analogous time bucket – so you can check if this amount is acceptable or not.)
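As a rough sketch, both checks (how busy this work makes the zIIPs, and how much of it crosses over) come down to simple arithmetic on the DBM1 Dependent Enclave numbers. Everything here is an assumption for illustration: the function names, the idea of expressing utilisation as CPU seconds over interval length times the number of zIIPs, and the sample values. It also ignores any speed normalisation between zIIPs and GCPs. Only the 30% figure for a single zIIP comes from the guideline quoted earlier.

```python
GUIDELINE_SINGLE_ZIIP_PCT = 30.0  # the single-zIIP figure quoted above


def ziip_utilisation_pct(ziip_cpu_seconds, interval_seconds, ziip_count):
    """Percent busy across the zIIP pool caused by this bucket of work."""
    return 100.0 * ziip_cpu_seconds / (interval_seconds * ziip_count)


def crossover_pct(on_ziip_seconds, on_gcp_seconds):
    """Percent of zIIP-eligible CPU that actually ran on a GCP."""
    eligible = on_ziip_seconds + on_gcp_seconds
    return 100.0 * on_gcp_seconds / eligible if eligible else 0.0


# Illustrative numbers: a 15-minute (900 second) interval on an LPAR with one zIIP.
util = ziip_utilisation_pct(ziip_cpu_seconds=180.0, interval_seconds=900.0, ziip_count=1)
cross = crossover_pct(on_ziip_seconds=180.0, on_gcp_seconds=20.0)

print(util)                               # 20.0
print(util <= GUIDELINE_SINGLE_ZIIP_PCT)  # True
print(cross)                              # 10.0
```

What counts as an acceptable crossover percentage is a judgement call for your installation; the sketch only produces the number.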
I’ve just mentioned how to tell if there’s too much crossover to GCPs. But what about the other problem area – work getting delayed? There are two places to look:
- In Accounting Trace for work being delayed. I’d expect it to show somewhere like the Read Asynchronous Wait and Write Asynchronous Wait buckets.
- In Statistics Trace with failures to get Prefetch and Deferred Write engines.
Note: Prior to PM30468 DB2 Version 10 scheduled these engines in a way that caused the CPU to show up under MSTR rather than DBM1. With the fix it’s in DBM1. (I’ve not seen it in MSTR.)
I hope you’ve seen how the various zIIP-related fields in SMF 30 can be used to understand the proclivities of an address space. More importantly, I hope you’re more aware than ever of the importance of zIIP capacity planning. And especially of the added emphasis brought by the new zIIP exploitation in DB2 Version 10, now it’s become a widespread version.
(I’ve had this post “in the can” for a couple of weeks and been sensitised in the meantime to some new things. All of which will appear in the “zIIP Capacity Planning” presentation I also have in the works.)