For quite a while now I’ve been able to do useful CPU analysis down at the individual logical processor level. In fact this post follows on from Engineering – Part Five – z14 IOPs – at a discreet distance.
I can’t believe I haven’t written about Defined Capacity Capping before – but apparently I haven’t.
As you probably know, such capping generally works by introducing a “phantom weight”. This holds the capped LPAR down – by restricting it to below its normal share (of the GCP pool). Speaking of GCPs, this is a purely GCP mechanism, so I’ll keep it simple(r) by only discussing GCPs.
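To make the idea concrete, here is a minimal sketch of how a phantom weight could pull a share down. It assumes a simplified model in which the phantom weight is just added to the pool's total weight when the capped LPAR's share is worked out. The function name and the numbers are invented for illustration and are not taken from any customer data.

```python
def capped_share(lpar_weight: float, total_pool_weight: float,
                 phantom_weight: float = 0.0) -> float:
    """Share of the pool under an ASSUMED simplified model:
    the phantom weight inflates the denominator only."""
    denominator = total_pool_weight + phantom_weight
    if denominator <= 0:
        raise ValueError("total weight plus phantom weight must be positive")
    return lpar_weight / denominator


# Hypothetical LPAR with weight 400 in a pool whose weights total 1000.
normal = capped_share(400, 1000)        # no phantom weight
capped = capped_share(400, 1000, 600)   # hypothetical phantom weight of 600
print(normal, capped)
```

With these made-up numbers the share falls from 40% of the pool to 25%, which is the "held down below its normal share" effect described above.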
But have you ever wondered how RMF (or PR/SM for that matter) accounts for this phantom weight?
Well, I have, and I recently got some insight by looking at engine-level GCP data. Processing at the interval and engine level turns out to be quite revealing.
But let me first review the data I’m using. There are three SMF record types I have to hand:
- 70-1 (RMF CPU Activity)
- 99-14 (Processor Topology)
- 113 (HIS Counters)
I am working with a customer with 8 Production mainframes (a mixture of z14 and z15 multi-drawer models). Most of them have at least one z/OS LPAR that hits a Defined Capacity cap – generally early mornings across the week’s data they’ve sent.
None of these machines is terribly busy. And none of them are even close to having all physical cores characterised.
Vertical Weights
In most cases the LPARs only have Vertical High (VH) logical GCPs. I can calculate the weight of a VH, as it’s a whole physical GCP’s worth of weight: divide the total pool weight by the total number of physical processors in the pool. For example, if the LPARs’ weights for the pool add up to 1000 and there are 5 physical GCPs in the pool, a physical GCP’s worth of weight is 200 – and so that’s the polar weight of a VH logical GCP. (And it is directly observable as such.)
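The arithmetic is trivial, but for completeness here it is as a small sketch. The function name is my own and the inputs are the example numbers from the paragraph above.

```python
def full_engine_weight(total_pool_weight: float, physical_gcps: int) -> float:
    """Weight equivalent to one whole physical GCP in the pool."""
    if physical_gcps <= 0:
        raise ValueError("the pool must contain at least one physical GCP")
    return total_pool_weight / physical_gcps


# LPAR weights for the pool add up to 1000, with 5 physical GCPs.
print(full_engine_weight(1000, 5))  # 200.0
```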
Now here’s how the logical processors are behaving:
- When not capped, all the logical processors have a full processor’s weight (as expected).
- When capped, weights move somewhat from higher-numbered logical GCPs to lower-numbered ones.
The consequence is that some of the higher-numbered ones become Vertical Lows (VLs) and occasionally a VH turns into a Vertical Medium (VM). What I’ve also observed is that the remaining VHs get polar weights above a full engine’s weight – which they obviously can’t entirely use.
And we know all this from SMF 70 Subtype 1 records, summarised in each RMF interval at the logical processor level.
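One way to spot this behaviour is to compare per-processor polar weights between an uncapped interval and a capped one. The sketch below is a heuristic of my own, not a documented rule. It assumes a zero polar weight means VL, a weight of at least a full engine means VH, and anything in between means VM. The dictionaries stand in for whatever your own SMF 70-1 processing produces, and the weights are invented.

```python
def classify(polar_weight: float, engine_weight: float) -> str:
    """Heuristic polarisation from polar weight (an assumption, see above)."""
    if polar_weight <= 0:
        return "VL"
    if polar_weight >= engine_weight:
        return "VH"
    return "VM"


def polarisation_changes(before: dict, after: dict, engine_weight: float) -> dict:
    """Map logical CPU id -> (old, new) for processors whose class changed.

    `before` and `after` map logical CPU id -> polar weight for one interval.
    A processor missing from `after` is treated as having zero weight.
    """
    changes = {}
    for cpu in sorted(before):
        old = classify(before[cpu], engine_weight)
        new = classify(after.get(cpu, 0.0), engine_weight)
        if old != new:
            changes[cpu] = (old, new)
    return changes


# Hypothetical intervals: a full engine's weight is 200.
uncapped = {0: 200.0, 1: 200.0, 2: 200.0, 3: 200.0}
capped = {0: 250.0, 1: 250.0, 2: 100.0, 3: 0.0}
print(polarisation_changes(uncapped, capped, 200.0))
```

In this invented example logical GCPs 0 and 1 stay VH with more than a full engine's weight, while 2 becomes a VM and 3 becomes a VL.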
Logical Core Home Addresses
But what are the implications of Defined Capacity capping?
Obviously the LPAR’s access to GCP CPU is restricted – which is the intent. And, almost as obviously, some workloads are likely to be hit. You probably don’t need a lecture from me on the especial importance of having WLM set up right so the important work is protected under such circumstances. Actually, this post isn’t about that.
There are other consequences of being capped in this way. And this is really what this post is about.
When a logical processor changes polarisation, PR/SM often reworks what are deemed “Home Addresses” for the logical processors:
- For VH logical processors the logical processor is always dispatched on the same physical processor – which is its home address.
- A VM logical processor isn’t entitled to a whole physical processor’s worth of weight. It has, potentially, to share with other logical processors. But it still has a home address. It’s just that there’s a looser correspondence between home address and where the VM is dispatched in the machine.
- A VL logical processor has an even looser correspondence between its home address and where it is dispatched. (Indeed it has no entitlement to be dispatched at all.)
What I’ve observed – using SMF 99 Subtype 14 records – follows. But first I would encourage you to collect 99-14 records, as they are inexpensive. Also SMF 113, but we’ll come to that.
When SMF 70-1 says the LPAR is capped (and the weights shift, as previously described) the following happens: some higher-numbered logical GCPs move home addresses – according to SMF 99-14. But, in my case, these are VLs. So their home addresses are less meaningful.
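Detecting such moves amounts to comparing home addresses between two intervals. Here is a minimal sketch, assuming the topology has already been reduced to a drawer, cluster and chip per logical processor. The `Home` tuple, the function names and the sample values are hypothetical and do not reflect the actual 99-14 record layout.

```python
from typing import NamedTuple


class Home(NamedTuple):
    """Hypothetical simplified home address for a logical processor."""
    drawer: int
    cluster: int
    chip: int


def moved_homes(before: dict, after: dict) -> dict:
    """Map logical CPU id -> (old_home, new_home) where the home changed.

    Only processors present in both intervals are compared.
    """
    common = sorted(before.keys() & after.keys())
    return {cpu: (before[cpu], after[cpu])
            for cpu in common if before[cpu] != after[cpu]}


def changed_drawer(moves: dict) -> list:
    """Logical CPU ids whose home moved to a different drawer."""
    return sorted(cpu for cpu, (old, new) in moves.items()
                  if old.drawer != new.drawer)


# Invented example: CPU 1 changes chip, CPU 2 changes drawer.
uncapped = {0: Home(1, 1, 2), 1: Home(1, 1, 2), 2: Home(1, 2, 1)}
capped = {0: Home(1, 1, 2), 1: Home(1, 2, 3), 2: Home(2, 1, 1)}
moves = moved_homes(uncapped, capped)
print(moves)
print(changed_drawer(moves))
```

Separating "moved at all" from "moved drawer" is deliberate, since a cross-drawer move is the more interesting case from a cache and memory point of view.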
In one case, and I don’t have an explanation for this, hitting the cap caused the whole LPAR to move drawers. And it moved back again when the cap was removed.
If the concept of a home address is less meaningful for a VL, why do we care that it’s moved? Actually, we don’t. We care about something else…
… From SMF 113 it’s observed that Cycles Per Instruction (CPI) deteriorates. Usually one measures this across all logical processors, or all logical processors in a pool. In the cases I’m describing these measures deteriorate. But there is some fine structure to this. In fact it’s not that fine…
… The logical processors that turned from VH to VL experience CPI values that move from the reasonable 3 or so to several hundred. This suggests to me these VL logical processors are being dispatched remote from where the data is. You could read that as “remote from where the rest of the LPAR is”. There might also be a second effect of being dispatched to cores with effectively empty local caches (Levels 1 and 2). Note: Cache contents won’t move with the logical processor as it gets redispatched somewhere else.
So the CPI deterioration is real and can be significant when the LPAR is capped.
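To illustrate why the per-processor view matters, here is a small sketch that computes CPI both per logical processor and across the pool. It assumes you have already extracted a cycle count and an instruction count per logical processor for an interval. The function names and the counter values are invented for illustration.

```python
def cpi(cycles: int, instructions: int):
    """Cycles Per Instruction, or None if no instructions were executed."""
    if instructions == 0:
        return None
    return cycles / instructions


def cpi_by_processor(samples: dict) -> dict:
    """samples maps logical CPU id -> (cycles, instructions)."""
    return {cpu: cpi(c, i) for cpu, (c, i) in samples.items()}


def pool_cpi(samples: dict):
    """CPI across all processors: total cycles over total instructions."""
    total_cycles = sum(c for c, _ in samples.values())
    total_instructions = sum(i for _, i in samples.values())
    return cpi(total_cycles, total_instructions)


# Invented counters: CPU 0 is a healthy VH, CPU 3 has turned into a VL.
interval = {0: (3000, 1000), 3: (3000, 10)}
print(cpi_by_processor(interval))
print(pool_cpi(interval))
```

With these made-up counters the pool-level CPI comes out at just under 6, which looks like a moderate deterioration, while the per-processor view shows one engine at 3 and the other at 300.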
Conclusion
There are two main conclusions:
- Actually hitting a Defined Capacity cap can have consequences – in terms of Cycles Per Instruction (CPI).
- There is value in using SMF 70-1, 99-14, and 113 to understand what happens when an LPAR is Defined Capacity capped – especially when the data is analysed at the individual logical processor level.
By the way, I haven’t mentioned Group Capping. I would expect it to be similar – as the mechanism is.