(Originally posted 2016-06-11.)
Pardon the bad pun. Perhaps I should’ve written “Engine-ering” but where exactly do you put the dash?
There were hints on this topic in Born With A Measuring Spoon In Its Mouth, but the real motivation came from a z196 customer without Hiperdispatch enabled.
But what on earth am I on about?
OK, here we go:
Generally our code doesn’t report down to the single engine (or processor) level.
Actually the same customer who isn’t using Hiperdispatch is using IRD, a predecessor of Hiperdispatch and a third case where engine-level reporting could be handy.
Why We Don’t Generally Go Down To Engine Level
We generally stop at pool (or processor type), for example the zIIP pool.
Traditionally there hasn’t been much you can actually affect at the engine level.
So the sorts of questions we ask are:
- How busy is the IFL Pool?
- How much CPU in the GCP Pool is this LPAR using?
- Which application componentry is using the zIIP capacity?
- How busy is a Coupling Facility?
None of these are helped much by going to the level of an individual engine.
Why We Might Be Interested In Engines
LPAR design has always been interesting (and a little tricky).
So, to take one example, Hiperdispatch Parking behaviour is an engine-level phenomenon most customers need to understand and monitor.
Theoretically, if we were interested in certain kinds of contention, seeing a skew in favour of, say, one engine might be interesting.
Where Are We Starting From?
Let me lift the lid on where our code is (just a little):
- In table terms (record mapping) we go down to the engine level for all RMF record types. We roll up from there.
- In reporting we handle IRD and do some Hiperdispatch work. See below.
We graph shifting weights within an LPAR Cluster. Our view of how many shared engines the weights entitle an LPAR to is therefore dynamic, changing as the weights shift.
We graph the number of online engines for an LPAR. When IRD was in its heyday this could be quite interesting.
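To make “what the weights say the number of shared engines should be” concrete, here is a minimal sketch. It assumes the usual weight-share arithmetic (an LPAR’s weight divided by the total weight of the LPARs sharing the pool, times the number of shared physical engines). The Interval record and its field names are hypothetical, not our actual table layout, and the weights are made-up numbers showing IRD moving weight around.

```python
from dataclasses import dataclass


@dataclass
class Interval:
    """One RMF interval for one LPAR (hypothetical record shape)."""
    start: str           # interval start time, e.g. "09:00"
    lpar_weight: int     # this LPAR's current weight in the shared pool
    total_weight: int    # sum of weights of all active LPARs sharing the pool
    shared_engines: int  # physical engines in the shared pool


def entitled_engines(iv: Interval) -> float:
    """Engines' worth of the shared pool the weights entitle this LPAR to."""
    # Multiply before dividing to keep the arithmetic exact for whole numbers.
    return iv.lpar_weight * iv.shared_engines / iv.total_weight


# Hypothetical data: IRD shifts weight within the cluster, so the
# entitlement has to be recalculated interval by interval.
intervals = [
    Interval("09:00", 300, 1000, 10),
    Interval("09:15", 450, 1000, 10),
    Interval("09:30", 200, 1000, 10),
]
for iv in intervals:
    print(iv.start, round(entitled_engines(iv), 2))
```

Graphing that per-interval value, rather than a single static figure, is what “dynamic” means here.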
We look at two things:
- Vertical Polarisation
A couple of posts of potential interest are:
Engine-Level Data Model
The engine-level data model is pretty extensive. There are two cases to consider:
- Coupling Facility View Of CPU (SMF 74–4)
- General View
Coupling Facility Engines
In an earlier post I mentioned R744PBSY and R744PWAI – “Busy” and “Wait” times. What I briefly mentioned is that these are recorded at the logical processor level.
My current take is there’s only limited excitement to be had by reporting at the engine level – given L-shaped ICF LPARs are a thing of the past.
So, right now, our log table does indeed have Processor Number (R744PNUM) as a key. Our summary table (the one we actually report from) doesn’t. I don’t intend to change that.
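As an illustration of the log-table versus summary-table distinction, here is a minimal sketch. It assumes CF processor utilisation is computed as busy / (busy + wait), and that the summary view drops the processor-number key by summing busy and wait per Coupling Facility before dividing. The row layout, CF names and values are hypothetical; only the field names R744PNUM, R744PBSY and R744PWAI come from the text above.

```python
from collections import defaultdict

# Hypothetical "log table" rows keyed by (CF name, processor number).
# Busy and wait times are assumed to be in the same units.
log_rows = [
    # (cf_name, r744pnum, r744pbsy, r744pwai)
    ("CF01", 0, 600_000, 400_000),
    ("CF01", 1, 200_000, 800_000),
    ("CF02", 0, 100_000, 900_000),
]


def engine_busy(rows):
    """Per-logical-processor busy fraction: busy / (busy + wait)."""
    return {(cf, pnum): bsy / (bsy + wai) for cf, pnum, bsy, wai in rows}


def cf_busy(rows):
    """Summary view: drop the processor key by summing busy and wait per CF."""
    totals = defaultdict(lambda: [0, 0])
    for cf, _pnum, bsy, wai in rows:
        totals[cf][0] += bsy
        totals[cf][1] += wai
    return {cf: bsy / (bsy + wai) for cf, (bsy, wai) in totals.items()}


print(engine_busy(log_rows))
print(cf_busy(log_rows))
```

The engine-level view shows CF01’s two processors at 60% and 20%; the summary view reports CF01 as a single 40%.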
SMF 70–1 gives engine-level information in quite a few areas:
- I previously mentioned Online Time (SMF70ONT) in the context of IRD.
- For Hiperdispatch we have Polarisation flags – for High, Medium and Low engine cases. We also have Parked Time (SMF70PAT) but only for the reporting LPAR’s processors (which I first wrote about in 2008 in System z10 CPU Instrumentation).
- At the LPAR level we have the Logical Processor Data Section.
- For SMT we have all I mentioned in Born With A Measuring Spoon In Its Mouth.
- We have CPU busy by engine for the reporting LPAR in the CPU Data Section.
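One simple engine-level metric from the Logical Processor Data Section is busy while online. The following minimal sketch assumes each logical engine’s dispatch time comes from SMF70PDT and its online time from SMF70ONT, both in the same units. Treat that field mapping as an assumption to check against the SMF manual. The LogicalProcessor record and the sample values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LogicalProcessor:
    """One logical engine for one LPAR in one interval (hypothetical shape).

    dispatch_time is assumed to come from SMF70PDT and online_time from
    SMF70ONT, both in the same units (seconds here).
    """
    engine: int
    dispatch_time: float
    online_time: float


def engine_busy_pct(lp: LogicalProcessor) -> float:
    """Busy % of a logical engine while it was online; 0 if never online."""
    if lp.online_time <= 0:
        return 0.0
    return 100.0 * lp.dispatch_time / lp.online_time


# Hypothetical 900-second interval.
lps = [
    LogicalProcessor(0, 540.0, 900.0),
    LogicalProcessor(1, 300.0, 900.0),
    LogicalProcessor(2, 0.0, 0.0),  # e.g. offline for the whole interval
]
for lp in lps:
    print(lp.engine, round(engine_busy_pct(lp), 1))
```

Dividing by online time rather than interval length matters in IRD-style cases, where engines come and go during an interval.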
The Shape Of Things To Come?
So what am I thinking of?
Well, the underlying principle is that it’s the non-uniformity between (logical) engines for an LPAR that is interesting.
And maybe – in another dimension – how that non-uniformity varies through time.
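One way to put a number on that non-uniformity, and on how it varies through time, is to compute a couple of dispersion measures across an LPAR’s logical engines for each interval. The choice of spread (max minus min) and coefficient of variation is mine, not an established RMF metric. The busy percentages below are hypothetical.

```python
from statistics import mean, pstdev


def skew(busy_by_engine: list[float]) -> dict[str, float]:
    """Summarise how unevenly work is spread across an LPAR's logical engines.

    Returns the spread (max - min) and the coefficient of variation
    (population standard deviation / mean). Both are 0 for perfectly even use.
    """
    m = mean(busy_by_engine)
    return {
        "spread": max(busy_by_engine) - min(busy_by_engine),
        "cv": pstdev(busy_by_engine) / m if m > 0 else 0.0,
    }


# Hypothetical busy % per engine, one list per interval.
by_interval = {
    "09:00": [35.0, 33.0, 30.0, 32.0],  # "smeared": fairly even
    "09:15": [95.0, 94.0, 60.0, 5.0],   # "corralled": very uneven
}
for t, busy in by_interval.items():
    s = skew(busy)
    print(t, round(s["spread"], 1), round(s["cv"], 2))
```

Plotting the coefficient of variation interval by interval gives the “other dimension”: how the skew itself changes through the day.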
So I’ve run a couple of experiments with recent customer data:
- A Non-Hiperdispatch Case.
- A Hiperdispatch Case where the GCP Engine Pool is extremely busy.
I don’t have to hand the case where Hiperdispatch is in play but the GCP Engine Pool is not busy. I have thoughts on what might happen, but this post is already running long.
Non-Hiperdispatch Case
In this case the LPAR is defined with 8 Online GCPs and 10 Online zIIPs – on a z196.
Here work is “smeared” across all the online engines, certainly the GCPs. None is more than half full, and typically they’re about a third full.
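This has consequences, such as the short engine effect: the LPAR’s share of the GCP pool is spread thinly across many logical engines, so under contention each one gets only a small slice of a physical engine. Cache effectiveness won’t be wonderful either.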
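The short engine arithmetic can be sketched as follows. It assumes that without Hiperdispatch (horizontal mode) the LPAR’s weight-based entitlement is divided evenly across all its online logical engines. The weights and pool size below are hypothetical, because the customer’s actual weights aren’t given here. Only the 8 online GCPs come from the case above.

```python
def share_per_logical_engine(lpar_weight: int, total_weight: int,
                             shared_physical: int, online_logical: int) -> float:
    """Fraction of a physical engine each logical engine is entitled to.

    Horizontal mode (no Hiperdispatch): the LPAR's weight-based entitlement
    is spread evenly over all its online logical engines.
    """
    entitlement = lpar_weight * shared_physical / total_weight
    return entitlement / online_logical


# Hypothetical weights: the LPAR is entitled to 3 GCPs' worth but has 8 online.
share = share_per_logical_engine(lpar_weight=300, total_weight=1000,
                                 shared_physical=10, online_logical=8)
print(share)  # 0.375
```

Under contention, each of those 8 logical GCPs would be guaranteed only about three-eighths of a physical engine. That is the “short engine”.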
Hiperdispatch Busy GCP Pool Case
In this case the LPAR is defined with 10 Online GCPs (6 Vertical High, 2 Vertical Medium (at 65%) and 2 Vertical Low) and 2 Online zIIPs (both Vertical Medium (at 80%)) – on a z13.
Here, the work is “corralled” into just the Vertical Highs and Vertical Mediums, in accordance with the vertical (engine-level) weights. We are approaching full engines – quasi-dedicated to the LPAR – for the VH cases. There is some evidence of parking and unparking of the Vertical Lows.
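The vertical polarisation in this case implies an entitlement that is easy to check. Each Vertical High counts as a whole engine and each Vertical Medium counts as its share, while Vertical Lows carry no guaranteed share. Here is a minimal sketch using only the numbers quoted above. The function name is my own.

```python
def entitlement(vh: int, vm_shares: list[float]) -> float:
    """Engines' worth implied by vertical polarisation.

    Each Vertical High counts as a whole engine and each Vertical Medium as
    its share. Vertical Lows carry no guaranteed share, so they add nothing.
    """
    return vh + sum(vm_shares)


gcp = entitlement(vh=6, vm_shares=[0.65, 0.65])  # the 2 Vertical Lows add nothing
ziip = entitlement(vh=0, vm_shares=[0.80, 0.80])
print(round(gcp, 2), round(ziip, 2))  # 7.3 1.6
```

So this LPAR is entitled to about 7.3 GCPs’ worth of a 10-engine logical configuration. Any work running on the two Vertical Lows is therefore using capacity beyond its guaranteed share.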
So What Might I Actually Do?
I can certainly generate graphs like the above at will – and I probably will.
I’m more inclined to do it for the system under study than for all LPARs on a machine (or in the data), for three reasons:
- It’d be an awful lot of graphs for most of my customers. And the value for obscure LPARs wouldn’t be huge.
- If I have SMF 70 cut by an LPAR (really a z/OS system), it will also contain Parked Time (SMF70PAT). Relating Parked Time to Engine Busy Time will be interesting.
- Keeping core vs logical processor straight is important, and driving the above graphs down to logical level is useful. It can only really be done for the systems I have data from.
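Relating Parked Time to Engine Busy Time might start with two percentages per logical engine: how much of the interval it was parked, and how busy it was while unparked. The following is a minimal sketch. It assumes parked time comes from SMF70PAT. The busy and online inputs, the EngineInterval record and the sample Vertical Low are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class EngineInterval:
    """One logical engine, one interval, for an LPAR whose own SMF 70 we have.

    parked_time is assumed to come from SMF70PAT; busy_time and online_time
    are hypothetical inputs in the same units as parked_time.
    """
    engine: int
    online_time: float
    parked_time: float
    busy_time: float


def parked_pct(e: EngineInterval) -> float:
    """Percentage of online time the engine spent parked."""
    return 100.0 * e.parked_time / e.online_time if e.online_time > 0 else 0.0


def busy_pct_of_unparked(e: EngineInterval) -> float:
    """Busy % measured only over the time the engine was not parked."""
    unparked = e.online_time - e.parked_time
    return 100.0 * e.busy_time / unparked if unparked > 0 else 0.0


# Hypothetical 900-second interval for a Vertical Low parked two thirds of the time.
vl = EngineInterval(engine=9, online_time=900.0, parked_time=600.0, busy_time=150.0)
print(round(parked_pct(vl), 1), round(busy_pct_of_unparked(vl), 1))  # 66.7 50.0
```

Measuring busy against unparked time, rather than the whole interval, stops a mostly-parked Vertical Low from looking idle when it is actually working hard during the periods it is unparked.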
All the above sounds a little undecided to me – and it is. I’m sharing all this because I think engine-level reporting could well prove useful, as well as being interesting. And, not having seen much writing on this, I suspect this is something most Performance and Capacity people won’t’ve thought about.
I for one intend to keep thinking about this and experimenting. Stay tuned. 🙂
And you never know how some piece of infrastructure will fail to cope with punctuation in a title. ↩
Collective rather than Royal “We” here. 🙂 But there are times when it really usefully could. Two that come to mind are: ↩
Intelligent Resource Director. ↩
Or perhaps better from my point of view. 🙂 At any rate more complex and interesting. ↩
Such as DB2 DBM1 zIIP usage. ↩
But the post got a surprisingly large number of hits. 🙂 ↩