(Originally posted 2011-10-01.)
I was going to start this post with an apology. But, as any sensible blogger would, I left it a few days to write this. Now I realise that there’s a wider point than the "I was wrong" one. (But I was wrong – in a way that I think many other people might’ve been wrong too.)
So let me talk about two things in this post:
- zAAP and zIIP Delay.
- How I came to be wrong and what we can all learn from it.
zAAP and zIIP Delay
The fields we’re talking about here appear on the RMF Workload Activity Report and are – in the Type 72-3 record – R723IFAD (for zAAP) and R723SUPD (for zIIP). Personally I use the SMF fields, as shown in "zAAP and zIIP Revisited (At Last)".
They are described as "zAAP delay samples" and "zIIP delay samples".
(Hereafter I’m going to drop the "or zIIP" bit. And, by the way, zAAP-on-zIIP doesn’t affect the discussion significantly.)
But what does R723IFAD mean? I had assumed (perhaps mentally fuzzily) that it meant "delay samples because zAAP was unavailable" (likewise zIIP). So my recommendation would have been that the way around it was to provision more specialty engine capacity to the service class period.
It turns out that’s not the right interpretation of the field. Here’s a better one:
For a delay sample to be declared for a service class period for zAAP, all of the following criteria have to be met:
- We were trying to run zAAP-eligible code. (I think we knew this.)
- No zAAP could run the work. (My basic assumption.)
- No general-purpose engine (GCP, in my parlance) ran the work. (This is the new bit.)
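To make the three criteria concrete, here's a minimal Python sketch of the rule as I understand it. It's a conceptual model only: the `Sample` class and its fields are hypothetical, and this is not how WLM or RMF actually collect or represent samples.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """One hypothetical observation of work in a service class period.

    Conceptual model only: not how WLM or RMF really record samples.
    """
    zaap_eligible: bool  # was the work trying to run zAAP-eligible code?
    ran_on_zaap: bool    # did a zAAP run the work?
    ran_on_gcp: bool     # did a general-purpose engine (GCP) run the work?


def is_zaap_delay(sample: Sample) -> bool:
    """True only if all three criteria hold: eligible, no zAAP, no GCP."""
    return (
        sample.zaap_eligible
        and not sample.ran_on_zaap
        and not sample.ran_on_gcp  # the "new bit": a GCP didn't pick it up either
    )


def count_zaap_delay(samples: list) -> int:
    """Count the samples that would be declared as zAAP delay."""
    return sum(1 for s in samples if is_zaap_delay(s))


if __name__ == "__main__":
    samples = [
        Sample(zaap_eligible=True, ran_on_zaap=False, ran_on_gcp=False),  # delay
        Sample(zaap_eligible=True, ran_on_zaap=False, ran_on_gcp=True),   # ran on a GCP
        Sample(zaap_eligible=True, ran_on_zaap=True, ran_on_gcp=False),   # ran on a zAAP
        Sample(zaap_eligible=False, ran_on_zaap=False, ran_on_gcp=False), # not eligible
    ]
    print(count_zaap_delay(samples))
```

The second sample is the one my old interpretation got wrong: the work was zAAP-eligible and no zAAP ran it, but because a GCP did, it doesn't count as zAAP delay.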
So, seeing significant samples in the "Delay for zAAP" field means we not only didn’t get to run on a zAAP but we also didn’t run on a GCP. And the implication here is that understanding all this requires us to weave in the GCP view. We could be short on both zAAP and GCP capacity.
Now, I would guess the "take home" is still to provision more zAAP capacity to the service class period – if you want to increase its velocity and Delay for zAAP is the major issue. (There are some rare cases where that mightn’t be right – given processors come in integer numbers.) But the reasoning is a little different: You’d rather run zAAP-eligible work on a zAAP than on a GCP, I would think.
For completeness, the "CPU Delay" field (R723CCDE) and the "CPU Capping Delay" field (R723CCCA) are both for non-zAAP-eligible work. (Helpfully the SMF manual states that R723CCCA is not a subset of R723CCDE.) If R723CCDE / R723CCCA come into play, then, it’s about provisioning GCP capacity – or, distantly, finding a way to make more of the work eligible for zAAP.
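As a rough illustration of that reasoning, here's a small Python sketch that looks at the three delay-sample counts for one service class period and says which way the provisioning question leans. It assumes you've already extracted R723IFAD, R723CCDE and R723CCCA from the Type 72-3 record (no SMF parsing here), and the function name and wording of its hints are hypothetical. It's a first pointer, not a capacity plan: it ignores the "processors come in integer numbers" caveat entirely.

```python
def capacity_hint(ifad: int, ccde: int, ccca: int) -> str:
    """Suggest a direction based on the largest delay-sample count.

    ifad: R723IFAD (delay for zAAP: eligible work that ran on neither zAAP nor GCP)
    ccde: R723CCDE (CPU delay, non-zAAP-eligible work)
    ccca: R723CCCA (CPU capping delay, non-zAAP-eligible work)
    """
    counts = {"R723IFAD": ifad, "R723CCDE": ccde, "R723CCCA": ccca}
    if max(counts.values()) == 0:
        return "no delay samples recorded"
    # Compare fields individually rather than adding them together.
    biggest = max(counts, key=counts.get)
    if biggest == "R723IFAD":
        return "zAAP: consider more zAAP capacity for this period"
    return "GCP: consider more GCP capacity, or making more work zAAP-eligible"


if __name__ == "__main__":
    print(capacity_hint(ifad=120, ccde=30, ccca=0))
    print(capacity_hint(ifad=5, ccde=80, ccca=10))
```

Comparing the fields one at a time keeps the sketch honest: it just reports which delay dominates and leaves the judgement about velocity goals to you.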
The Wider Lesson
We’re all used to reading numbers off reports and taking them at "face value". If some metric is called "splodgeness" and the value is high we say "splodgeness abounds" without necessarily giving it too much thought. But what is this "splodgeness" whereof we speak?
Often all we get is the description "zAAP delay samples." (If you think we get more, then do please look at the SMF manual’s description for R723IFAD.) So we tend to:
- Cling to the certainty the existence of a particular metric gives us. I think we’re grateful to have the metric. After all, consider the counterproposition.
- Invent for ourselves an interpretation of what the metric means. I say, perhaps rudely, "invent" because who’s to say if we have the right interpretation? We have to gain a foothold somehow. So actually I’m entirely sympathetic.
So, in response to a customer question, I set off in search of an answer to the question "what does R723IFAD mean?" I have a friend in RMF who mentioned they got the number from WLM and suggested I ask a mutual friend in WLM. He, very usefully, pointed me at Dan Rosa in Systems Software Development in Poughkeepsie. Dan and I chatted for well over an hour and he helped me form a very good understanding of what this field means. So many thanks to Dan!
Now, I count myself as very lucky in having friends in such useful places. I realise that’s a privilege. And I don’t tend to bombard them with questions about each and every SMF record’s field.
So, I think there are lots of fields like that. I’ve stumbled across a fair few. It would be nice to revive the old RMF Field Description manual (last updated in the early 1990s). I don’t think that’s going to happen, unfortunately. And it would take forever to bring it up to date.
But I do think it’s legitimate to gain an understanding of where a field came from, why it was invented, and how it behaves. And that’s what I try to do – for fields I think tell a useful story. And that’s part of why I actually like questions about fields, and part of why I feel like a "kid at Christmas" whenever new data arrives: It gives me a chance to see how this stuff behaves and how y’all’re using our hardware and software.
So, in conclusion, we learn and grow together. But there’s always room for better understanding. I guess I knew all this tacitly. I expect a lot of you will share my experience.