(Originally posted 2015-11-15.)
I don’t think I’ve written about the concept of Capture Ratio before. To be honest it’s kind of a “nerdy” or “internal” thing. But a recent experience suggests to me it is interesting, even if only for the wrong reason.
What Is Capture Ratio?
Not all CPU in a z/OS system can be attributed to a service class: If you add up all the CPU in SMF 72–3 (Workload Activity) it always amounts to less than the CPU in SMF 70–1 (System-Level).
If we divide Workload CPU by System CPU and express the result as a percentage, we get a Capture Ratio.
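Here is a minimal sketch of that arithmetic. It assumes both figures are already in the same units (CPU seconds for the same interval). The function name `capture_ratio`, the service class names and all the numbers are hypothetical, chosen for illustration:

```python
def capture_ratio(workload_cpu: float, system_cpu: float) -> float:
    """Capture ratio as a percentage: captured workload CPU / system CPU * 100."""
    if system_cpu <= 0:
        raise ValueError("system CPU must be positive")
    return 100.0 * workload_cpu / system_cpu


# Hypothetical CPU seconds for one interval, by service class (as from SMF 72-3)
service_class_cpu = {"SYSSTC": 40.0, "STCHI": 120.0, "ONLINE": 500.0, "BATCH": 250.0}
# Hypothetical system-level CPU seconds for the same interval (as from SMF 70-1)
system_cpu = 1000.0

ratio = capture_ratio(sum(service_class_cpu.values()), system_cpu)
print(f"Capture ratio: {ratio:.1f}%")  # → Capture ratio: 91.0%
```

The service classes are summed before dividing, because it is total captured CPU that is compared with the system-level figure.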
So what do we expect? Our observations are:
- Most systems show capture ratios in the range 85% to 95%.
- Capture ratios vary, but not usually by very much. 
- Capture ratios are lower for very low utilisation systems than very high utilisation ones.
- Capture ratios are lower for highly paging systems, and probably for high I/O ones.
Generally I don’t see a system with a capture ratio in the low 90s as any better than one in the high 80s. So I wouldn’t fret about that.
How Do We Use Capture Ratio?
As I indicated at the outset, this has been an internal thing.
In a recent study, to tweak nobody’s nose at all, we saw appallingly low and more or less random capture ratios. It turned out we were missing lots of 72–3 records. So the capture ratio was a good diagnostic tool.
Despite what I said about capture ratio being “internal” we have a standard chart that plots capture ratio for a system by day. This is why I know about the behaviours listed above.
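A daily chart like that lends itself to a simple automated check. This is a hypothetical sketch, not our actual code: the 80% lower threshold is an assumption (a little below the 85% to 95% range, since low-utilisation systems run lower), anything over 100% is treated as a calculation error, and the daily values are made up:

```python
def classify(ratio_pct: float, low: float = 80.0) -> str:
    """Label a daily capture ratio. The 80% default threshold is an assumption."""
    if ratio_pct > 100.0:
        return "impossible: over 100%, suspect the calculation"
    if ratio_pct < low:
        return "low: suspect missing SMF 72-3 records"
    return "ok"


# Hypothetical daily capture ratios for one system
daily = {"Mon": 91.2, "Tue": 89.7, "Wed": 63.4, "Thu": 240.5}
for day, pct in daily.items():
    print(f"{day}: {pct:5.1f}%  {classify(pct)}")
```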
What Went Wrong?
In some studies over the past few years our capture ratio has gone over 100%. It really shouldn’t.
While this has been “subliminally troubling”, it hasn’t been enough to make me spring into action. With a recent study, however, we were getting capture ratios of hundreds of percent: enough to set alarm bells ringing. So Dave Betten and I set about debugging.
It’s all down to zIIP: We only get capture ratios above 100% when both the following are true:
- We have substantial zIIP CPU relative to GCP CPU.
- The zIIP Normalisation Factor is substantially higher than 1.
Our code has a combined capture ratio, plus separate ones for GCP and zIIP CPU. We plot the former but have ignored the latter two.
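To make the three ratios concrete, here is a sketch that assumes the workload and system figures for each engine type are in the same, unnormalised units. The `CpuSplit` type, the `capture_ratios` function and the numbers are hypothetical:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CpuSplit:
    gcp: float   # general-purpose CP seconds
    ziip: float  # zIIP seconds, same (unnormalised) units as the system figure


def capture_ratios(workload: CpuSplit, system: CpuSplit) -> dict[str, Optional[float]]:
    """Combined, GCP-only and zIIP-only capture ratios, as percentages."""
    return {
        "combined": 100.0 * (workload.gcp + workload.ziip) / (system.gcp + system.ziip),
        "gcp": 100.0 * workload.gcp / system.gcp,
        # No zIIP ratio if the system recorded no zIIP time
        "ziip": 100.0 * workload.ziip / system.ziip if system.ziip else None,
    }


r = capture_ratios(CpuSplit(gcp=850.0, ziip=285.0), CpuSplit(gcp=1000.0, ziip=300.0))
print({k: round(v, 1) for k, v in r.items() if v is not None})
# → {'combined': 87.3, 'gcp': 85.0, 'ziip': 95.0}
```

The combined ratio is a weighted average of the other two, with each engine type weighted by its share of system CPU. So a distorted zIIP ratio barely moves it while zIIP usage is small, and it only shows up once zIIP CPU is substantial relative to GCP CPU.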
I saw the pattern: an excessive zIIP capture ratio. Dave debugged the logic, which confirmed it. We’re using the zIIP Normalisation Factor incorrectly in both the combined and zIIP capture ratio calculations.
After adjusting the zIIP capture ratio in a spreadsheet, one system’s pair of capture ratios looks like this:
I’ve summarised across 8-hour shifts, and the x-axis is the shift number.
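The shift summaries were built in a spreadsheet, and the exact method isn't described. One reasonable approach, sketched here under that assumption, is to sum the workload and system CPU for every interval in a shift and only then divide, so a quiet interval with a poor ratio doesn't count as much as a busy one. Shift 0 is assumed to start at midnight on the first day. The function name and data are hypothetical:

```python
from collections import defaultdict
from datetime import datetime

SHIFT_SECONDS = 8 * 60 * 60


def shift_capture_ratios(intervals, start):
    """Sum CPU per 8-hour shift, then divide.

    intervals: iterable of (timestamp, workload_gcp, system_gcp, workload_ziip, system_ziip)
    Returns {shift_number: (gcp_pct, ziip_pct)}, with shift 0 starting at `start`.
    """
    totals = defaultdict(lambda: [0.0, 0.0, 0.0, 0.0])
    for ts, w_gcp, s_gcp, w_ziip, s_ziip in intervals:
        shift = int((ts - start).total_seconds() // SHIFT_SECONDS)
        t = totals[shift]
        t[0] += w_gcp
        t[1] += s_gcp
        t[2] += w_ziip
        t[3] += s_ziip
    return {
        shift: (100.0 * t[0] / t[1], 100.0 * t[2] / t[3])
        for shift, t in sorted(totals.items())
    }


start = datetime(2015, 11, 9)  # hypothetical first day, midnight
intervals = [
    (datetime(2015, 11, 9, 1, 0), 80.0, 100.0, 45.0, 50.0),
    (datetime(2015, 11, 9, 7, 30), 90.0, 100.0, 47.0, 50.0),
    (datetime(2015, 11, 9, 9, 0), 10.0, 20.0, 9.0, 10.0),
]
for shift, (gcp, ziip) in shift_capture_ratios(intervals, start).items():
    print(f"shift {shift}: GCP {gcp:.1f}%, zIIP {ziip:.1f}%")
```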
The numbers appear to have “come right” and examining our logic suggests they should be right.
I think I discern that most of the time zIIP capture ratio is slightly above GCP capture ratio. This is what I’d guess, based on zIIPs not doing I/O. But I’m not 100% sure. Future data sets will tell.
Interestingly, the “wrong calculation” zIIP capture ratio was proportionately worse for a system on a machine where the zIIP Normalisation Factor is 6.22 than for those where it is 2.36. But that’s not surprising.
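The exact mistake isn't spelled out here, so this sketch assumes one plausible form of it for illustration: the workload zIIP time gets scaled by the normalisation factor while the system-level zIIP time does not. Under that assumption the zIIP capture ratio is inflated by exactly the factor, which is why a 6.22 machine looks proportionately worse than a 2.36 one, and the combined ratio goes over 100% once zIIP CPU is large relative to GCP CPU. The stored values 604 and 1592 are hypothetical; dividing them by 256, as the SMF fields require, gives roughly 2.36 and 6.22:

```python
def decode_nf(stored: int) -> float:
    """SMF carries the normalisation factor multiplied by 256."""
    return stored / 256.0


def ratios(w_gcp, w_ziip, s_gcp, s_ziip, nf, buggy=False):
    """Return (combined %, zIIP %).

    With buggy=True the workload zIIP time is scaled by nf while the
    system zIIP time is not: a hypothetical version of the mistake.
    """
    if buggy:
        w_ziip *= nf
    combined = 100.0 * (w_gcp + w_ziip) / (s_gcp + s_ziip)
    ziip = 100.0 * w_ziip / s_ziip
    return combined, ziip


# Hypothetical interval with substantial zIIP CPU relative to GCP CPU
W_GCP, W_ZIIP, S_GCP, S_ZIIP = 850.0, 475.0, 1000.0, 500.0

for stored in (604, 1592):  # hypothetical stored values: roughly 2.36 and 6.22
    nf = decode_nf(stored)
    good = ratios(W_GCP, W_ZIIP, S_GCP, S_ZIIP, nf)
    bad = ratios(W_GCP, W_ZIIP, S_GCP, S_ZIIP, nf, buggy=True)
    print(f"NF {nf:.2f}: correct {good[0]:.1f}% / {good[1]:.1f}%, "
          f"buggy {bad[0]:.1f}% / {bad[1]:.1f}% (combined / zIIP)")
```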
Putting It Right
One key lesson is: Don’t boost everything by the capture ratio to fill in gaps.
- The “low and random” case shows that’s not a good idea, as you introduce distortion that way.
- The “impossibly high” case shows something fundamental is wrong.
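Here is a sketch of the kind of uplift being warned against, with hypothetical service classes and numbers. Every service class is scaled by system CPU over captured CPU, which is the same as dividing by the capture ratio. When half the BATCH records go missing, the uplift spreads BATCH's missing CPU across everything, so ONLINE gets inflated even though its own data was complete:

```python
def uplift(workload_cpu: dict[str, float], system_cpu: float) -> dict[str, float]:
    """Scale every service class by system CPU / captured CPU."""
    captured = sum(workload_cpu.values())
    factor = system_cpu / captured
    return {name: cpu * factor for name, cpu in workload_cpu.items()}


SYSTEM_CPU = 1000.0
complete = {"ONLINE": 500.0, "BATCH": 400.0}  # capture ratio 90%
missing = {"ONLINE": 500.0, "BATCH": 200.0}   # half the BATCH 72-3 records lost: 70%

for label, data in (("complete", complete), ("missing", missing)):
    boosted = uplift(data, SYSTEM_CPU)
    print(label, {k: round(v, 1) for k, v in boosted.items()})
```

In the impossibly high case the factor drops below 1, so the uplift quietly scales every service class down instead, hiding the bug rather than exposing it.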
So we know what the “excessively high” case is caused by. Now to get a fix tested and into Production.
And you might expect to see (at least pedagogically) a new chart that separates zIIP Capture Ratio from GCP Capture Ratio. I think this “fine structure” will be useful to glean.
So I hope I’ve shown that Capture Ratio is interesting, even without the bug we’ve troubleshot.
And “every day for us something new, open mind for a different view, and nothing else matters”  applies to this. Comme d’habitude. 🙂
I’ve seen people write “capture ration” and it’s not been people for whom English isn’t their first language, but it could be autocorrect. 🙂 ↩
Of course this is technically wrong, as it’s a percentage, not a ratio. Never mind. 🙂 ↩
This stability is fairly reassuring. It seems like a real thing. ↩
This has now been resolved and we have a complete set of 72–3 data. ↩
You have to divide the normalisation factor as recorded in SMF 70–1 and SMF 72–3 by 256, which implies a granularity all of its own. ↩