Captivating Capture Ratios

(Originally posted 2015-11-15.)

I don’t think I’ve written about the concept of Capture Ratio[1] before. To be honest it’s kind of a “nerdy” or “internal” thing. But a recent experience suggests to me it is interesting, even if only for the wrong reason.

What Is Capture Ratio?

Not all CPU in a z/OS system can be attributed to a service class: If you add up all the CPU in SMF 72–3 (Workload Activity) it always amounts to less than the CPU in SMF 70–1 (System-Level).

If we divide Workload CPU by System CPU and turn it into a % we get a Capture Ratio. [2]
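As a minimal sketch of that division, assuming the CPU seconds have already been summed from the two record types (the service class names and numbers below are made up for illustration, not taken from any real system):

```python
def capture_ratio(workload_cpu_seconds, system_cpu_seconds):
    """Return captured CPU as a percentage of system-level CPU."""
    if system_cpu_seconds <= 0:
        raise ValueError("system CPU must be positive")
    return 100.0 * workload_cpu_seconds / system_cpu_seconds

# Hypothetical CPU seconds per service class, as if summed from SMF 72-3.
service_class_cpu = {"ONLINE": 1500.0, "BATCH": 900.0, "STC": 300.0}

# Hypothetical system-level CPU seconds, as if taken from SMF 70-1.
system_cpu = 3000.0

ratio = capture_ratio(sum(service_class_cpu.values()), system_cpu)
print(f"Capture ratio: {ratio:.1f}%")
```

With these invented numbers the result lands at 90%, inside the range we usually see.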

So what do we expect? Our observations are:

  • Generally most systems show capture ratios in the range 85% to 95%.
  • Capture ratios vary, but not usually by very much. [3]
  • Capture ratios are lower for very low utilisation systems than for very high utilisation ones.
  • Capture ratios are lower for highly paging systems, and probably for high I/O ones.

Generally I don’t see anything better about a system with a capture ratio in the low 90’s than one in the high 80’s, percentagewise. So I wouldn’t fret about that.

How Do We Use Capture Ratio?

As I indicated at the outset, this has been an internal thing.

In a recent study, to tweak nobody’s nose at all, we saw appallingly low and more or less random capture ratios. It turned out we were missing lots of 72–3 records.[4] So the capture ratio was a good diagnostic tool.

Despite what I said about capture ratio being “internal” we have a standard chart that plots capture ratio for a system by day. This is why I know about the behaviours listed above.

What Went Wrong?

In some studies over the past few years our capture ratio has gone over 100%. It really shouldn’t.

While this has been “subliminally troubling” it hasn’t been enough to make me spring into action. With a recent study, however, we were getting capture ratios of hundreds of percent. Enough to set alarm bells ringing. So Dave Betten and I set to debugging.

It’s all down to zIIP: We only get capture ratios above 100% when both the following are true:

  • We have substantial zIIP CPU relative to GCP CPU.
  • The zIIP Normalisation Factor is substantially higher than 1.

Our code has a combined capture ratio, plus separate ones for GCP and zIIP CPU. We plot the former but have ignored the latter two.
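To make the three flavours concrete, here is a rough sketch. It is not our actual code; the function names are hypothetical, and it assumes all four inputs are CPU seconds expressed in the same units (so any normalisation has been applied consistently to both the workload and the system side, or to neither).

```python
def pct(captured, total):
    """Percentage of total that was captured; NaN if there is no total."""
    return 100.0 * captured / total if total > 0 else float("nan")

def capture_ratios(workload_gcp, workload_ziip, system_gcp, system_ziip):
    """Return GCP, zIIP and combined capture ratios as percentages."""
    return {
        "gcp": pct(workload_gcp, system_gcp),
        "ziip": pct(workload_ziip, system_ziip),
        "combined": pct(workload_gcp + workload_ziip,
                        system_gcp + system_ziip),
    }

# Invented numbers purely for illustration.
ratios = capture_ratios(workload_gcp=900.0, workload_ziip=460.0,
                        system_gcp=1000.0, system_ziip=500.0)
for name, value in ratios.items():
    print(f"{name:>8}: {value:.1f}%")
```

When the units match, the combined ratio has to sit between the GCP and zIIP ones, which is a handy sanity check in its own right.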

I saw the pattern: Excessive zIIP capture ratio. Dave debugged the logic, which confirmed it. We’re using the zIIP Normalisation Factor[5] wrong in both the general and zIIP capture ratio calculations.

With the zIIP capture ratio adjusted in a spreadsheet, one system’s pair of capture ratios looks like this:

I’ve summarised across 8-hour shifts and the x axis is a shift number.
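One way to do that kind of summarisation is sketched below. The record layout (timestamp, workload CPU seconds, system CPU seconds) and the function names are assumptions for illustration. The sketch sums the CPU within each shift before dividing, rather than averaging per-interval percentages, so busy intervals carry proportionately more weight; that is a design choice on my part, not necessarily what the spreadsheet did.

```python
from collections import defaultdict
from datetime import datetime

SHIFT_SECONDS = 8 * 3600

def shift_number(timestamp, start):
    """0-based index of the 8-hour shift containing timestamp."""
    return int((timestamp - start).total_seconds() // SHIFT_SECONDS)

def capture_ratio_by_shift(intervals, start):
    """Sum workload and system CPU per shift, then divide."""
    totals = defaultdict(lambda: [0.0, 0.0])
    for timestamp, workload_cpu, system_cpu in intervals:
        bucket = totals[shift_number(timestamp, start)]
        bucket[0] += workload_cpu
        bucket[1] += system_cpu
    return {shift: 100.0 * w / s
            for shift, (w, s) in sorted(totals.items()) if s > 0}

# Invented interval data: (interval start, workload CPU, system CPU).
start = datetime(2015, 11, 1, 0, 0)
intervals = [
    (datetime(2015, 11, 1, 1, 0), 90.0, 100.0),
    (datetime(2015, 11, 1, 7, 0), 270.0, 300.0),
    (datetime(2015, 11, 1, 9, 0), 440.0, 500.0),
]
by_shift = capture_ratio_by_shift(intervals, start)
print(by_shift)
```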

The numbers appear to have “come right” and examining our logic suggests they should be right.

I think I discern that most of the time zIIP capture ratio is slightly above GCP capture ratio. This is what I’d guess, based on zIIPs not doing I/O. But I’m not 100% sure. Future data sets will tell.

Interestingly, the “wrong calculation” zIIP capture ratio was proportionately worse for a system on a machine where the zIIP Normalisation Factor is 6.22 than the ones where it is 2.36. But that’s not surprising.
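To see why a bigger factor hurts more, here is a purely hypothetical illustration. It is not a reconstruction of our actual bug: it simply assumes the factor gets applied to the workload zIIP CPU but not to the system zIIP CPU, so the two sides end up in different units. The raw factor values 604 and 1592 are invented so that, divided by 256 as in footnote 5, they come out at roughly 2.36 and 6.22; the CPU numbers are also made up.

```python
def decode_factor(raw):
    """The recorded factor is scaled by 256, so divide to get the real one."""
    return raw / 256.0

def ziip_capture_ratio(workload_ziip, system_ziip):
    """zIIP capture ratio as a percentage; both inputs must share units."""
    return 100.0 * workload_ziip / system_ziip

# Invented zIIP CPU seconds, both in the same units: a "true" ratio of 92%.
workload_ziip, system_ziip = 92.0, 100.0

inflated = {}
for raw in (604, 1592):
    factor = decode_factor(raw)
    # Hypothetical mistake: normalise one side only.
    inflated[raw] = ziip_capture_ratio(workload_ziip * factor, system_ziip)
    print(f"factor {factor:.2f}: wrong ratio {inflated[raw]:.1f}%, "
          f"right ratio {ziip_capture_ratio(workload_ziip, system_ziip):.1f}%")
```

Under that assumption the error scales directly with the factor, which would fit with capture ratios of hundreds of percent and with the 6.22 machine looking worse than the 2.36 ones.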

Putting It Right

One key lesson is: Don’t boost everything by capture ratio to fill in gaps.

  • The “low and random” case shows that’s not a good idea, as you introduce distortion that way.
  • The “impossibly high” case shows something fundamental is wrong.

So we know what the “excessively high” case is caused by. Now to get a fix tested and into Production.

And you might expect to see (at least pedagogically) a new chart that separates zIIP Capture Ratio from GCP Capture Ratio. I think this “fine structure” will be useful to glean.

So I hope I’ve shown that Capture Ratio is interesting, even without the bug we’ve troubleshot.

And “every day for us something new, open mind for a different view, and nothing else matters” [6] applies to this. Comme d’habitude. 🙂

  1. I’ve seen people write “capture ration” and it’s not been people for whom English isn’t their first language, but it could be autocorrect. 🙂  ↩

  2. Of course this is technically wrong as it’s a percent, not a ratio. Never mind. 🙂  ↩

  3. This stability is fairly reassuring. It seems like a real thing.  ↩

  4. This has now been resolved and we have a complete set of 72–3 data.  ↩

  5. You have to divide what’s in SMF 70–1 and in SMF 72–3 by 256 – which implies a granularity all of its own.  ↩

  6. Nothing Else Matters  ↩

Published by Martin Packer

I'm a mainframe performance guy and have been for the past 35 years. But I play with lots of other technologies as well.
