(Originally posted 2016-05-29.)
You probably don’t have the same problem I do, namely not having access to SMF data from all the systems in your mainframe estate.
You’ll recognise that as a provocative statement if ever there was one; for all sorts of reasons, not every system’s RMF SMF data is collected.
Most notably, test systems often aren’t instrumented.
This post is about Coupling Facility (CF) image CPU, and only CPU. Mostly it’s about CF images on the same footprint as a z/OS system for which you do have data.
So there are two views of Coupling Facility CPU:
- SMF 70 Partition
- SMF 74–4 Coupling Facility
Both of these are available at the partition and engine level, but the latter is less interesting.
Coupling Facility CPU Utilisation Might Not Be What You Expect
So a standard formula for Utilisation % would be, summed over all engines:
Utilisation % = 100 × CPU Busy Time / Interval Length
From an SMF 70 perspective that’s certainly true, but it’s not how CF CPU Utilisation is calculated. It’s the following formula, summed over all the engines:
CF Utilisation % = 100 × R744PBSY / (R744PBSY + R744PWAI)
Now, the two formulae look similar, and they would be the same if R744PBSY+R744PWAI added up to the interval length. Well, this is true only for dedicated CFs, namely those not sharing engines with other LPARs.
So for dedicated LPARs that’s fine: CF view of busy (74–4 view) is the same as PR/SM view (70–1 view).
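As a minimal sketch of how the two formulae compare, the Python below uses R744PBSY as the busy time in both and changes only the denominator. The function names and sample numbers are made up for illustration, and all times are assumed to be in the same unit (seconds here).

```python
def busy_over_interval_pct(busy, interval, engines):
    """Standard formula: busy time as a share of elapsed engine time."""
    return 100.0 * busy / (interval * engines)


def cf_utilisation_pct(pbsy, pwai):
    """74-4 formula: busy time as a share of busy plus polling time."""
    return 100.0 * pbsy / (pbsy + pwai)


# Hypothetical dedicated CF: 2 engines, 900-second interval.
# Busy plus polling accounts for the whole 1800 engine-seconds.
print(busy_over_interval_pct(360.0, 900.0, 2))  # 20.0
print(cf_utilisation_pct(360.0, 1440.0))        # 20.0

# Hypothetical shared CF: 1 engine, polling for only part of the interval.
print(busy_over_interval_pct(90.0, 900.0, 1))   # 10.0
print(cf_utilisation_pct(90.0, 210.0))          # 30.0
```

In the shared case the two answers diverge because R744PBSY+R744PWAI no longer covers the whole interval.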
What Are R744PBSY and R744PWAI?
R744PBSY is the CPU time (in the CF) processing requests – from all systems.
R744PWAI is the CPU time (in the CF) polling for requests to process.
With DYNDISP=NO, R744PBSY+R744PWAI do indeed add up to the interval × the number of engines, as the CFCC never stops polling for requests.
With DYNDISP=YES they don’t add up to the interval × the number of engines. This is because the CFCC stops polling for requests, but not immediately.
So the formula for CF utilisation is really about what percentage of the CF CPU cycles is used processing requests.
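One rough way to see which case you’re in is to check how much of the interval R744PBSY+R744PWAI accounts for. This is only a sketch: the function names and the 2% tolerance are my assumptions, and all times are assumed to be in the same unit. It is a heuristic, not a definitive test of the DYNDISP setting.

```python
def accounted_fraction(pbsy, pwai, interval, engines):
    """Share of the interval's engine time covered by busy plus polling."""
    return (pbsy + pwai) / (interval * engines)


def polls_continuously(pbsy, pwai, interval, engines, tolerance=0.02):
    """True if busy plus polling covers (almost) all of the engine time.

    The tolerance is an arbitrary allowance for measurement granularity.
    """
    return abs(1.0 - accounted_fraction(pbsy, pwai, interval, engines)) <= tolerance


# Hypothetical 900-second interval, 1 engine.
print(polls_continuously(90.0, 810.0, 900.0, 1))  # True: never stops polling
print(polls_continuously(90.0, 210.0, 900.0, 1))  # False: polling stops at times
```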
What Is R744SETM?
I first wrote about this field in 2008 and you might get a snigger at my expense. Readers in 2008 did. 🙂 Here’s the post: Coupling Facility Structure CPU Time – Initial Investigations
It’s the Structure Execution Time, or CPU time in the Coupling Facility for a CF structure. Key points about it are:
- R744SETM is for all systems accessing the structure.
- R744SETM, summed across all the structures in the CF, adds up to R744PBSY.
Because of the latter its capture ratio is 100%. This has an effect at low traffic rates: there appears to be some CPU utilisation without any requests. But the CPU per request tends to settle down as traffic picks up.
As I said in the above-referenced post, the CPU per request calculation relies on having data from all systems sharing the structure.
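Here’s a small sketch of that dependency. The structure name, system names, request counts and microsecond unit are all made up for illustration; the only real field is R744SETM, and the per-system request counts stand in for whatever request counters you take from each system’s 74–4 data.

```python
def cpu_per_request_us(setm_us, requests_by_system):
    """Structure CPU time divided by requests from the systems we have data for."""
    total_requests = sum(requests_by_system.values())
    if total_requests == 0:
        return None  # no requests seen, so no meaningful average
    return setm_us / total_requests


# Hypothetical structure: R744SETM covers requests from ALL systems.
setm_us = 1_200_000
all_systems = {"SYSA": 150_000, "SYSB": 90_000}
only_sysa = {"SYSA": 150_000}  # SYSB's SMF data was never collected

print(cpu_per_request_us(setm_us, all_systems))  # 5.0
print(cpu_per_request_us(setm_us, only_sysa))    # 8.0 (overstated)
```

With a system missing, the numerator still includes its work but the denominator doesn’t, so the CPU per request is overstated.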
Standard Recommendations Still Apply
It’s still wise not to run coupling facilities above 50% (according to the SMF 74 formula). This is for two primary reasons:
- The CF needs to be as responsive as possible, as it affects Coupled CPU. (This includes link times, of course.)
- You might well need “white space” for recovering structures (or, in the case of User-Managed Duplexing, for the Group Buffer Pools to become primary).
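A back-of-envelope sketch of the “white space” point follows. It assumes two CFs with identical engine counts and speeds, so that utilisations simply add when one CF has to host the other’s structures; that assumption, the function names and the sample numbers are mine, and real recovery behaviour is more complicated.

```python
GUIDELINE_PCT = 50.0  # the rule of thumb from the text


def within_guideline(cf_util_pct, limit_pct=GUIDELINE_PCT):
    """True if the CF is at or below the guideline utilisation."""
    return cf_util_pct <= limit_pct


def survivor_util_pct(own_pct, failed_pct):
    """Naive estimate of one CF's utilisation after absorbing the other's work.

    Assumes both CFs have the same capacity, so percentages add.
    """
    return own_pct + failed_pct


cf1, cf2 = 30.0, 25.0  # hypothetical SMF 74-4 utilisations
print(within_guideline(cf1), within_guideline(cf2))  # True True
print(survivor_util_pct(cf1, cf2))                   # 55.0
```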
What Of Coupling Facility Thin Interrupts?
So now I’m coming to the point.
System zEC12 and CFLEVEL 19 introduced Coupling Facility Thin Interrupts, enabled with DYNDISP=THIN.
Barbara Weiler has a nice paper on this: Coupling Thin Interrupts and Coupling Facility Performance in Shared Processor Environments, so this post is covering only a small (but relevant) portion of what she covers.
In essence Thin Interrupts shortens the time a CF spends polling for work, releasing the physical CPU sooner. This makes it a “better citizen” in terms of sharing the (generally ICF) CPU Pool with other CF LPARs.
The net effect of this is that R744PWAI – the CPU time spent polling for requests – should decrease. From the formula that means the CF CPU Utilisation should increase, despite (or because of) less CF CPU being used overall.
To achieve this PR/SM has to be more active, so at the very least the PR/SM CPU for the LPAR (SMF70PDT – SMF70EDT) should increase.
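The subtraction is trivial, but a sketch makes the before-and-after comparison concrete. The numbers are invented, and I’m assuming both fields and the interval are in microseconds; check the units in your own reporting code.

```python
def prsm_cpu_us(pdt_us, edt_us):
    """PR/SM CPU attributed to the LPAR: SMF70PDT minus SMF70EDT."""
    return pdt_us - edt_us


def prsm_cpu_pct(pdt_us, edt_us, interval_us):
    """The same quantity as a percentage of one engine over the interval."""
    return 100.0 * prsm_cpu_us(pdt_us, edt_us) / interval_us


INTERVAL_US = 900 * 1_000_000  # hypothetical 15-minute interval

# Hypothetical CF LPAR before and after enabling Thin Interrupts.
before = prsm_cpu_us(90_000_000, 89_100_000)
after = prsm_cpu_us(60_000_000, 57_300_000)
print(before, after)  # 900000 2700000
print(prsm_cpu_pct(60_000_000, 57_300_000, INTERVAL_US))
```

In this made-up example the total dispatch time falls while the PR/SM portion rises, which is the pattern described above.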
NOTE: Even with Thin Interrupts I’d be wary of using CFs with shared engines in Production. This is because a CF still tends to wait to get an engine back when sharing, elongating requests and making their service times more variable.
So let’s discuss two cases:
- Where you have SMF 74–4 for the CF LPAR.
- Where you don’t have SMF 74–4 for the CF LPAR.
SMF 74–4 View Of Thin Interrupts
First, SMF 74–4 has a new bit field (in R744FFLG) for when Thin Interrupts are enabled.
Second, R744PWAI, as indicated above, should be relatively small and the CF CPU Utilisation relatively high.
So you have “full disclosure” in this case.
SMF 70 View Of Thin Interrupts
I think this is the more prevalent case, as people don’t tend to send me data from test environments (and it’s easier for them to send me “the lot” than to weed out the subsidiary environments).
All you have is SMF 70.
Here, as noted, SMF70PDT – SMF70EDT might well be higher, especially when there is some load.
It’s worth noting that for a non-dedicated CF LPAR the 70 Partition Data view will show the CPU used as variable, and generally far less than the CPU share. When you have a plethora of CF LPARs, or you’re kept away from the real infrastructure, this might be your only clue that Thin Interrupts is enabled.
For dedicated CF LPARs the 70 Partition Data view is of completely utilised engines.
By the way (pro tip here) 🙂 I recently changed our code to put the dedicated engine CF LPARs at the bottom of the stack; it just looks so much better that way. (See A Picture Of Dedication.)
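If SMF 70 is all you have, those two patterns can be turned into a crude classifier. This is a sketch only: the function name, the 99% threshold and the wording of the results are my assumptions, and the input is a list of per-interval busy percentages for the CF LPAR that you’ve already derived from the Partition Data.

```python
from statistics import mean, pstdev


def classify_cf_lpar(busy_pcts, share_pct, full_threshold=99.0):
    """Guess at a CF LPAR's dispatching behaviour from SMF 70 busy percentages."""
    if min(busy_pcts) >= full_threshold:
        return "fully utilised: consistent with dedicated engines"
    if mean(busy_pcts) < share_pct and pstdev(busy_pcts) > 0:
        return "variable and below share: possibly Thin Interrupts"
    return "inconclusive"


print(classify_cf_lpar([100.0, 100.0, 99.8], share_pct=50.0))
print(classify_cf_lpar([5.0, 12.0, 8.0], share_pct=30.0))
print(classify_cf_lpar([40.0, 40.0, 40.0], share_pct=30.0))
```

Treat the result as a hint about where to look, not as a statement of how the CF is actually configured.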
Coupling Facility CPU is a complex topic. As I said on Twitter, I thought this would be a short blog post… 🙂
Well more poured out of my head than I initially thought; I hope some of this is worth pouring into your head. 🙂
So Thin Interrupts has been a good excuse to talk about Coupling Facility CPU Utilisation. It’s also going to be a good reason to revamp some of my code, when I get around to it. 🙂
It’s hopeless trying to understand the performance of CF images for which you have neither SMF 70 Partition Data nor SMF 74–4.
Except when it isn’t (which I think would be rare).
Obviously useful for capacity planning.
… or at least the originally intended point; this post has expanded somewhat, but I’m glad it did.