This post follows on from A Note On Velocity from 2015. Follows on at a respectful distance, I’d say – since it’s been 5 years.
In that post I wrote “But those ideas are for another day or, more likely, another year (it being December now).” This is that other day / year – as this post reports on some of those “left on the table” aspects. For one, I do now project what happens if we include (or exclude) I/O samples.
In a recent customer engagement I did some work on WLM samples for a Batch service class. This service class has 2 periods, the first period having an incredibly short duration of 75 service units.
- Period 1 is Importance 4, with a reasonable velocity.
- Period 2 is Discretionary.
Almost everything ends in Period 2 – so almost all batch work in this shop is running Discretionary, i.e. bottom dog without a goal.
As I said in A Note On Velocity, RMF reports attained velocity from Using and Delay samples, and these come direct from WLM. Importantly, this means you can calculate velocity from the overall Using and Delay counts, without having to sum all the individual buckets of Using and Delay samples. So if you’re calculating velocity from the raw RMF SMF fields (as our code does) you won’t, for example, add in I/O Using and I/O Delay samples when you shouldn’t. I’ll call this calculation, using the overall Using and Delay buckets, the Headline Velocity Calculation.
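As a minimal sketch of the Headline Velocity Calculation, assuming the usual definition of velocity as Using samples as a percentage of Using plus Delay samples: the function and parameter names here are hypothetical, not RMF field names, and returning 0 when there are no samples is just a convention chosen for the sketch.

```python
def headline_velocity(using_samples: int, delay_samples: int) -> float:
    """Velocity as a percentage: Using / (Using + Delay) * 100."""
    total = using_samples + delay_samples
    if total == 0:
        # No Using or Delay samples in the interval; 0.0 is an
        # arbitrary convention for this sketch.
        return 0.0
    return 100.0 * using_samples / total


# Made-up sample counts, purely for illustration.
print(headline_velocity(300, 700))  # 30.0
```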
I thought this would be useful for figuring out if I/O Priority Management is enabled. In fact there’s a flag for that – at the system level – but if you do the calculation by totting up the buckets you get sensible numbers for both cases: Enabled and Disabled.
I/O Priority Management can be enabled or disabled at the service class level. I don’t definitively see a flag in RMF for this at the service class level but presumably, if the headline calculation doesn’t match the total of the individual buckets when I/O samples are included, then the Service Class is not subject to I/O Priority Management. And the converse would be true.
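That inference can be sketched as a comparison of the headline counts against the bucket totals with and without I/O samples. This is only an illustration of the reasoning above: the function and parameter names are hypothetical, and the caller is assumed to have already summed the non-I/O buckets.

```python
from typing import Optional


def io_samples_in_headline(headline_using: int, headline_delay: int,
                           non_io_using: int, non_io_delay: int,
                           io_using: int, io_delay: int) -> Optional[bool]:
    """True if the headline counts match the bucket totals including I/O,
    False if they match the totals excluding I/O, None if no inference
    can be drawn."""
    with_io = (headline_using == non_io_using + io_using
               and headline_delay == non_io_delay + io_delay)
    without_io = (headline_using == non_io_using
                  and headline_delay == non_io_delay)
    if with_io == without_io:
        # Either neither total matches, or both do (there were no I/O
        # samples), so the comparison tells us nothing.
        return None
    return with_io


# Made-up counts: 200 I/O Using and 200 I/O Delay samples.
print(io_samples_in_headline(500, 900, 300, 700, 200, 200))  # True
print(io_samples_in_headline(300, 700, 300, 700, 200, 200))  # False
```

Returning None when there are no I/O samples at all is deliberate: in that case both totals match and the test cannot distinguish the two settings.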
Batch Samples
For Batch, the headline calculation is matched by totting up the buckets for Using and Delay, if you include QMPL in the Delay samples tally – because this represents Initiator Delay. This is sensible to include in the velocity calculation as WLM-managed initiators are, as the name suggests, managed according to goal attainment and a delay in being initiated really ought to be part of the calculation.
Equally, though, with JES-managed initiators you could get a delay waiting for an initiator. And WLM isn’t going to do anything about that.
(By the way, SMF 30 – at the address space / job level – has explicit time fields for a job getting started. The most relevant one is SMF30SQT.)
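To illustrate how much difference including QMPL in the Delay tally can make for Batch, here is a small sketch that tots up the buckets both ways. The bucket names in the dictionaries are hypothetical labels rather than RMF field names, and the sample counts are invented.

```python
def bucket_velocity(using_buckets: dict, delay_buckets: dict,
                    include_qmpl: bool = True) -> float:
    """Velocity (%) from individual buckets, optionally leaving the
    QMPL (initiator delay) samples out of the Delay tally."""
    using = sum(using_buckets.values())
    delay = sum(count for name, count in delay_buckets.items()
                if include_qmpl or name != "qmpl")
    total = using + delay
    return 100.0 * using / total if total else 0.0


# Invented sample counts for one interval.
using = {"cpu": 200, "io": 0}
delay = {"cpu": 300, "qmpl": 500}

print(bucket_velocity(using, delay))                      # 20.0
print(bucket_velocity(using, delay, include_qmpl=False))  # 40.0
```

With these made-up numbers, leaving out initiator delay doubles the apparent velocity, which is why the tally only matches the headline figure when QMPL is included.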
zIIP-Related Samples
I was reminded in this study that samples where the work is eligible to run on a zIIP but where it actually runs on a GCP are included in Using GCP samples. If you do the maths it works. It’s not really surprising.
This is also a good time to remind you that samples aren’t time, except for CPU – which is measured and converted to samples.
An example of where this is relevant is when zIIP speed is different from GCP speed. There are two cases for this:
- With subcapacity GCPs – where the zIIPs are faster than GCPs.
- With zIIPs running SMT-2 – where zIIP speed is slower than when SMT is not enabled. (It might still be faster than a GCP but it might not be.)
Here, it becomes interesting to think about how you get all the sample types approximately equivalent. I would expect that, in the “zIIPs are a different speed from GCPs” case, there might need to be some use of the conversion factor (R723NFFI). I wouldn’t, though, expect the effective speed of SMT-2 zIIPs to be part of the conversion.
But perhaps I’m overthinking this and perhaps a raw zIIP second is treated the same as a raw GCP second. And both are, of course, different to Using I/O.
Sample Frequency And Sampleable Units
WLM samples Performance Blocks (PBs). These might be 1 per address space or there might be many. CICS regions would be an example of where there are many.
I’m told PBs in a CICS region are not the same as MXT (maximum number of tasks) but could approach it if the workload in the region built up enough. This is different from what I thought.
I tried to calculate MXT from sample counts divided by the sampling interval and didn’t get a sensible estimate. Which is why I asked a few friends. You can imagine that a method of calculating MXT not requiring CICS-specific instrumentation would’ve been valuable.
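For what it’s worth, the arithmetic behind that kind of estimate can be sketched as below. It assumes WLM takes 4 samples per second (one every 250 milliseconds), which is an assumption of this sketch rather than something stated above, and the function and parameter names are hypothetical. Note that dividing total samples by the number of sampling ticks gives an average number of sampled units over the interval, not a maximum.

```python
def average_sampled_units(total_samples: int, interval_seconds: float,
                          samples_per_second: float = 4.0) -> float:
    """Average number of units (e.g. PBs) seen per sampling tick.

    samples_per_second defaults to 4.0, an assumed 250 ms sampling
    interval; adjust it if that assumption doesn't hold.
    """
    ticks = interval_seconds * samples_per_second
    if ticks <= 0:
        raise ValueError("interval and sampling rate must be positive")
    return total_samples / ticks


# Invented figures: 72,000 samples of all types in a 15-minute interval.
print(average_sampled_units(72_000, 900))  # 20.0
```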
Conclusion
One thing I should note in this post is that – in my experience – sampling is exact. That is to say, if you add up the samples in the buckets right you get exactly the headline number. Exactness is valuable in that it gives you confidence in your inferences. Inexactness could still leave you wondering.
Most people don’t get into the raw SMF fields but if you do:
- You can go beyond what, e.g., RMF reports give you.
- You get a much better feel for how the data (and the reality it describes) actually works.
But, as with the CICS MXT case, you can get unexpected results. I hope you (and I) learn from those.