(Originally posted 2019-06-01.)
A rather obscure pun, but I hope it’ll make sense. Not “Pi” by the way but “PI”. Though this post contains arithmetic, it’s not a mathematics post.
To be honest, I never knew how we calculated Performance Index (or PI) for Percentile goals before. But now I do, so I’m sharing it with you. Plus a couple of observations, too.
To be even more honest, when I say “I never knew how we calculated Performance Index” I should say “I never knew how we should calculate Performance Index” – as I’ve just corrected it.
(This post follows on (after 6 years) from WLM Response Time Distribution Reporting With RMF. More on that later.)
Before we go any further I have to give a little background information.
What Are Percentile Goals?
When Workload Manager (WLM) manages transactions explicitly, two kinds of goals become available:
- Average Response Time
- Percentile Response Time
By the way, for WLM to manage transactions at all, it needs cooperation from (or exploitation by) middleware, or at any rate a work manager. Examples include:
- CICS transactions require CICS support
- DDF transactions require Db2 support
- TSO uses the traditional transaction ending mechanism
- Likewise Batch job steps
Average response time goals say something like “the average response time for this service class period should be 0.5 seconds”.
A Percentile response time goal might be “90% of all transactions in this service class period should finish in 300 milliseconds”.
What Is Performance Index – Or PI?
Performance Index (PI) is a measure of goal attainment.
- A PI of 1.0 is where the goal is just met.
- Higher than 1.0 and it is missed, the further away from 1.0 the worse the miss.
- Lower than 1.0 and it is met, the lower the greater ease in meeting the goal.
The point about PI is that it is a metric for goal attainment that is neutral with regard to workload type.
PI is, of course, used to drive WLM’s algorithms. But I regard it as just the first metric. Others, such as WLM’s ability to help a service class period, are important too.
How Do We Calculate PI For Percentile Goals?
The calculation for goal attainment for Average response time goals is straightforward: Sum up the response times for each transaction and divide by the number of transactions ending.
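As a minimal sketch of the Average case: the achieved average is the sum of the response times divided by the number of endings, and dividing that by the goal gives a PI that reads the same way as described above (1.0 means just met). The function name and the choice to return `None` when nothing ended are assumptions for illustration, not WLM's or my reporting code's actual interface.

```python
def average_goal_pi(response_times, goal_seconds):
    """PI for an Average response time goal: achieved average / goal.

    Hypothetical helper; returns None when no transactions ended,
    because there is no average to compare with the goal.
    """
    if not response_times:
        return None
    average = sum(response_times) / len(response_times)
    return average / goal_seconds


# Three transactions against a goal of 0.5 seconds average.
pi = average_goal_pi([0.4, 0.6, 0.8], 0.5)
print(round(pi, 2))  # average is 0.6s, so the goal is missed (PI above 1.0)
```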
The calculation for Percentile goals is more complex.
For any kind of transaction-based goal, at transaction ending WLM uses the transaction’s response time to assign it to one of 14 buckets. So WLM is counting transaction endings in these buckets.
The buckets have the following boundaries:
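| Bucket | % Of Goal | Bucket | % Of Goal |
|---|---|---|---|
| 1 | <= 50 | 8 | <= 120 |
| 2 | <= 60 | 9 | <= 130 |
| 3 | <= 70 | 10 | <= 140 |
| 4 | <= 80 | 11 | <= 150 |
| 5 | <= 90 | 12 | <= 200 |
| 6 | <= 100 | 13 | <= 400 |
| 7 | <= 110 | 14 | > 400 |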
The bolded values are of special significance, as we shall see.
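To make the bucketing concrete, here is a small sketch of assigning a transaction ending to one of the 14 buckets. It assumes upper limits of 50% to 150% of goal in steps of 10%, then 200% and 400%, with Bucket 14 catching everything beyond that. The names `BUCKET_LIMITS_PCT`, `bucket_for` and `count_endings` are hypothetical, and treating a response time exactly on a boundary as belonging to the lower bucket is also an assumption.

```python
from bisect import bisect_left

# Assumed upper limits of Buckets 1 to 13, as a percentage of the goal.
# Bucket 14 has no upper limit: it holds everything above 400% of goal.
BUCKET_LIMITS_PCT = [50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 200, 400]


def bucket_for(response_time, goal_time):
    """Return the 1-based bucket number for one transaction ending."""
    pct_of_goal = 100.0 * response_time / goal_time
    # bisect_left finds the first limit >= pct_of_goal, so a value that
    # sits exactly on a boundary is counted in the lower bucket (assumption).
    return bisect_left(BUCKET_LIMITS_PCT, pct_of_goal) + 1


def count_endings(response_times, goal_time):
    """Count transaction endings per bucket; index 0 is Bucket 1."""
    counts = [0] * 14
    for response_time in response_times:
        counts[bucket_for(response_time, goal_time) - 1] += 1
    return counts


# Goal time of 0.2 seconds: 0.05s is 25% of goal, 1.0s is 500% of goal.
print(bucket_for(0.05, 0.2))  # 1
print(bucket_for(1.0, 0.2))   # 14
```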
Suppose we have a goal of “85% to complete within 0.2 seconds”. WLM knows how many transactions completed in each bucket and how many overall.
Suppose 1000 transactions completed. 85% of 1000 is 850 transactions.
Starting with Bucket 1, WLM tallies up the transaction endings until it meets 850 transactions. The upper limit of the bucket in which that happens is what determines the PI.
Suppose Buckets 1 to 3 tally up to 800 transactions and Bucket 4 contains 100 transactions. So Buckets 1 to 3 don’t meet 850 but Buckets 1 to 4 do.
Bucket 4’s upper limit is 80% of goal. So the PI is 80%/100 or 0.8.
Suppose instead it took Buckets 1 to 7 to reach or exceed 850. Bucket 7’s upper limit is 110% of goal, so the PI would be 110%/100 or 1.1.
The code I inherited didn’t do this calculation. But now it does.
Actually the calculation is not quite that simple: If, by the time we’ve tallied up Buckets 1 to 13, we still haven’t reached that 850 number, we set the PI to 4.0 (which makes sense, as 400% of goal is Bucket 13’s upper limit).
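The tally described above can be sketched as follows. This is an illustration under stated assumptions rather than WLM's own code or my reporting code: the function name is hypothetical, and the bucket limits for Buckets 9 to 12 (130%, 140%, 150%, 200%) are assumed.

```python
# Assumed upper limits of Buckets 1 to 13, as a percentage of the goal.
BUCKET_LIMITS_PCT = [50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 200, 400]


def percentile_goal_pi(bucket_counts, goal_percent):
    """PI for a percentile goal from 14 bucket counts (index 0 is Bucket 1)."""
    total = sum(bucket_counts)
    target = total * goal_percent / 100.0  # e.g. 85% of 1000 is 850
    running = 0
    # zip stops after Bucket 13, so Bucket 14 never supplies a limit.
    for count, limit in zip(bucket_counts, BUCKET_LIMITS_PCT):
        running += count
        if running >= target:
            return limit / 100.0
    return 4.0  # target not reached within Buckets 1 to 13


# Buckets 1 to 3 hold 800 endings, Bucket 4 holds 100, the rest are later.
met = [500, 200, 100, 100, 50, 50, 0, 0, 0, 0, 0, 0, 0, 0]
print(percentile_goal_pi(met, 85))     # 0.8

# Buckets 1 to 6 hold 800 endings, Bucket 7 holds 100.
missed = [300, 100, 100, 100, 100, 100, 100, 100, 0, 0, 0, 0, 0, 0]
print(percentile_goal_pi(missed, 85))  # 1.1

# 200 endings took more than 400% of goal, so 850 is never reached.
badly_missed = [800, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 200]
print(percentile_goal_pi(badly_missed, 85))  # 4.0
```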
From the above description of how PI is calculated for percentile goals, we can observe a few things:
- PI can never be greater than 4.0, no matter how widely the goal is missed.
- PI can never be less than 0.5, as Bucket 1’s maximum is 50% of goal.
- A PI of 0.5 is special in that it means enough transactions ended in Bucket 1, and some could be much shorter than 50% of goal. To get further definition we’d have to calculate the average response time. Now that I calculate PI correctly I’m seeing 0.5 quite a bit.
- If there are no transaction endings, the percentile goal is automatically reached in Bucket 1, as all zero transactions ended there. So PI is meaningless with no transaction endings. I force the PI to 0 in this case, but attempt to flag why in my reporting.
- Because we’re using buckets there are only a finite number of values PI can take in a single interval. Averaging over multiple RMF intervals will, of course, yield more.
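A couple of these observations can be shown in a short sketch. It assumes the same bucket limits as before (with 130%, 140%, 150% and 200% assumed for Buckets 9 to 12), and the function name and note strings are hypothetical. It forces the PI to 0 with an explanatory note when nothing ended, and lists the only values a single-interval PI can take.

```python
# Assumed upper limits of Buckets 1 to 13, as a percentage of the goal.
BUCKET_LIMITS_PCT = [50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 200, 400]


def reported_pi(bucket_counts, goal_percent):
    """Return (pi, note) for one interval; note explains a special value."""
    total = sum(bucket_counts)
    if total == 0:
        # Without this check the tally would "succeed" in Bucket 1
        # and report a misleading PI of 0.5.
        return 0.0, "no transaction endings"
    target = total * goal_percent / 100.0
    running = 0
    for count, limit in zip(bucket_counts, BUCKET_LIMITS_PCT):
        running += count
        if running >= target:
            return limit / 100.0, ""
    return 4.0, "goal not reached by Bucket 13"


# The only PI values a single interval can produce: 0.5 up to 4.0.
POSSIBLE_PI_VALUES = sorted({limit / 100.0 for limit in BUCKET_LIMITS_PCT})
print(POSSIBLE_PI_VALUES)

print(reported_pi([0] * 14, 90))  # (0.0, 'no transaction endings')
```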
Revisiting That Old Blog Post
This seems as good a place as any to follow up on WLM Response Time Distribution Reporting With RMF.
I made some refinements to the graph I showed there:
- In the post I alluded to “near misses” versus “missed by miles” – and the counterpart on the “hits” side. I did indeed define “near” as +20% (and –20%) so Buckets 7–8 (and Buckets 5–6).
- I added a green datum line for the % value in the goal.
- I also added transaction rate. My code attempts to scale so that transaction rate doesn’t look ridiculous on a 0 to 100 scale.
- I considered changing to a red-through-to-green spectrum but I think that’s less consumable. Besides, I like red/blue better.
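The “near” versus “far” grouping above can be sketched as a mapping from bucket number to band. Buckets 5–6 and 7–8 as the “near” bands come from the definition above; the labels “Well Inside” and “Just Inside” are assumed counterparts to “Just Outside” and “Well Outside”, and the function names are hypothetical.

```python
BANDS = ("Well Inside", "Just Inside", "Just Outside", "Well Outside")


def band_for_bucket(bucket):
    """Map a 1-based bucket number to one of the four graph bands."""
    if not 1 <= bucket <= 14:
        raise ValueError("bucket must be between 1 and 14")
    if bucket <= 4:
        return "Well Inside"   # up to 80% of goal
    if bucket <= 6:
        return "Just Inside"   # within 20% below goal
    if bucket <= 8:
        return "Just Outside"  # within 20% above goal
    return "Well Outside"      # more than 120% of goal


def band_percentages(bucket_counts):
    """Percentage of transaction endings in each band (index 0 is Bucket 1)."""
    shares = {band: 0.0 for band in BANDS}
    total = sum(bucket_counts)
    if total == 0:
        return shares  # nothing ended, so every band is empty
    for bucket, count in enumerate(bucket_counts, start=1):
        shares[band_for_bucket(bucket)] += 100.0 * count / total
    return shares


counts = [400, 0, 0, 0, 300, 0, 200, 0, 0, 0, 0, 0, 0, 100]
print(band_percentages(counts))
```

Stacking these four percentages per interval gives a 0 to 100 scale against which the goal’s percentage can be drawn as a datum line.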
Here is a modern case of a CICS transaction service class.
Here are some observations:
- The datum is 95% because the goal is “95% in 1 second, Importance 1”.
- A goal with “1 second” in it suggests pretty heavy CICS transactions. I’m not surprised there aren’t that many transaction endings.
- When the transaction rate is significant the red pokes down below the green datum. Not just the pale red (“Just Outside”) but the darker red (“Well Outside”). You could say the blue shrinks away from it, if you prefer.
- Not shown here but the PI is typically around 1.3.
By the way, WLM doesn’t have complete control over the response time achieved for a transaction. And that’s particularly relevant here.
This transaction goal service class is served by two region goal service classes. Both of these show almost no “Delay For X” samples. What they do have is lots of “Using I/O” and “Using CPU” samples.
So, to improve transaction response time it’s probably necessary to try:
- Cutting the transaction CPU path length.
- Reducing the I/O time by (and this is only an example) buffering the data better.
Neither of these is something WLM can do.