A Note On Velocity

(Originally posted 2015-12-07.)

Not to be confused with Notational Velocity.

A recent customer situation reminded me of how our code calculates velocity. It’s worth sharing with you.

The standard way of calculating velocity is to compute

(Using Samples)/(Using Samples + Delay Samples)

and convert to a percentage by multiplying by 100.[1]
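As a minimal sketch of that formula in Python: the function name `velocity` and the choice to return 0.0 when there are no Using or Delay samples are assumptions for illustration, not a description of how our code or WLM handles that case.

```python
def velocity(using_samples: int, delay_samples: int) -> float:
    """Return velocity as a percentage: Using / (Using + Delay) * 100."""
    total = using_samples + delay_samples
    if total == 0:
        # No Using or Delay samples in the interval, so velocity is undefined.
        # Returning 0.0 is an assumption made purely for this sketch.
        return 0.0
    return 100.0 * using_samples / total


# Illustrative (made-up) counts: 30 Using samples and 10 Delay samples.
print(velocity(30, 10))  # 75.0
```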

The numbers are all recorded in SMF Type 72 Subtype 3.

We have two main graphs associated with Velocity for a single service class period:

• How the velocity attained varies with the amount of CPU in the service class period.
• What the Delay Samples and Using Samples are, by time of day, for the service class period.

You would expect the two graphs to agree – with the Using Samples as a proportion of the whole similar to the velocity data points. Indeed I hadn’t questioned that until this situation.

The surprise was that the Using Samples suggested a far higher velocity than the one we computed. In detail, the Using Samples were dominated by Using I/O.[2]

The surprise was only momentary because our reporting also tells us that in this sysplex I/O Priority Management is disabled. This is unusual in my experience and one implication is that neither Using I/O nor Delay For I/O samples are included in the velocity calculation.

So why did my velocity calculation work? It’s because we use two key fields in the SMF 72–3. They are the headline Using (R723CTOU) and Delay (R723CTOT) Sample counts – which reflect how WLM itself calculates velocity. We don’t use the individual Delay and Using sample counts, e.g. Delay For CPU (R723CCDE) or Using zIIP (R723SUPU), in the velocity calculation.

A few things flow from this:

• We could produce “With I/O Samples” and “Without I/O Samples” velocity calculations and use them to guide customers in adjusting their goals.
• We could tally up Using and Delay samples and compare to the headline counts. This way we can see how complex things like zIIP samples play.

But those ideas are for another day or, more likely, another year (it being December now).

For now, let’s look at a worked (real) example. This sums over 1 hour for the “DB2STC”[3] service class on 1 system.

The headline sample counts in that hour are:

Category    Samples
Using          1101
Delay          1349
Idle         235912
Unknown       28571

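If you calculate the velocity it’s about 45%. Also Using + Delay is a small fraction of all samples (under 1% on these headline counts, or about 6% once the I/O samples shown further down are counted), which is fairly typical for this kind of work, the vast majority being Idle.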

Breaking down Using and Delay samples, using the explicit fields in 72–3:

Category             Samples
Using CPU                928
Using zIIP               173
Delay CPU               1200
Delay zIIP               144
Delay For Swap In          5

The above doesn’t include Using I/O and Delay for I/O, but the samples included do add up to the headline numbers. I’ve also excluded any zero-value counts, including “Using zIIP on CP”.
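That reconciliation is easy to check mechanically. Here is a small sketch using the counts from the two tables above; the dictionary layout and the helper name `reconcile` are hypothetical, chosen just for this illustration.

```python
# Headline counts from the first table (R723CTOU and R723CTOT).
headline = {"Using": 1101, "Delay": 1349}

# Non-zero component counts from the breakdown table, without the I/O samples.
using_parts = {"Using CPU": 928, "Using zIIP": 173}
delay_parts = {"Delay CPU": 1200, "Delay zIIP": 144, "Delay For Swap In": 5}


def reconcile(headline_count: int, parts: dict) -> int:
    """Return headline minus the sum of the components (0 means they agree)."""
    return headline_count - sum(parts.values())


using_gap = reconcile(headline["Using"], using_parts)
delay_gap = reconcile(headline["Delay"], delay_parts)
print(using_gap, delay_gap)  # 0 0
```

A non-zero gap would point at a component that is missing from the tally.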

Now here are the I/O related sample counts:

Category         Samples
Using I/O          14715
Delay for I/O        289

If these samples are added in, the resulting velocity is about 91%. In fact the goal is Importance 1, Velocity 70% – so the goal would be easily met if I/O Priority Management were enabled.
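To make the arithmetic explicit, here is a short sketch comparing the “Without I/O Samples” and “With I/O Samples” velocities for this hour. The variable names are mine, and simply adding the I/O counts to the headline counts is an assumption about how a “with I/O” figure would be formed.

```python
def velocity(using: int, delay: int) -> float:
    """Velocity as a percentage: Using / (Using + Delay) * 100."""
    return 100.0 * using / (using + delay)


# Headline counts (I/O samples not included, as I/O Priority Management is off).
headline_using, headline_delay = 1101, 1349

# I/O related sample counts from the table above.
using_io, delay_io = 14715, 289

goal = 70.0  # The service class goal: Importance 1, Velocity 70%.

without_io = velocity(headline_using, headline_delay)
with_io = velocity(headline_using + using_io, headline_delay + delay_io)

print(f"Without I/O samples: {without_io:.1f}% (meets goal: {without_io >= goal})")
print(f"With I/O samples:    {with_io:.1f}% (meets goal: {with_io >= goal})")
```

This gives roughly 44.9% without the I/O samples and 90.6% with them.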

But that doesn’t necessarily mean better performance: up to a point, CPU queuing would be masked by the very strong Using I/O component. A revised goal of, say, Importance 1, Velocity 90% with I/O samples included might be better.

Food for thought.

1. Unknown Samples and Other Samples, while recorded by RMF, are not used in the calculation.

2. Delay For I/O Samples were minimal.

3. What’s in a name? It turns out this service class provably (from SMF 30, as we always do) contains the MSTR, DIST and DBM1 address spaces for the customer’s Production DB2 on this system.