Corroboration Not Correlation

(Originally posted 2016-08-14.)

This is a post where I have, yet again, to be careful to obfuscate the customer’s situation; I’ve no wish to embarrass them. So you’ll forgive me if there are no numbers. But there is a lesson worth sharing here. So I’m going for it…

It’s about DB2 and Workload Manager.[1]

I was recently asked to explain why an application’s DB2 Accounting Trace was showing so much Not Accounted For Time[2] (NAT). Willie Favero discussed this here, essentially pointing to this IBM Technote.

There are a few things I’d pull out from this document:

  1. It’s part of DB2 Class 2 time – so when DB2 is supposed to be in control.

  2. The main causes are CPU Queuing and Paging. But there are a lot of others.

  3. It talks about NAT usually being small but I’d have an open mind about that. My experience is it is often quite large.
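To make the first point concrete, here is a minimal sketch of how NAT is commonly derived from Accounting Trace timings: Class 2 elapsed time, less Class 2 CPU time, less the Class 3 suspensions DB2 does know about. The function name, parameter names and all the values are illustrative assumptions, not fields or numbers from any real record or customer.

```python
def not_accounted_time(class2_elapsed, class2_cpu, class3_suspend):
    """Return the Class 2 elapsed time left over once CPU time and
    the recorded Class 3 suspensions have been subtracted."""
    return class2_elapsed - class2_cpu - class3_suspend


# Hypothetical timings in seconds; invented purely for illustration.
class2_elapsed = 10.0
nat = not_accounted_time(class2_elapsed, class2_cpu=3.0, class3_suspend=4.5)
nat_fraction = nat / class2_elapsed

print(f"NAT = {nat:.1f}s ({nat_fraction:.0%} of Class 2 elapsed)")
```

Whatever is left over is time DB2 could not attribute to anything, which is why causes outside DB2’s view, such as CPU Queuing and Paging, end up there.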

Point 2 is worth exploring in this case:

The umpteen others are generally not the cause of NAT, so I tend to advise customers to concentrate on CPU Queuing and Paging as potential causes.

So, while discussing this with the customer, the following occurred to me:

Let’s look at this from a WLM point of view

Before we go too far with this, it’s important to understand where DB2 work gets classified in WLM terms.

While there is some work that gets classified as DB2 – the subsystem address spaces in their Service Classes – the vast majority of DB2 work runs with the Service Class (and Dispatching Priority) the original work was classified with. For example:

  • CICS transactions with the CICS goal for their region (or one derived from the Transaction).
  • DDF work classified via its own rules – into Enclaves in the DB2 DIST address space but still not with DIST’s Service Class / Dispatching Priority.[3]

So, the point of this post is to make the linkage between WLM Goal Attainment and DB2 NAT.

To keep this simple – and the actual customer case looks like this – let’s assume we’re talking about a CICS application with regions classified with Region goals, going against a DB2 subsystem.

Region goals are Velocity goals, which makes the following make sense…

Suppose the Velocity goal is Importance 2, Velocity 60%.[4]

Given velocity attainment is

    Velocity % = 100 × Using samples / (Using samples + Delay samples)

you could have quite a lot of Delay For CPU samples and still make the goal, so long as there were no other Delay samples, such as Delay For I/O.

And, you probably guessed this part, this level of Delay For CPU is going to appear as some level of NAT.
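A quick sketch of that arithmetic, using the standard definition of velocity as Using samples over Using plus Delay samples. The sample counts are made up and the function names are my own; the only figure taken from the example above is the 60% goal.

```python
def velocity(using_samples, delay_samples):
    """Execution velocity as a percentage of Using + Delay samples."""
    total = using_samples + delay_samples
    if total == 0:
        return 0.0  # assumption for this sketch: no samples reports as zero
    return 100.0 * using_samples / total


def max_delay_share(goal_pct):
    """Largest percentage of samples that can be Delay with the goal still met."""
    return 100.0 - goal_pct


# Hypothetical sample counts for one interval; all delay is Delay For CPU.
goal_pct = 60.0
using = 650
delay_for_cpu = 350

achieved = velocity(using, delay_for_cpu)
goal_met = achieved >= goal_pct
headroom = max_delay_share(goal_pct)

print(f"Velocity {achieved:.0f}% against a goal of {goal_pct:.0f}%: met = {goal_met}")
print(f"Up to {headroom:.0f}% of samples could be Delay For CPU with the goal still met")
```

In other words, a region could be queuing for CPU in more than a third of its samples and WLM would still consider the goal met.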

Corroboration Not Correlation

At this point I flatter myself to think you’ve been wondering where the title comes from. 🙂

So let’s get to it…

I don’t think you can take the WLM view (from RMF Workload Activity Report / Data) and use the numbers therein to derive Not Accounted For Time (NAT). So you won’t get Correlation.

But I think you will get Corroboration: A large amount of WLM Delay For CPU will probably happen at the same time as a large amount of NAT.

And that’s really all that’s needed.
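One way to look for that kind of corroboration is simply to ask whether the intervals with high Delay For CPU are also the intervals with high NAT, rather than trying to fit one number to the other. This is only a sketch: the per-interval percentages, the thresholds and the function names are all invented for illustration, and it assumes you have already lined up the RMF and Accounting Trace data on common intervals.

```python
def high_intervals(values, threshold):
    """Return the set of interval indexes whose value meets the threshold."""
    return {i for i, v in enumerate(values) if v >= threshold}


def corroboration(delay_cpu_pct, nat_pct, delay_threshold, nat_threshold):
    """Report which intervals are high for Delay For CPU, for NAT, and for both."""
    high_delay = high_intervals(delay_cpu_pct, delay_threshold)
    high_nat = high_intervals(nat_pct, nat_threshold)
    both = high_delay & high_nat
    share = len(both) / len(high_nat) if high_nat else None
    return {
        "high_delay": sorted(high_delay),
        "high_nat": sorted(high_nat),
        "both": sorted(both),
        "share_of_high_nat_corroborated": share,
    }


# Hypothetical per-interval percentages; not taken from any real system.
delay_cpu_pct = [5, 8, 30, 35, 10, 28]
nat_pct = [4, 6, 22, 40, 25, 19]

result = corroboration(delay_cpu_pct, nat_pct, delay_threshold=25, nat_threshold=20)
print(result)
```

The intervals where NAT is high but Delay For CPU is not are the ones worth a second look for another cause, such as Paging.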

To finish this off, let’s look at some wrinkles:

  • There are other Delay sample types, such as Delay For I/O, that aren’t related to NAT. (Paging, however, is related to it.)
  • It might be difficult to summarize DB2 Accounting Trace over any given WLM Service Class. Note: Except for DDF work, the SMF 101 record doesn’t contain the WLM Service Class.
  • Delay For CPU might hit other things, such as non-DB2 CICS transaction processing.
  • Likewise the non-DB2 portion of a DB2 / CICS transaction, where it would show up in Class 1 minus Class 2 time.

So, this was an interesting question to be dealing with but it’s not entirely “clean”. The upshot, however, is that if you see lots of Not Accounted For Time in DB2 Accounting Trace it’s worthwhile looking at the WLM (or even System) perspective.

And we’re definitely in the Corroboration not Correlation space, and certainly not Causation.

  1. Which is, of course, a perennial topic.

  2. Also Known As “Unaccounted For Time” or, in one of our reports, “Other Wait”. I think I’ve discussed some of this before.

  3. You’ll notice I’ve used Dispatching Priority (DP) twice now. That’s deliberate, as z/OS still uses DP to manage access to CPU; it’s just that the externals are through WLM in support of its goals, rather than IPS.

  4. Without getting into how you should set up WLM let me just say this is not unreasonable.

Published by Martin Packer

I'm a mainframe performance guy and have been for the past 35 years. But I play with lots of other technologies as well.
