Reporting For Duty?

I’m writing this on a flight to Munich, where I’m presenting Parallel Sysplex Resiliency at a customer conference. By the way, I wonder what happened to the word “resilience” and what the difference is between that and “resiliency”. But it’s a trip to a nice city and I expect to run into lots of friends there. And I’m looking forward to presenting.

In this post I want to discuss report classes. In particular the approach one might take to defining them.

Report Classes Are Cheap And Abundant

Unlike with service classes, you can have practically as many as you like. There is no discernible cost to having more. Except for one thing that is, I hope you’ll agree, an upside: Just as RMF will report service class period attainment, so too with report classes. So you get more SMF data written – but it is valuable data.

Most customers are collecting SMF 72-3 so there’s nothing to do to get report class data – except define some report classes. (The mechanics of doing so, whether using z/OSMF or ISPF panels, are beyond the scope of this post.)

One other thing on cheapness: SMF 72-3 is much cheaper to collect and store than SMF 30 address space data. And it can in many respects perform the same role. Which is a key advantage.

So, if they’re so good let’s think about defining some.

Coverage

One thing I like to see is all the work in a system having a report class defined. From an instrumentation point of view it’s a second coverage of the work, alongside service classes. All work has a service class but not all work has to have a report class. But ideally it should. Hence my use of the term “coverage”.

All CPU that can be fairly associated with a service class is. Of course, not all can. Hence the existence of “uncaptured time” from which one can compute a “capture ratio”. This applies to both general purpose CPU (GCP) and zIIP.
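To make the arithmetic concrete, here is a minimal sketch of a capture ratio calculation. The function name, the CPU numbers and the BATCHMED service class are all made up for illustration; in practice the inputs would come from your own SMF reporting.

```python
def capture_ratio(captured_cpu_seconds, total_cpu_seconds):
    """Fraction of total CPU time that is attributed to service classes."""
    if total_cpu_seconds <= 0:
        raise ValueError("total CPU time must be positive")
    return captured_cpu_seconds / total_cpu_seconds


# Hypothetical interval: CPU seconds per service class versus the LPAR total.
service_class_cpu = {"SYSTEM": 40.0, "STCHI": 250.0, "BATCHMED": 160.0}
lpar_cpu_seconds = 500.0

captured = sum(service_class_cpu.values())
ratio = capture_ratio(captured, lpar_cpu_seconds)
uncaptured = lpar_cpu_seconds - captured
print(f"capture ratio {ratio:.0%}, uncaptured {uncaptured:.1f}s")
```

You would do the same calculation separately for GCP and zIIP.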

A more interesting case, though, is memory. So let’s use it as our measure of coverage – at least for the purposes of this post.

We define memory usage by a report class or service class as SMF 72-3 field R723CPRS divided by the summarisation interval. (If you do this for a period longer than the interval you will need to sum the denominator and the numerator before dividing.) There is some adjustment required to turn the result into MB or GB.
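A rough sketch of that calculation follows. It is not our actual code: the function name is made up, and the unit handling is an assumption. I have treated the R723CPRS value as frames multiplied by time, with 4 KB frames, and left the time-unit scaling as a parameter you would set after checking the SMF 72-3 mapping.

```python
FRAME_BYTES = 4096  # assumption: 4 KB real storage frames


def average_memory_mb(records, residency_units_per_second=1.0):
    """Average memory in MB across one or more intervals.

    records: iterable of (r723cprs, interval_seconds) pairs.
    residency_units_per_second: assumed scale factor that converts the
    residency field's time unit into seconds (check the SMF mapping).
    """
    records = list(records)
    # Sum numerator and denominator first, then divide once.
    total_residency = sum(residency for residency, _ in records)
    total_seconds = sum(seconds for _, seconds in records)
    if total_seconds <= 0:
        raise ValueError("total interval length must be positive")
    frames = total_residency / residency_units_per_second / total_seconds
    return frames * FRAME_BYTES / (1024 * 1024)


# Two hypothetical intervals of unequal length for one report class.
intervals = [
    (900 * 256_000, 900),  # 15 minutes at an average of 256,000 frames
    (300 * 512_000, 300),  # 5 minutes at an average of 512,000 frames
]
avg_mb = average_memory_mb(intervals)
print(f"{avg_mb:.0f} MB")
```

Summing before dividing matters when intervals differ in length: simply averaging the two per-interval figures here (1000 MB and 2000 MB) would give 1500 MB, which overweights the short interval.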

Here are a couple of examples – from different customers.

I’ve graphed two things on the one graph:

  1. The total service class view of memory – as a line.
  2. The report class view of memory – as a stack.

To make the graph readable I only plot the top 15 report classes individually. The remainder I roll up. I’d be surprised if there were much in the “other” category.
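The roll-up itself is simple. Here is a minimal sketch, with a made-up function name and made-up report class names and numbers, and with the cut-off shrunk to 2 so the example stays small:

```python
def top_n_with_other(memory_by_class, n=15, other_label="Other"):
    """Keep the n largest report classes and roll the rest into one entry."""
    ranked = sorted(memory_by_class.items(), key=lambda kv: kv[1], reverse=True)
    result = dict(ranked[:n])
    remainder = ranked[n:]
    if remainder:
        result[other_label] = sum(value for _, value in remainder)
    return result


# Hypothetical memory usage in MB by report class.
usage_mb = {"RCICSPRD": 800.0, "RDB2A": 600.0, "RBATCH": 100.0, "RTSO": 50.0}
stack = top_n_with_other(usage_mb, n=2)
print(stack)
```

If the rolled-up entry turns out to be large, that is a hint the cut-off is too low for that customer.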

So let’s look at an example where there is good agreement between report class memory and service class. Here the service class line overlays the top of the stack.

And here’s an example where the report class coverage is very poor, relative to the service class view.
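If you wanted a single number rather than a graph, one way to express coverage is the report class total as a fraction of the service class total. This is only a sketch under that assumption; the function name, class names and numbers are hypothetical.

```python
def report_class_coverage(report_class_mb, service_class_mb):
    """Report class memory as a fraction of service class memory."""
    service_total = sum(service_class_mb.values())
    if service_total <= 0:
        raise ValueError("service class memory total must be positive")
    return sum(report_class_mb.values()) / service_total


# Hypothetical memory in MB for one system over the same period.
service_classes = {"STCHI": 600.0, "BATCH": 400.0}
report_classes = {"RCICS": 300.0, "RDB2": 200.0}

coverage = report_class_coverage(report_classes, service_classes)
print(f"report class coverage: {coverage:.0%}")
```

A value near 1 corresponds to the first graph, where the line sits on top of the stack. A value well below 1 corresponds to the second.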

By the way, I’ve recently come across a customer with no report classes.

Granularity

Suppose you have good coverage by report classes. That can be achieved without yielding much benefit.

If you have very few report classes but between them they sum up to the service class view, that doesn’t help much. Sometimes customers define report classes just to aggregate service classes. I consider this to be a missed opportunity: I would hope any reporting tool could do the aggregation for you.

I’d rather see report classes used to break down service classes. I think this was the original WLM intention and this is perhaps why the limit is so high.

You could use report classes to keep track of memory used by a bunch of cloned CICS regions, for example. For this to be useful they wouldn’t be all the regions in a specific service class. I suppose you could track individual regions this way, too.

And you might well use report class SMF 72-3 for just such a purpose: The above (R723CPRS) formula is much more accurate than what SMF 30 currently has.

Another example might be to tally all the CPU used by jobs in a particular job class. This is especially useful where multiple job classes share the same service class – as is almost universal.

Equally, you might break out individual address spaces from SYSTEM. Particularly those, such as XCF, that start too early to yield SMF 30 interval records.

One quite common case is aggregating address spaces for a Db2 subsystem. Here the IRLM address space ought to be in SYSSTC and the other “Db2 Engine” address spaces in a notional “STCHI” service class. You might well combine the two.

A Caution On Memory Reporting

There’s something else worth mentioning: The above are standard graphs we call “PM2205”, relying only on SMF 72-3. I didn’t show you one we call “PM2200”.

As I alluded to above, not all memory is captured for a report (or service) class. For example, common areas and memory for logically swapped address spaces. (The latter mostly affects TSO and Batch – and logically swapped address spaces consume memory but not service.)

So PM2200 has an additional job to do: ensuring all allocated memory is in the stack. PM2205 doesn’t attempt this, as it would get too busy if it also had, for example, CSA in it. By the way, you get the common area storage from a combination of SMF 71 (Paging Activity) and 78-2 (Virtual Storage Activity) data.

In PM2200 we also subtract the total memory usage – from whatever source – from SMF 71’s total memory usage. Unimaginatively we call it “other” and usually it’s quite small.
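As a sketch of that subtraction, with a made-up function name and made-up categories and numbers standing in for the real inputs:

```python
def other_memory_mb(smf71_total_mb, accounted_mb):
    """Memory SMF 71 reports as used that no other source accounts for."""
    return smf71_total_mb - sum(accounted_mb.values())


# Hypothetical figures in MB; the categories are illustrative only.
accounted = {
    "service classes": 52000.0,
    "common areas": 1800.0,
    "logically swapped": 700.0,
}
other = other_memory_mb(56000.0, accounted)
print(f"other: {other:.0f} MB")
```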

One other thing PM2200 does – as it uses SMF 71 – is relate all the above to the amount of memory online to the LPAR. (No RMF data shows anything above activated LPARs, though it does speak to the freshly important Virtual Flash Memory (VFM).)

Conclusion

I would like installations to think about their use of report classes – to make sure they are truly useful. Many of the things you can do with SMF 30 you can do more readily with the right set of report classes. I am keen on customers learning how to get the full value out of SMF 30 but often SMF 72-3 does the job just as well, if not better.

So I’d still be keen for customers to collect SMF 30 interval records – which most of them already do. You just don’t always have to process them to get what you want.

And, as this post majored on memory as an example of the value, I’d like us all to continue to evolve our reporting. PM2205 was certainly a recent evolution in our code.

And – the overall message of this post – do think carefully about your report class structure.

Published by Martin Packer

I'm a mainframe performance guy and have been for the past 35 years. But I play with lots of other technologies as well.
