System z10 CPU Instrumentation

(Originally posted 2008-09-18.)

Since I got back off vacation in L’Hérault in late August I’ve been working on adding z10 support to our CPU analysis code. It’s quite a substantial set of changes – and I don’t think I’m finished yet. But I’d like to share with you what I’ve learned so far.

But first let’s briefly review what’s changed with z10. (This is a very brief review and not a tutorial on the subjects mentioned.)

  • z10 introduces a bunch of changes in the area of how upgrades – whether temporary or permanent and whether wanted or forced by circumstances – work. So we now have the notion of Permanent and Temporary capacity models (and indeed capacity values).
  • HiperDispatch is a very significant set of changes in the way PR/SM and the z/OS Dispatcher work – especially since they work together.

So far I’ve only had data from one customer who is using Hiperdispatch for real. But already I’m seeing “behaviours”.

I would assume, by the way, that MXG already has support for the new fields and has adjusted any calculations that needed adjusting. While I follow the MXG-L Listserver I don’t take more than a passing interest in MXG itself. And, also by the way, I’m talking exclusively about Type 70 Subtype 1 in this post.

Capacity Models

We now have four different models in Type 70:

  • SMF70MDL – the original (software) model. (Prior to z990 this was the same as the hardware model.)
  • SMF70HWM – hardware model (introduced with z990 because of the book structure)
  • SMF70MPC – permanent capacity model (new with z10)
  • SMF70MTC – temporary capacity model (new with z10)

There are also three capacity ratings:

  • SMF70MCR – corresponding to SMF70MDL
  • SMF70MPR – corresponding to SMF70MPC
  • SMF70MTR – corresponding to SMF70MTC

These are all interesting in an environment where your machine configuration changes – whether through “On-Off Capacity On Demand”, “Capacity Backup”, “Capacity For Planned Events” or whatever. You can now do your usual performance and capacity work even when the configuration changes.

At this point I’m just listing the numbers in my reporting. I suspect I’ll do more when I get performance data from customers who actually do e.g. time-of-the-month upgrades/downgrades (and I know one or two who already do).

Hiperdispatch

When looking at Hiperdispatch you have to understand there are two major parts to it:

  • Dispatcher Affinity (DA) – from z/OS
  • Vertical CPU Management (VCM) – from PR/SM

Internally I still sometimes hear it described using the terms DA and VCM. The point is it’s got two parts to it. So there is information in sections of the record related to z/OS and other information in sections related to PR/SM. You have to put the two together.

And here’s the most important bit…

You need to collect Type 70s from ALL z/OS images of any significance on the machine to get the full picture.

A good example of this is understanding how many logical engines are really in play when some of them are parked (in most LPARs).

z/OS – Related Information

SMF70HHF has flags for whether Hiperdispatch is supported or is active. These are, fairly obviously, for the reporting z/OS image.

SMF70PPT is the amount of time this engine was “parked” in the interval. (That is, time when work is deliberately not dispatched to it.) Parked engines are some or all of the “Low Polarization” engines. More on that a little later. But parked engines are important because the new calculation for CPU Busy counts parked engines as not part of the z/OS image’s capacity.
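To make the effect of parked time concrete, here is a minimal sketch of the idea, not the actual RMF formula. It assumes you already have, for one logical engine in one interval, its busy time, its online time and its parked time (the latter from SMF70PPT), all in seconds. The function name and parameter names are my own invention for illustration.

```python
def cpu_busy_percent(busy_seconds, online_seconds, parked_seconds=0.0):
    """Percent busy for one logical engine over one interval.

    Parked time is removed from the capacity (the denominator), on the
    assumption that a parked engine is not part of the image's capacity.
    All three inputs are hypothetical, already-extracted values in seconds.
    """
    capacity_seconds = online_seconds - parked_seconds
    if capacity_seconds <= 0:
        # Engine was parked (or offline) for the whole interval.
        return None
    return 100.0 * busy_seconds / capacity_seconds


# A 900-second interval: the engine was parked for half of it
# and busy for 225 seconds.
without_parking = cpu_busy_percent(225, 900)       # parked time ignored
with_parking = cpu_busy_percent(225, 900, 450)     # parked time excluded
print(without_parking, with_parking)
```

With these made-up numbers the same 225 busy seconds looks like 25% if you ignore parking but 50% once the parked half of the interval is taken out of the capacity.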

PR/SM – Related Information

SMF70POW is used to calculate the Polarization Weight for a logical engine. Logical engines are classified as High, Medium or Low. An LPAR’s weights are spread across its logical engines to ensure the High engines each have a weight corresponding to one physical engine. Each Low engine has a zero weight. Any weight left over from assigning the High weights is assigned to either 1 or 2 Medium engines. (1 if the remainder is more than half an engine, 2 if the remainder would have been less than half an engine.)
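The rule above can be sketched in a few lines. This is only my reading of it, under some loudly stated assumptions: the LPAR’s share is expressed in units of physical engines; when the remainder is small, one High engine is given up so that two Mediums share “1 plus the remainder” equally; a remainder of exactly half is treated like the “less than half” case; and an LPAR whose share is less than one engine gets a single Medium. None of those details is confirmed by PR/SM documentation here, and the function name is hypothetical.

```python
import math


def polarize(share_in_engines):
    """Return (high_count, medium_weights) for an LPAR.

    share_in_engines is the LPAR's weight expressed as a number of
    physical engines (an assumed, pre-computed input). Low engines
    are not returned because each has a zero weight.
    """
    whole = math.floor(share_in_engines)
    remainder = share_in_engines - whole

    if remainder == 0:
        # Assumption: nothing left over, so no Medium engines at all.
        return whole, []
    if remainder > 0.5 or whole == 0:
        # One Medium carries the leftover weight.
        return whole, [remainder]
    # Assumption: give up one High and split its weight plus the
    # leftover equally across two Mediums.
    each = (1 + remainder) / 2
    return whole - 1, [each, each]


print(polarize(3.75))  # 3 Highs, 1 Medium
print(polarize(3.25))  # 2 Highs, 2 Mediums
```

So a share of 3.75 engines would give three Highs and one Medium of 0.75, while 3.25 would give two Highs and two Mediums of 0.625 each.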

You can observe this Polarization Weight distribution using SMF70POW…

The highest value of SMF70POW for an LPAR is for a High logical engine, that is, one whose weight corresponds to 1 whole physical engine. Any values of SMF70POW smaller than that but greater than zero are for Medium logical engines. I’ve seen cases of both 1 Medium and of 2 Mediums for different LPARs on the same machine.
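Here is a small sketch of that classification, assuming you have already pulled the SMF70POW values for one LPAR’s logical engines into a list. The function name is hypothetical and the numbers are illustrative, not real SMF70POW values. Note it inherits the heuristic’s weakness: an LPAR with no High engines at all would have its Mediums labelled High.

```python
def classify_engines(pow_values):
    """Label each logical engine High, Medium or Low from its weight.

    Follows the heuristic in the text: zero is Low, the highest value
    seen for the LPAR is High, anything in between is Medium.
    """
    top = max(pow_values)
    labels = []
    for value in pow_values:
        if value == 0:
            labels.append("Low")
        elif value == top:
            labels.append("High")
        else:
            labels.append("Medium")
    return labels


# Illustrative weights for a six-way LPAR: 2 High, 2 Medium, 2 Low.
print(classify_engines([100, 100, 60, 60, 0, 0]))
```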

Bringing It All Together

So, to understand Hiperdispatch you need both LPAR and z/OS image information.

Actually, since IRD was introduced, you’ve had to marry up both perspectives. Because Online Time (in the case of Logical CP Management) became a part of the calculation. And now Parked Time is.

(After a number of years of owning our CPU Analysis code I’ve recast it – for Hiperdispatch – in a way that makes it much easier to morph our CPU calculations in case anything else happens. I’m not foretelling anything – just knowing that CPU Utilisation is one of those things whose definition will never settle for long.) 🙂

Published by Martin Packer
