Really Starting Something

This post is about gleaning start and stop information from SMF – which, to some extent, is not a conventional purpose.

But why do we care about when IPLs happen? Likewise middleware starts and stops? Or any other starts and stops?

I think, if you’ll pardon the pun, we should stop and think about this.

Reasons Why We Care

There are a number of reasons why we might care. Ones that come immediately to mind are:

  • Early Life behaviours
  • System Recovery Boost and Recovery Process Boost
  • PR/SM changes such as HiperDispatch effects
  • Architectural context

There will, of course, be reasons I haven’t thought of. But these are enough for now.

So let’s examine each of these a little.

Early Life Behaviours

Take the example of a Db2 subsystem starting up.

At the very least its buffer pools are unpopulated and there are no threads to reuse. Over time the buffer pools will populate and settle down. Likewise the thread population will mature. When I’ve plotted storage usage by a “Db2 Engine” service class I’ve observed it growing, with the growth tapering off and the overall usage settling down. This usually takes days, and sometimes weeks.

(Parenthetically, how do you tell the difference between a memory leak and an address space’s maturation? It helps to know if the address space should be mature.)

Suppose we didn’t know we were in the “settling down” phase of a Db2 subsystem’s life. Examining the performance data, such as the buffer pool effectiveness, we might draw the wrong conclusions.

Conversely, take the example of a z/OS system that has been up for months. There is a thing called “a therapeutic IPL”. Though z/OS is very good at staying up and performing well for a very long time, an occasional IPL might be helpful.

I’d like to know if an IPL was “fresh” or if the z/OS LPAR had been up for months. This is probably less critical than the “early life of a Db2” case, though.

System Recovery Boost and Recovery Process Boost

With System Recovery Boost and Recovery Process Boost resource availability and consumption can change dramatically – at least for a short period of time.

In SRB And SMF I talked about early experience and sources of data for SRB. As I said I probably would, I’ve learnt a little more since then.

One thing I’ve observed is that if another z/OS system in the sysplex IPLs it can cause the other systems in the sysplex to experience a boost. I’ve seen time correlation of this effect. I can “hand wave” it as something like a recovery process when a z/OS system leaves a sysplex. Or perhaps when a Db2 data sharing member disconnects from its structures.

Quite apart from catering for boosts, detecting and explaining them seems to me to be important. If you can detect systems IPLing that helps with the explanation.

PR/SM Changes

Suppose an LPAR is deactivated. It might only be a test LPAR. In fact that’s one of the most likely cases. It can affect the way PR/SM behaves with HiperDispatch. Actually that was true before HiperDispatch. But let me take an example:

  • The pool has 10 CPs.
  • LPAR A has weight 100 – 1 CP’s worth.
  • LPAR B has weight 200 – 2 CP’s worth.
  • LPAR C has weight 700 – 7 CP’s worth.

All 3 LPARs are activated and each CP’s worth of weight is 100 (1000 ÷ 10).

Now suppose LPAR B is deactivated. The total pool’s weight is now 800. Each CP’s weight is now 80 (800 ÷ 10). So LPAR A’s weight is 1.25 CP’s worth and LPAR C’s is 8.75 CP’s worth.
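
If it helps, here’s the same arithmetic as a small Python sketch – purely illustrative, using the weights above:

# Minimal sketch of the PR/SM weight arithmetic above (illustrative only).
def cps_worth(lpar_weights, pool_cps):
    """Return each LPAR's share of the pool, expressed in CPs' worth."""
    total_weight = sum(lpar_weights.values())
    weight_per_cp = total_weight / pool_cps
    return {name: weight / weight_per_cp for name, weight in lpar_weights.items()}

# All three LPARs activated: each CP's worth of weight is 100.
print(cps_worth({"A": 100, "B": 200, "C": 700}, pool_cps=10))   # {'A': 1.0, 'B': 2.0, 'C': 7.0}

# LPAR B deactivated: each CP's worth of weight drops to 80.
print(cps_worth({"A": 100, "C": 700}, pool_cps=10))             # {'A': 1.25, 'C': 8.75}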

Clearly HiperDispatch will assign Vertical High (VH), Vertical Medium (VM), and Vertical Low (VL) logical processors differently. In fact probably to the benefit of LPARs A and C – as maybe some VLs become VMs and maybe some VMs become VHs.

The point is PR/SM behaviour will change. So activation and deactivation of LPARs is worth detecting – if you want to understand CPU and PR/SM behaviour.

(Memory, on the other hand, doesn’t behave this way: Deactivate an LPAR and the memory isn’t reassigned to the remaining LPARs.)

Architectural Context

For a long time now – if a customer sends us SMF 30 records – we can see when CICS or IMS regions start and stop.

Architecturally (or maybe operationally) it matters whether a CICS region stops nightly, weekly, or only at IPL time. Most customers have a preference (many a strong preference) for not bringing CICS regions down each night. However, quite a few still have to. For some it’s allowing the Batch to run, for a few it’s so the CICS regions can pick up new versions of files.

Less importantly, but equally architecturally interesting, is the idea that middleware components that start and stop together are probably related – whether they are clones, part of the same technical mesh, or business-wise similar.

How To Detect Starts And Stops

In the above examples, some cases are LPAR (or z/OS system) level. Others are at the address space or subsystem level.

So let’s see how we can detect these starts and stops at each level.

System-Level

At the system level the best source of information is RMF SMF Type 70 Subtype 1.

For some time now 70-1 has given the IPL date and time for the record-cutting system (field SMF70_IPL_TIME, which is in UTC). As I pointed out in SRB And SMF, you can see if this IPL (and the preceding shutdown) was boosted by SRB.

LPAR Activation and Deactivation can also, usually, be detected in 70-1. 70-1’s Logical Processor Data Section tells you, among other things, how many logical processors this LPAR has. If it transitions from zero to more than zero it’s been activated. Similarly, if it transitions from more than zero to zero it’s been deactivated. The word “usually” relates to the fact that the LPAR could be deactivated and then re-activated within an RMF interval. If that happens my code, at least, won’t notice the bouncing. This isn’t, of course, the same as an IPL – where the LPAR would remain activated throughout.
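
Here’s a minimal sketch of that transition detection in Python. It assumes you’ve already summarised the Logical Processor Data Sections into (interval start, logical processor count) pairs per LPAR – the record parsing and field names aren’t shown, and the sample data is invented:

# Detect LPAR activation / deactivation from per-interval logical CP counts.
def activation_events(intervals):
    """Yield (interval_start, event) whenever the logical CP count crosses zero."""
    previous = None
    for interval_start, logical_cps in intervals:
        if previous is not None:
            if previous == 0 and logical_cps > 0:
                yield interval_start, "activated"
            elif previous > 0 and logical_cps == 0:
                yield interval_start, "deactivated"
        previous = logical_cps

# Invented sample. A deactivate-and-reactivate entirely within one interval
# would, as noted above, not be seen at all.
sample = [("09:00", 4), ("09:15", 4), ("09:30", 0), ("09:45", 0), ("10:00", 4)]
for when, event in activation_events(sample):
    print(when, event)   # 09:30 deactivated, then 10:00 activated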

The above reinforces my view that you really want RMF SMF from all the z/OS systems in your estate, even the tiny ones – as that way you’ll see the SMF70_IPL_TIME values for them all.

Subsystem-Level

When I say “Subsystem Level” I’m really talking about address spaces. For that I would look to SMF 30.

But before I deal with subsystems I should note an alternative way of detecting IPLs: Reader Start Time in SMF 30 for the Master Scheduler Address Space is within seconds of an IPL. Close enough, I think. This is actually the method I used in code written before the 70-1 field became available.

For an address space, generally you can use its Reader Start Time for it coming up. (Being ready for work, though, could be a little later. This is also true, of course, for IPLs. And SMF won’t tell you when that is. Likewise for shutting down.) You could also use the Step- and Job-End timestamps in SMF 30 Subtypes 4 and 5 for when the address space comes down. In practice I use Interval records and ask of the data “is the address space still up?” until I get the final interval record for the address space instance.

When it comes to reporting on address space up and down times I group them by ones with the same start and stop times. That way I see the correlated or cloned address spaces. This is true for both similar address spaces (eg CICS regions) and dissimilar (such as adding Db2 subsystems into the mix).
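
As an illustration of that grouping idea – not my actual code, and with invented address space names and times – here’s a Python sketch:

# Group address space instances that start and stop at the same times.
# Assumes (name, start, stop) tuples already derived from SMF 30: Reader
# Start Time for "up", the final interval record seen for "down".
from collections import defaultdict

def group_by_lifetime(instances):
    groups = defaultdict(list)
    for name, start, stop in instances:
        groups[(start, stop)].append(name)
    return groups

instances = [
    ("CICSA01", "2021-11-01 05:00", "2021-11-02 05:00"),
    ("CICSA02", "2021-11-01 05:00", "2021-11-02 05:00"),
    ("DB2ADBM1", "2021-11-01 05:00", "2021-11-02 05:00"),
    ("BATCH001", "2021-11-01 22:00", "2021-11-01 23:30"),
]
for (start, stop), names in group_by_lifetime(instances).items():
    print(start, "to", stop, ":", ", ".join(names))
# The first three come out together - correlated or cloned address spaces.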

Conclusion

As I hope I’ve shown you, there are lots of architectural and performance reasons why beginnings and endings are important to detect. I would say it’s not just about observation; It could be a basis for improvement.

As I also hope I’ve demonstrated, SMF documents such starts and stops very nicely – if you interrogate the data right. And a lot of my coding effort recently has been in spotting such changes and reporting them. If I work with you(r data) expect me to be discussing this. For all the above reasons.

SRB And SMF

I’ve just had my first brush with SMF from z15’s System Recovery Boost (SRB).

(Don’t blame me for the reuse of “SRB”.) 🙂

The point of this post is to share what I’ve discerned when processing this SMF data.

System Recovery Boost

To keep the explanation of what it is short, I’ll say there are two components of this:

  • Speed Boost – which enables general-purpose processors on sub-capacity machine models to run at full-capacity speed on LPARs being boosted
  • zIIP Boost – which enables general-purpose workloads to run on zIIP processors that are available to the boosted LPARs

And there are several different triggers for the boost period where these are available. They include:

  • Shutdown
  • IPL
  • Recovery Processes:
    • Sysplex partitioning
    • CF structure recovery
    • CF datasharing member recovery
    • HyperSwap

The above are termed Boost Classes.

If you want more details a good place to start is IBM System Recovery Boost Frequently Asked Questions.

I’ve bolded the terms this post is going to use.

The above-mentioned boosts stand to speed up the boosted events – so they are good for both performance and availability.

RMF SMF Instrumentation For SRB

If you wanted to detect when an SRB boost had happened and the nature of it you would turn to SMF 70 Subtype 1 (CPU Activity).

There are two fields of interest here:

  • In the Product Section field SMF70FLA has extra bits giving you information about this system’s boosts.
  • In the Partition Data Section field SMF70_BoostInfo has bits giving you a little information about other LPARs on the machine’s boosts.

It should also be noted that when a boost period starts the current RMF interval stops and a new one is started. Likewise when it ends that interval stops and a new one is started. So you will get “short interval” SMF records around the boost period. (In this set of data there was another short interval before RMF resynchronised to 15-minute intervals.) Short intervals shouldn’t affect calculations – so long as you are taking into account the measured interval length.
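
To illustrate the “take the measured interval length into account” point, here’s a tiny Python sketch with invented numbers – a utilisation expressed as a rate over the measured interval is comparable whether the interval is 15 minutes or 2:

# Normalise by the measured interval length so short boost intervals
# don't distort the picture. All numbers here are invented.
intervals = [
    {"seconds": 900, "cpu_seconds": 450.0},   # a normal 15-minute interval
    {"seconds": 120, "cpu_seconds":  80.0},   # a short interval around a boost
]
for interval in intervals:
    busy = interval["cpu_seconds"] / interval["seconds"]
    print(f"{interval['seconds']:>4}s interval: {busy:.0%} busy")
# Each figure is a rate over its own interval, so the 120s one isn't understated.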

After a false start – in which I decoded the right byte in the wrong section 🙂 – I gleaned the following information from SMF70FLA:

  • Both Speed Boost and zIIP Boost occurred.
  • The Boost Class was “Recovery Process” (Class 011 binary).
    • There is no further information in the SMF 70-1 record as to which recovery process happened.

From SMF70_BoostInfo I gleaned the following information:

  • Both Speed Boost and zIIP Boost occurred – for this LPAR.
  • No other LPAR on the machine received a boost. Not just in this record but in any of the situation’s SMF data.

The boost period was about 2 minutes long – judging by the interval length.

Further Investigations

I felt further investigation was necessary as the type of recovery process wasn’t yielded by SMF 70-1.

I checked SMF 30 Interval records for the timeframe. I drew a blank here because:

  • No step started in the appropriate timeframe.
  • None of the steps running in this timeframe looked like a cause for a boost.

I’m not surprised, as the types of recovery process represented by this boost class wouldn’t really show up strongly in SMF 30.

One other piece of evidence came to light: Another LPAR in the Sysplex (but not on the same machine) was deactivated. It seems reasonable to me that one or other of the Recovery Boost activities would take place in that event.

Conclusion

While I think z15 SRB is a very nice set of performance enhancements, you’re going to need to cater for it in your SMF processing for a number of reasons:

  • It’s going to affect intervals and their durations.
  • It’s going to cause things like speed changes on sub-capacity GCPs and changes in zIIP behaviour.
  • A boosted LPAR might compete strongly with other (possibly Production) LPARs.
  • It’s going to happen, in all probability, every time you IPL an LPAR.

That last says it’s not “exotica”. SRB is the default behaviour – at least for IPL and Shutdown boost classes.

As I’ve indicated, SMF 70-1 tells most of the story but it’s not all of it.

There is one piece of advice I can give on that: Particularly for Recovery Process Boost, check the system log for messages. There are some you’re going to have to automate for anyway.

One final point: A while back I enhanced my “zIIP Performance And Capacity” presentation to cover SRB. I’ll probably refine the SRB piece as I gain more experience. Actually, even this blog post could turn into a slide or two.

Clippy? Not THAT Clippy!

A recent episode of the Mac Power Users Podcast was a round-up of clipboard managers – on iOS and Mac OS. You can find it here.

There was a subsequent newsgroup discussion here which I’ve been participating in.

There are lots of things I want from a clipboard manager. They include:

  1. Keeping a history.
  2. Enabling the text (and, I suppose, images) to be readily processed.
  3. Syncing between devices.

Needs 1 and 3 are readily met by a lot of clipboard managers – as the podcast episode illustrated. Item 2, though, has always intrigued me.

This post deals with that topic – with an experiment using two clipboard managers.

I write in Markdown – a lot. It’s a nice plain text format with lots of flexibility and a simple syntax. So it’s natural my two examples are Markdown-centric.

They’re not identical cases – with the Paste / Shortcuts example being a simpler example.

In both cases, though, the key feature is the use of the clipboard history to “fill in the blanks”. How they approach this is interesting – and both take a fairly similar approach.

Paste and Shortcuts – A Simple Case

Paste is an independently-developed app that runs on both Mac OS and iPad OS / iOS, with the copied items being sync’ed between all your devices.

It maintains a clipboard history – which this experiment will use.

Shortcuts started out on iOS as Workflow. Apple bought the company and made it a key component of automation on iOS, iPad OS, and (with Monterey) Mac OS. In principle it’s simple to use.

So here’s a simple example. It takes the last two clipboard entries and makes a Markdown link out of them. The link consists of two elements:

  • In square brackets the text that appears.
  • In round brackets the URL for the link.

The following shortcut has 4 steps / actions:

  1. Retrieve the last clipboard item (Position 1).
  2. Retrieve the previous clipboard item (Position 2).
  3. Create text from these two – using a template that incorporates the square and round brackets.
  4. Copy this text to the clipboard.

You compose the shortcut by dragging and dropping the actions.

Here is the shortcut.

There’s a problem with the above: seeing which clipboard item ends up in which position in the template. On Mac OS clicking on the ambiguous item leads to a dialog:

Click on Reveal and the source of the variable is revealed – with a blue box round it:

Obviously, copying to the clipboard in the right order is important and the above shows how Shortcuts isn’t that helpful here. I suppose one could detect a URL and swap the two clipboard items round as necessary. But that’s perhaps a refinement too far.
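
If it helps to see the templating step outside of Shortcuts, here’s the same idea as a Python sketch. It assumes the URL was copied last (so it’s the most recent clipboard item) and the display text just before it – which is exactly the ordering question above:

# The template step from the shortcut, sketched in Python for clarity.
# Assumption: newest clipboard item is the URL, the one before is the text.
def markdown_link(newest_item, previous_item):
    """Square brackets for the text that appears, round brackets for the URL."""
    return f"[{previous_item}]({newest_item})"

print(markdown_link("https://example.com", "An example link"))
# [An example link](https://example.com)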

I actually developed this shortcut on Mac OS – but I might have been better off doing it on iPad OS. I don’t find the Mac OS Shortcuts app a nice place to develop shortcuts. (Sometimes – but not this time – it’s advisable to develop on the platform the actions are specific to.)

Keyboard Maestro – A Slightly More Complex Case

Keyboard Maestro isn’t really a clipboard manager – but it has lots of clipboard manipulation features. It only runs on Mac OS and is a very powerful automation tool, which makes it ideal for the purposes of this discussion.

In a similar way to Shortcuts, you compose what is called a macro out of multiple actions – using drag and drop.

Here’s the macro:

The first action fills out the template with the last two clipboard items – copying the result back to the clipboard. It’s more direct than the Shortcuts example. Plus, it’s clearer which clipboard item is plugged in where. (The tokens %PastClipboard%0% and %PastClipboard%1% are the latest clipboard item and the one before it – and I like the clarity of the naming.)

The second action activates the Drafts application – which the Shortcuts example didn’t do.

Then it pastes the templated text into the current draft. Again the Shortcuts example didn’t do that.

This, for me, is a more useful version of the “compose a Markdown link” automation. The only downside for me is that it doesn’t run on iPad OS or iOS. But then I do most of my writing on a Mac anyway.

Conclusion

It’s possible, with these two apps at least, to get way beyond just accessing the last thing you copied to the clipboard. (The built-in clipboard capabilities of the operating system won’t get you previous clipboard items.)

Both Shortcuts and Keyboard Maestro are automation tools – so experimenting with them yields a useful conclusion: there can be more value in the clipboard when you combine it with automation.

It should be noted that you needn’t use the clipboard at all if you automate the acquisition of the data. This is true of both Shortcuts and Keyboard Maestro. Both are perfectly capable of populating variables and using them.

However, when it comes to selecting text and using it in a template, user interaction can be handy. And sometimes that needs a clipboard – as the user gathers data from various sources.

This pair of experiments illustrates that approach.

One final note: I haven’t shared editable shortcuts or Keyboard Maestro macros as

  1. These are so very simple you could more readily create them yourself.
  2. You’re going to want to edit them beyond recognition – unless they exactly fit your need.

The point is to encourage you to experiment.

Anatomy Of A Great App

This post follows on from Anatomy Of A Great iOS App.

That post was only written in 2019 but such a lot has changed in the Apple ecosystem that I think it worth revisiting. A hint at what’s changed is that the title of this post doesn’t contain the word “iOS” anymore.

(I can’t insert the word “Apple” into the title as the vast majority of the relevant apps aren’t made by Apple.)

As with that post, I don’t consider this one to be a complete treatment of what the ideal app would do. Just some pointers to the most important things. (And important is highly subjective, particularly as I consider myself a power user.)

To reprise a list in that post, with some updates, there are obvious personal biases here:

  • Automation is still important to me.
  • I have most of the Apple ecosystem – and now I have 3 HomePod speakers, scattered about the house.
  • I really want good quality apps – and I am willing and able to pay for them. (and, as in the case of OmniFocus 4, risk all by beta’ing them.) 🙂

Other themes are emerging:

  • Apps should be cross platform – where possible.
  • Mac Apps should support Apple Silicon.
  • Terms and conditions should be helpful.

All of the above are about user experience and value. So let’s take them one at a time.

Cross Platform

The tools and techniques increasingly support writing for all platforms with a single code base. Maybe with a small amount of platform-specific code.

From a vendor point of view this increases their market. You might say the Mac market, for instance, is small compared to the iPhone or Windows market. But only a small portion of iPhone users are into paying for apps, at least productivity apps. So the Mac market is a substantial proportion of that sub-market – and so probably worth catering for.

From a user point of view there are benefits, too: Portability of data and application experience are important. For example, I’m composing this blog post on an iPhone (in interstitial moments), on my iPad with a Magic Keyboard, and on Mac. The app I’m using is the very excellent Drafts – which has common automation across all platforms. (I might’ve started this post by dictating to my Apple Watch – using Drafts – but I didn’t.)

My task manager, OmniFocus, has similar cross-platform portability of data, automation, and (in the latest beta) experience. That again makes it attractive to me.

Both the mind mapping tools I use – MindNode and iThoughts – are cross platform.

Note: I don’t want all the platforms to merge – as there are use cases and capabilities unique to each, such as Apple Pencil. Or Apple Watch.

Apple Silicon

It’s important to note that Apple Silicon has the same programming model – at the machine code / assembler level – as iPhones and iPads have always had. Indeed the iPad I’m typing this on has the same M1 processor as the first Apple Silicon Macs. (It’s all based on ARM – which I enjoyed programming in assembler for in the late 1980s.)

Building for Apple Silicon yields tremendous speed and energy efficiency advantages – which the consumer would greatly appreciate. It also makes it easier to build cross-platform apps.

While applications that are built for Intel can run using Rosetta 2, that really isn’t going to delight users. Apps really should be native for the best performance – and why would you want anything else?

Terms And Conditions Apply

As with z/OS software, the model for paying for software has evolved.

There are a couple of things I rather like:

  • Family Sharing
  • Universal Licencing

By the way, it seems to me to be perfectly OK to use free apps in perpetuity – but the developer generally has to be paid somehow. So expect to see adverts. I view free versions as tasters for paid versions, rather than tolerating adverts.

Family sharing allows members of your family (as defined to Apple) to share apps, iCloud storage, etc. That includes in-app purchases. But the app has to enable it. It’s an economic decision but it does make an app much more attractive – if you have a “family” that actually wants to use it. (I have a family but the “actually wants to use it” bit is lacking.)

Universal Licencing is more of a developer-by-developer (or app-by-app) thing to enable. It’s attractive to me to have a licence that covers an app from Watch OS, through iOS and iPad OS, all the way to Mac OS. It means I can experiment with where to run something.

I would couple both the above licencing schemes to Subscriptions – where you pay monthly or annually for the licence. Some people dislike subscriptions but I’m OK with them – as I know the developer needs to be paid somehow. The link is that I won’t rely on apps that either aren’t developed or, more seriously, where the developer isn’t sustainable. So to recommend an app to a family member it has to meet that criterion. Likewise to bother using it on all the relevant platforms.

Conclusion

One controversial point is whether the apps have to be Native. For example, many people don’t like apps built with Electron (a cross-platform framework that doesn’t work on iOS or iPad OS or (probably) Android). To me, how good an app is matters more than what it’s built with – though the two are related. And “good” includes such things as not being memory hogs, supporting Shortcuts and / or AppleScript, and supporting standard key combinations.

Mentioning Shortcuts in the previous paragraph, I would note the arrival of Shortcuts on Mac in Monterey. I’ve used this – as highlighted in Instant Presentations?. While functional, the app itself is awkward to use on Mac – so I recommend composing on iOS or iPad OS to the extent possible. With iCloud sync’ing the resulting shortcut should transfer to Mac. But even on iOS and iPad OS the Shortcuts experience is currently (November 2021) buggy. I expect it to improve.

One final thought: Running through this post and the previously-referenced one is a theme: The thoughtful adoption of new features. Examples include:

  • Shortcuts – if automation is relevant.
  • SharePlay – if the idea of sharing over FaceTime is meaningful.

Not to mention Safari Extensions and Widgets.

The words “relevant” and “meaningful” being operative here. It’s not up to me as a user to assess relevance or meaningfulness – but it is up to users to use their ingenuity when adopting apps.

And when I say “thoughtful adoption” that applies to users as well. There are many new capabilities in modern releases of iOS, iPad OS, and Mac OS. I would single out two very recent ones:

  • Live text – which recognises text in photographs and lets you do something useful with it. I’ve found this to work very well.
  • Quick Notes – though the target being Apple Notes is less useful to me. (I’m approximating it with automation for capture to Drafts.)

So I encourage users to explore new operating system releases, rather than just bemoaning the need to upgrade.

Instant Presentations?

For normal people making a presentation is as simple as opening PowerPoint and starting typing.

But I’m not normal. 🙂

I would like to start with a mind map and end up with a presentation – with Markdown as my preferred intermediary.

It doesn’t matter what the presentation is. Mine tend to be in one of the following categories:

  • Something incredibly informal, perhaps to share with colleagues.
  • A conference presentation.
  • Workshop materials.

And sometimes the material isn’t going to be a presentation. Or at any rate not just a presentation. Hence Markdown being my preferred medium – as it can be readily converted to eg HTML.

And I want a low friction way of creating a presentation the way I like to create it.

The Workflow I Was Aiming For

The ideal steps go something like this:

  1. Have an idea.
  2. Create a mind map with the idea in.
  3. Work on the mind map until it’s time to flesh out some slides.
  4. Generate the skeleton Markdown for the slides.
  5. Flesh out the slides.
  6. Create the presentation – as readable by PowerPoint.

Steps 5 and 6 are, of course, iterative.

How I Built The Tooling For The Workflow

Steps 1 – 3 are all in my head or using MindNode, a very nice multi-platform mind mapping tool.

Steps 5 and 6 are:

  • Use a text editor to work on the Markdown (with some extensions)
  • Use my mdpre and md2pptx open source tools – via the make utility – to generate pure Markdown and convert it to PowerPoint .pptx format.

Decent text editors and make enable those last two steps to be quick and frictionless.

Step 4 is the interesting one. Let’s look at it in more detail – including how it’s done:

  • I wrote a Shortcuts shortcut – that could run on iOS, iPad OS, or Mac OS (as of Mac OS 12 Monterey). It exports a mind map you select to Markdown, does a small amount of editing, and copies it to the clipboard. MindNode has a built in action to export to a number of formats, including Markdown. Which is why I’m favouring MindNode for this task.
  • I wrote a Keyboard Maestro macro that invokes the shortcut.
  • The same macro writes the Markdown out – to a directory and file you nominate.
  • It also creates a make file that invokes mdpre and then md2pptx.
  • It also copies some files – boilerplate Markdown and CSS – into place.

So, the whole thing is as automatic as it could be – with the user specifying only what they need to. And it runs very fast – and much more reliably than a human doing it.

Here is what a presentation outline looks like in MindNode.

I’ve deliberately used text that describes what the generated parts will be.

And here’s the Markdown the whole process generates.

=include metadata.md
=include standard.css

# Presentation title

## Section 1

### Slide A
* Bullet 1
* Bullet 2

### Slide B

## Section 2

### Slide C

As you can see, it’s plain text – which is what Markdown is. So you could use any text editor you like to work on this. And you can apply Git version control to it – which is often what I do for more formal presentations.

Actually the =include lines aren’t standard Markdown; They are what mdpre will recognise as instructions to include files. In this case both metadata.md and standard.css embed other files the same way.

Conclusion

One final thought: You might think that a mind map is overkill for a presentation but consider what a minimal mind map is: It’s just a single word, or maybe it’s just the title. MindNode, for one, makes it very easy to create just such a mind map. It really is very minimal indeed. And I would hope that any formal presentation would go through a structural process like mind mapping.

So, “Instant Presentation”? Well, not quite – but very close. And close enough to make it easy and quick to kick off another presentation – however formal or informal.

What’s The Use?

It’s “Sun Up” on Conference Season – and I have a new presentation.

It’s called “What’s The Use?” And it’s a collaboration with Scott Ballentine of z/OS Development.

It’s very much a “field guy meets product developer” sort of thing. It emerged from a conversation on IBM’s internal Slack system.

The idea is very simple: If a product developer codes the IFAUSAGE macro right, and if a customer processes the resulting SMF 89-1 and SMF 30 records right, good things can happen.

Two “ifs” in that:

  • Scott describes how developers could and should code the macro.
  • I give some examples of how customers might use the data to their advantage.

Of course, when we say “developer” we don’t necessarily mean IBM Developer (such as Scott) – as other software vendors could and should get this right.

And when we say “customer” it could be consultants (such as me) or outsourcers, as well as traditional customers.

So what’s the big deal?

Looking at it from the customer point of view, there are a number of things that can be yielded:

  • CPU when using the product. MQ does this, for example.
  • Names of things. Db2 and MQ both do this.
  • Connectivity. Connectors to both Db2 and MQ do this. And – to a lesser extent – IMS does this.

I’ve listed IBM products – which are the ones I’m most familiar with. One thing Scott brings to the party is how the IFAUSAGE macro works and can be used. Handily, he talks through the lifecycle of using the macro and we both talk through how that turns into stuff in SMF 89-1 and SMF 30 records. The point of the lifecycle is that any vendor could use this information to be helpful to their customers.

We’d like developers to get creative in how they use IFAUSAGE – whether they use it as a basis for billing or not. (At least one famous one doesn’t.) So, a plea: If you are a developer of software that has something approaching a subsystem, consider encoding the subsystem name in Product Qualifier in IFAUSAGE. Likewise for any connector.

We now have a third public booking for this presentation (plus a private one). So I guess the presentation “has legs” – and we’ll continue to put in the effort to evolve it.

Talking of which, the presentation got its first outing yesterday and got numerous questions. One of them prompted Scott and me to discuss expanding the scope a little to cover SMF 89-2. Would that be worthwhile? (My inclination is that it would – and I already process 89-2 in my REXX so could furnish an example of what you might get.)

The abstract is in an earlier blog post: Three Billboards?.

One final note: We think this presentation could be useful enough that we’d be prepared to give it to smaller audiences – such as individual software developers.

Periodicity

When I examine Workload Manager for a customer a key aspect is goal setting. This has a number of aspects:

  1. How work gets classified – to which Service Class and Report Class by what rules.
  2. What the goals are for each Service Class Period.
  3. What the period boundaries should be.

This post focuses on Aspect 3: Period Boundaries.

What Is A Service Class Period?

When a transaction executes it accumulates service. Generally this is CPU time, especially with modern Service Definition Coefficients.

For transactions that support it you can define multiple Service Class Periods. Each period – except the last – has a duration.

Some transaction types, most notably CICS, only have a single period. For them the discussion of period durations is moot.

The z/OS component that monitors service consumption is System Resources Manager (SRM). SRM predates Workload Manager (WLM) by decades. (It’s important not to see WLM as replacing SRM but rather as supervising it. WLM replaces human-written controls for SRM.) Periodically SRM checks work’s consumption of resources. If the transaction has exceeded the relevant period duration the transaction moves to the next period.

A transaction using more service than its current period’s duration doesn’t directly trigger period switch – there is some slight latency to detection, so it would be normal for the duration to be slightly exceeded before the switch happens.

The purpose of multiple periods is, of course, to give good service to light consumers of service and to progressively slow down heavier consumers.

Note: A common mistake is to think that transactions fall through into later periods because of their elapsed time. They don’t; It’s about service. Granted, a long running transaction might be long running because of the CPU it’s burning. But that’s not the same thing as saying it’s the elapsed time that drove it to later periods.
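
To make the “it’s about service, not elapsed time” point concrete, here’s a small Python sketch with invented period durations. (It ignores the detection latency mentioned above.)

# Which period would a transaction end in, given its accumulated service?
# Durations are in service units; the last period has no duration.
def ending_period(service_units, durations):
    """Return the 1-based period the transaction ends in."""
    boundary = 0
    for period, duration in enumerate(durations, start=1):
        boundary += duration
        if service_units <= boundary:
            return period
    return len(durations) + 1   # fell through to the last period

durations = [500, 5000]                # invented Period 1 and Period 2 durations
for consumed in (300, 2500, 162000):   # invented transaction service consumption
    print(consumed, "service units: ends in period", ending_period(consumed, durations))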

Two Examples Of A New Graph

Here are two example graphs from the same customer. They are new in our code base, though Service Class Period ending proportions are something we’ve talked to customers about for many years. I’m pleased we have these as I think they will tell some interesting new stories. You’ll get hints of what I think those stories might be based on the two examples from my “guinea pig” customer.

Each graph plots transaction ending rates for each period of the Service Class across a day. In the heading is information about period boundaries and how many service units the transactions ending there consumed on average. I feel the usefulness of the latter will emerge with more experience – and I might write about it then. (And graph headings are one place where my code has a high tendency to evolve, based on experiences with customers.)

Though the two examples are DDF I don’t intend to talk much about Db2 DDF Analysis Tool – except to say, used right, it would bring some clarity to the two examples.

DDFMED – A Conventional-Looking Service Class

This Service Class looks like many Production transaction service classes – with the classic “double hump” shape. I consider that an interesting – if extremely common – architectural fact. There’s something about this workload that looks reminiscent of, say, CICS transactions.

Quite a high proportion of the transactions end in Period 2 and a fair proportion in Period 3. Those in Period 3 are, on average, very heavy indeed – consuming an average of 162K service units. (This being DDF, the transaction ends when the work is committed – which might not be the end of the transaction from the client’s point of view.)

It seems to me the period boundaries are reasonable in this case, but see “Conclusion” below.

DDFLOW

This Service Class looks quite different:

  • The transaction rate is more or less constant – with two spikes, twelve hours apart. I consider both the constant transaction rate and the twelve-hourly spikes to be interesting architectural facts.
  • Almost all transactions end in Period 1. In fact well within Period 1. The very few Period 3 transactions are extremely long.

Despite the name “DDF Low” I think we have something very regular and well controlled here. I say “despite” as, generally, less well understood / sponsored work tends to be thought of as “low”.

Conclusion

I will comment that, when it comes to goal setting, business considerations play a big part. For example, some of the effects we might see at the technical level could be precisely what is needed. Or precisely what is not needed. So I tend not to walk in with recommendations for things like transaction goals – but I might walk out with them. Contrast this with what I call my “Model Policy” – which I discussed in Analysing A WLM Policy – Part 1 in 2013. Core bits of that are as close to non-negotiable as I get.

However, it is – as I think this post shows – very useful in discussions of period durations to know the proportions of transactions for a Service Class that end in each period. If everything falls through into Period 2, for example, Period 1’s duration is probably too short. And not just the proportions but the transaction rates across, say, the day.

One other thing, which I’ll leave as a question: What happens if you slow down a transaction that, say, holds lots of locks?

Four Coupling Facilities

This isn’t following on from Three Billboards? 🙂 but rather Shared Coupling Facility CPU And DYNDISP from 2016. I’m not sure it adds much that’s new but this set of customer data was an opportunity too good to miss…

… It enables me to graph most of the Coupling Facility LPAR types you’re likely to see.

I won’t repeat the contents of the 2016 post but I will repeat one thing: There are two views of Coupling Facility CPU:

  • SMF 70 Partition
  • SMF 74-4 Coupling Facility

That 2016 post talked at length about the latter. This post is more about the former.

Different Coupling Facility LPAR Types

There are different kinds of coupling facility LPAR, and this customer has several of them:

  • Dedicated
  • Shared – without Dynamic Dispatch (DYNDISP=NO)
  • Shared – with Dynamic Dispatch (DYNDISP=YES)
  • Shared – with DYNDISP=THIN

The latter two are similar but, in essence, Thin Interrupts (DYNDISP=THIN) shortens the time a CF spends polling for work, releasing the physical CPU sooner. This is good for other ICF LPARs, but maybe not so good for the LPAR with DYNDISP=THIN.

While this customer’s data doesn’t exemplify all four types it is a useful set of data for illustrating some dynamics.

About This Customer’s Mainframe Estate

I’m only going to describe the relevant bits of the customer’s mainframe estate – and I’m going to remove the names.

There are four machines, each running a mixture of LPARs in sysplexes and monoplexes. The sysplex we were most interested in had four LPARs, one on each machine. Also four coupling facilities, again one on each machine. There were no external coupling facilities.

Those of you who know a bit about resilience are probably wondering about duplexing coupling facility structures but this post isn’t about that.

I don’t think it makes any difference but these are a mix of z13 and z14 machines.

We had SMF 70-1 and SMF 74-4 from these z/OS LPARs and the four coupling facilities, but little from the others.

Here are the four machines’ ICF processor pools, across the course of a day.

The top two look significantly different to the bottom two, don’t they?

Machines A And B

These two machines have multiple ICF LPARs, each with some kind of DYNDISP turned on. We can see that because they don’t use the whole of their PR/SM shares – as their utilisation from the PR/SM point of view is varying.

Each machine has two shared ICF processors.

We have SMF 74-4 for the blue LPARs. So we can see they are using DYNDISP=THIN. We can’t see this for the other LPARs as we don’t have SMF 74-4 for them. (SMF 70-1 doesn’t have a DYNDISP indicator.)

The blue LPARs are also much busier than the other LPARs in their pool. While one might consider dedicating one of the two ICF processors we wouldn’t ideally define an ICF LPAR with a single logical processor.

Machine C

Machine C looks different, doesn’t it?

Again we have 2 physical processors in the ICF pool.

Here we have four ICF LPARs, each with DYNDISP=NO. We know three facts to establish this:

  • From SMF 70-1 we know that none of the LPARs has any dedicated ICF engines. We know this two ways:
    • As this graph shows, none of them has the share of a single ICF engine.
    • We have Dedicated and Shared processor information explicitly in SMF 70-1.
  • These LPARs use their whole share – judging by the constant CPU use in SMF 70-1.
  • From SMF 74-4 we know the blue LPAR in particular has DYNDISP=NO. (We don’t have SMF 74-4 for the others.)

Machine D

This looks similar to Machine C but it isn’t quite.

Yet again the ICF pool has 2 processors.

  • The Red LPAR has dedicated processors – from both SMF 70-1 and SMF 74-4. DYNDISP doesn’t even come into it for this LPAR.
  • The other LPARs have DYNDISP=NO, judging by their (SMF 70-1) behaviour.

A minor footnote: As I articulated in A Picture Of Dedication (in 2015) I sort the LPARs so the Dedicated ones appear at the bottom of the stack. (Even below *PHYSICAL – which is a very thin blue veneer here but actually red in the case of the other three machines.)

But Wait, There’s More

When I started writing this post I thought it was just going to be about showing you four pretty pictures. But, partially because some blog posts get written over an extended period, something came up meanwhile that I think is worth sharing with you. Besides, electronic words are unlimited – even if your patience isn’t.

Having shown you some graphs that depict most of the ICF LPAR CPU situations I experimented with DYNDISP detection for ICF LPARs we don’t have SMF 74-4 for.

Done right, this could make the story of an ICF pool with shared logical engines much nearer to complete.

The current algorithm – now in our Production code – assumes DYNDISP if the ICF LPAR uses less than 95% of its share over the focus shift (set of hours). Otherwise it’s Dedicated or DYNDISP=NO. I still can’t tell whether it’s DYNDISP=THIN or YES without SMF 74-4.
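
Here’s roughly what that heuristic looks like, sketched in Python with invented numbers – it isn’t the Production code itself:

# Guess an ICF LPAR's dispatching style from SMF 70-1 alone.
# share_pct is the LPAR's weight-based share of its pool over the focus
# shift; used_pct is its measured CPU use over the same hours.
def dyndisp_guess(share_pct, used_pct, threshold=0.95):
    if used_pct < threshold * share_pct:
        return "some form of DYNDISP (THIN or YES - needs SMF 74-4 to tell which)"
    return "Dedicated or DYNDISP=NO"

print(dyndisp_guess(share_pct=50.0, used_pct=31.0))
print(dyndisp_guess(share_pct=25.0, used_pct=24.8))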

While ideally I still want data from all z/OS LPARs and coupling facilities in a customer’s estate, this technique fills in some gaps quite nicely.

Well, it works for this customer, anyway… 🙂

Three Billboards?

You could consider presentations as advertising – either for what you’ve done or for what something can do. Generally my presentations are in the category of “Public Service Announcement”. Which rather combines the two:

  • What I’ve found I can do – that you might want to replicate.
  • What I’ve found a product can do – that you might want to take advantage of.

Sometimes it’s a “Public Health Warning” as in “you’d better be careful…”

Anyhow, enough of trying to justify the title of this post. 🙂

(I will say Three Billboards Outside Ebbing, Missouri is an excellent movie. I first saw it on an aeroplane on the usual tiny screen and it worked well even there.)

So, I have three presentations, two of which are brand new and the other significantly updated this year:

  • What’s The Use? With Scott Ballentine
  • zIIP Capacity And Performance
  • Two Useful Open Source Tools

I’ll post the abstracts below.

The first two are for IBM Technical University Virtual Edition (registration here). This is run by IBM and is premier technical training, including over 600 sessions on IBM IT Infrastructure.

As I’m a big fan of user groups I’m delighted to say the first and third are for GSE UK Virtual Conference 2021. In both cases these are “live” – which, frankly, I find more energising.

Scott and I recorded last week – as the Tech U sessions will be pre-recorded with live Question And Answer sessions at the end. So these are “in the can” – which is a big relief.

The third I’ve just started writing but I have a nice structure to it – so I’m sure it’ll get done, too.

So here are the abstracts…

What’s The Use – With Scott Ballentine

For many customers, collecting the SMF type 89 subtype 1 Usage Data records is an important and necessary part of Software Licencing, as they are used by SCRT to generate sub-capacity pricing reports.

But Usage Data has many more uses than that – whether in SMF 89 or SMF 30.

Customers can get lots of value out of this data – if they understand what the data means, and how it is produced.

Vendors can delight their customers – if they produce the right usage data, knowing how it can be used.

This presentation describes how Usage Data is produced, how a vendor can add value with it, and how customers can best take advantage of it.

zIIP Capacity And Performance

zIIP Capacity Planning tends to be neglected – in favour of General-Purpose Engines (GCPs). With Db2 allowing you to offload critical CPU to zIIPs, and the advent of zCX and z15 Recovery Boosts, it’s time to take zIIP capacity and performance seriously.

You will learn how to do zIIP capacity planning and performance tuning properly – with instrumentation and guidelines.

Two Useful Open Source Tools

You will learn how to use two open source tools many installations will find extremely useful, covering System and Db2 Performance:

Db2 DDF Analysis Tool – which uses Db2 SMF 101 Accounting Trace to enable you to manage DDF work and its impact on your system.

WLM Service Definition Formatter – which uses your WLM XML file to create some nice HTML, enabling you to see how the various parts of your Service Definition fit together.

Time For Action On Virtual Storage?

I wrote How I Look At Virtual Storage in 2014.

But since then my stance has shifted somewhat. A recent study I was part of made me realise my tone on z/OS virtual storage should probably be a little more strident. In that post I was somewhat even-handed about the matter, just describing my method.

This study also involved the CICS Performance team. They pointed out that some of this customer’s CICS regions had substantial use of 24-Bit virtual storage. If those CICS regions were to be able to scale (individually) a limiting factor might be 24-Bit virtual storage.

Obviously working on CICS regions’ use of virtual storage is the correct way forward under these circumstances.

But it got me thinking, and doing a little investigating.

Why Virtual Storage Matters

Almost 35 years ago a sales rep in my branch suggested selling a customer more virtual storage. 🙂 (Yes, he really did.)

Joking apart, if you want more virtual storage you can’t buy it; You have to work for it.

But why would you want more 24-Bit virtual storage anyway?

  • Older applications might well be 24-Bit and converting them to 31- or even 64-Bit might be difficult or not economically viable.
  • Some things, such as access method buffers, might well be in 24-Bit virtual storage. (To be fair, high level languages’ support for buffers above the 16MB line is good.)

I raise the subject of access method buffers because one of the ways of tuning I/O performance is to increase the number of access method buffers. To optimally buffer a single QSAM data set, for example, takes a substantial proportion of an address space’s 24-Bit region – unless the buffers have moved above the 16MB line. So one is sparing of buffers under these circumstances, perhaps sacrificing some performance. (You would choose high I/O data sets to buffer, obviously.)

My own code – at least the bits I have the excuse of having inherited – is probably predominantly 24-Bit. I might fix that one day, though it doesn’t seem like a good use of time. (It would probably be coupled with removing SIIS (Store Into Instruction Stream) horrors.)

What’s New?

When my CICS colleagues mentioned managing the use of 24-Bit virtual storage within the CICS regions a thought occurred to me: I could examine whether the 24-Bit region size could be increased. Both are useful approaches – reducing the demand and increasing the supply.

The same is probably true of 31-Bit, of course. For 64-Bit I’m more concerned with getting MEMLIMIT set right.

I think I’ve observed something of a trend: It seems to me many customers have the opportunity to increase their 24-Bit region – probably by 1MB but sometimes by 2MB. This would be a handy 10 – 20% increase. I doubt many customers have examined this question in a while – though most have SMF 78-2 enabled all the time. It’s rare to get a set of customer data without it in – and we always pump out the report outlined in How I Look At Virtual Storage.

Examining And Adjusting Virtual Storage

In what follows I’m describing 24-Bit. Similar actions apply for 31-Bit.

  1. Check that your SQA definition is small enough that there is little or no SQA free. SQA can overflow into CSA but not vice versa – so overspecifying SQA is a waste of virtual storage.
  2. Check how much CSA is free.
  3. Adjust SQA and CSA so that there is generally 1MB CSA free but little if any SQA free, but so that you are raising the private boundary by an integer number of MB – probably 1MB, possibly 2MB.
  4. Monitor the resulting 24-bit virtual storage picture.

You need to do the analysis with data over a reasonable amount of time. I’d say at least a week. You want to know if SQA and CSA usage are volatile. In most customer situations I’ve seen they are relatively static, but you never know.

One of the key points is that the boundary between common and private is a segment boundary i.e. 1MB. There is therefore no point trying to claw back, say, half a megabyte. (The “generally keep 1MB free” above is not related to this; It’s just a reasonable buffer.)
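
As a back-of-the-envelope illustration of that – with invented numbers, and not a recommendation – here’s the “whole segments only” arithmetic in Python:

import math

# How many whole megabytes (segments) of 24-Bit common could go back to
# private, keeping roughly 1MB of CSA free? Inputs would really come from
# SMF 78-2 observations over at least a week; these are invented.
def reclaimable_mb(csa_free_mb, sqa_free_mb, csa_buffer_mb=1.0):
    spare = (csa_free_mb - csa_buffer_mb) + sqa_free_mb
    return max(0, math.floor(spare))   # round down to a whole segment

print(reclaimable_mb(csa_free_mb=2.3, sqa_free_mb=0.4))   # 1 - worth pursuing
print(reclaimable_mb(csa_free_mb=1.2, sqa_free_mb=0.1))   # 0 - nothing to claw back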

For 24-Bit I said “adjust SQA and CSA so that there is generally 1MB free”. For 31-Bit I’ve generally said “keep 100MB free”. I think that’s fair but 200MB might be better. Most middleware and applications that use Below The Bar virtual storage have substantial amounts of 31-Bit. If their use is volatile – again when viewed over a reasonable period of time – then that might guide you towards 200MB free or more. The “100MB” is not structural; That’s 100 segments and is just a reasonable rule of thumb. At least for 31-Bit the 1MB segment boundary doesn’t seem nearly so coarse grained as it does for 24 Bit, so adjustment needn’t be in 100MB increments.

Talking of 31-Bit virtual storage, one of the biggest users of ECSA has been IMS. Until a few years ago the biggest users of 31-Bit private were Db2 and MQ. Db2 is now almost all 64-Bit and MQ has done lots of 64-Bit work.

Conclusion

I would urge customers to examine their virtual storage position. In particular, 24-Bit might well contain good news. 31-Bit might also, but there I’m more inclined to check there’s enough left. It’s not difficult to examine this, even if all you do is run off an RMF Virtual Storage Activity report.

I, for one, will make a point of examining the virtual storage reports we produce. Generally I do, anyway. Else I wouldn’t be seeing this trend.

Finally, a sobering thought: If, as has been said, customers are using 1 bit of addressability a year we are 21 years into a 33-year 64-Bit lifespan. I’m pre-announcing nothing by saying this. And the “1 bit a year” thing might not actually hold. But it does make you think, doesn’t it?