Mainframe Performance Topics Podcast Episode 31 “Take It To The Macs”

This is the first blog post I’ve written on my new work MacBook Pro. While it’s been a lot of work moving over, it’s a better place to be: an Apple Silicon M1 Max machine with lots of memory and disk space.

That’s nice, but what’s the relevance to podcasting?

Well, it’s very warm here in the UK right now and I’ve been on video calls for hours on end. Yes, the machine gets warm – but possibly not from its load. But, importantly, there has been zero fan noise.

Fan noise has long bedevilled recording audio. Hopefully that era is now over – and just maybe the era of better sound quality in my recordings is upon us. (See also the not-so-secret Aftershow for this episode.)

As usual, Episode 31 was a lot of fun to make. I hope you enjoy it!

Episode 31 “Take it to the Macs” long show notes.

This episode is about our After Show. (What is that?)

Since our last episode, we were both in person at SHARE in Dallas, TX.

What’s New

  • More news on the CustomPac ServerPac removal date, which has been extended past January 2022. The CustomPac (ISPF) ServerPac removal date
    from Shopz for all ServerPacs will be July 10, 2022. Make sure you order before that date if you want a non-z/OSMF ServerPac. CBPDO is still available and unaffected.

  • Data Set File System has been released: APAR OA62150 closed April 28th, 2022, and it is available only on z/OS V2.5.
    We talked about it in Episode 30.

  • IBM z16 – lots of great topics we plan to do on this in future episodes.

  • IBM z/OS Requirements have moved into the Aha! tool, and they are now called Ideas.

Mainframe – z/OS Management Services Catalogs: Importance of z/OSMF Workflows

  • z/OS Management Services Catalog (zMSC) allows you to customize a z/OSMF
    Workflow for your enterprise, and publish it in a catalog for others to “click and use”.

    • zMSC Services can be very useful, as you can encode a specific installation’s standards into a Service.

    • As you can guess, there are different roles for these zMSC Services: Administrators and Users.

      • Administrators are those who can customize and publish a Service (from a z/OSMF Workflow definition file), and allow Users to run it.
    • To get you started, IBM provides 7 sample Services which are common tasks that you might want to review and publish. These samples are:

      1. Delete an alias from a catalog
      2. Create a zFS file system
      3. Expand a zFS file system
      4. Mount a zFS file system
      5. Unmount a zFS file system
      6. Replace an SMP/E RECEIVE ORDER certificate
      7. Delete a RACF user ID
    • More are likely to be added, based on feedback.

    • Note, however, that someone could add their own Service from a z/OSMF Workflow. The z/OSMF Workflows could come from:

      • The popular Open Source zorow repository.

      • Created from your own ecosystem, perhaps even using the z/OSMF Workflow Editor to help you create it.

    • zMSC Services are based on z/OSMF Workflows. You can see why the discussion on knowing z/OSMF Workflows is important.

    • Customers can grab Workflows and make them Services, providing more checking and control than a z/OSMF Workflow alone can. Services can also be run again and
      again once published, meaning that the tasks of Workflow creation, assignment, and acceptance are not necessary.

    • Without z/OSMF Workflows none of zMSC is usable, so get your Workflows ready to make appropriate ones into Services.

Performance – System Recovery Boost (SRB) Early Experiences

  • System Recovery Boost provides boosts of two kinds:

    • Speed Boost – which is useful for those with subcapacity servers to make them full speed. Won’t apply to full speed customers.

    • zIIP Boost – which allows work normally not allowed to run on a zIIP, to run on a zIIP.

      • You can purchase temporary zIIP capacity if you like.
  • There are basically three major stages to the SRB function:

    1. The original functions on the IBM z15, to reduce outage time:

      • Shutdown – which allows you to have 30 minutes’ worth of boosting during shutdown. This function must be requested each time it is to be used.

      • IPL – which allows you to have 60 minutes’ worth of boosting during IPL. This function is on by default.

    2. Additional functions for Recovery Process Boost, provided on IBM z15. These extend boosting to structure or connectivity recovery, for instance.

    3. Newer additional functions for Recovery Process Boost, specifically on IBM z16, for stopping and starting certain middleware.

  • Martin has had several early field experiences, which he has summarised in four blog posts:

    1. Really Starting Something

    2. SRB And SMF

    3. Third Time’s The Charm For SRB – Or Is it?

    4. SRB And Shutdown – where Martin noticed that Shutdown boosts might not be used as much.

  • It is important to know that SRB new function APARs have been released, and all have the SMP/E FIXCAT of IBM.Function.SystemRecoveryBoost.
    Some of these functions may or may not go back to the IBM z15.

  • Martin’s SRB conclusions are:

    • “Not one and done”. We’ve seen updates to this technology, which is a great thing to see expanding!

    • Good idea to run a small implementation project. Know what kind of advantage you are receiving from this function, which probably entails doing a “before” and “after” comparison.

    • Pay attention to your zIIP Pool Weights. An LPAR undergoing a boost might use a lot of zIIP; make sure other LPARs have adequate zIIP pool weights to protect them.

    • For Shutdown consider automation. This allows you to leave no SRB offering behind.

    • Take advantage of the available monitoring for effective usage.

  • Tell us of your SRB experience!

Topics – Stickiness

  • This topic explores what makes some technologies sticky, and some not, which Martin started in one of his blog posts. We almost went with this as the podcast episode title.

  • Martin and Marna discuss some of the attributes that are important for continuing to be used, and what makes a function fall away over time.

    • Value – There needs to be a balance between making your life better and (somewhat) financial value. Important points are productivity, reliability, and value in doing something that is hard to do. Familiarity is nice value, too.

    • Completeness – What features are there, and which are missing? An example of this is Shortcuts, which has added a lot of function over time. It can be a journey, with lots of competitors.

    • Usability and immediacy – An unsuccessful attempt was Martin’s numeric keypad, where there was no way to know what the keys were for without some fumbling. StreamDeck was programmable and helped by showing what the keys were for.

    • Reliability – How infrequently must it fail for it to be acceptable? 1%? 10%? It depends.

    • Setup complexity – Most people want things simple to set up. Martin likes to tailor capability. Marna likes it to be easy.

Out and about

  • Marna and Martin are both planning on being at SHARE in Columbus, August 22–26, 2022.

  • Martin will be talking about zIIP Capacity & Performance, with a revised presentation. Marna has a lot of sessions and labs, as usual – including the new z/OS on IBM z16!

On the blog

So It Goes

WLM-Managed Initiators And zIIP

One item in the z/OS 2.5 announcement caught my eye. Now that 2.5 is becoming more prevalent, it’s worth talking about it. It is zIIP and WLM-Managed Initiators.

WLM-Managed Initiators

The purpose of WLM-Managed Initiators is to balance system conditions against batch job initiation needs:

  • Start too many initiators and you can cause CPU thrashing.
  • Start too few and jobs will wait for an excessive period to get an initiator.

And this can be used both to delay job initiation and to choose where to start an initiator.

Prior to z/OS 2.5 General-Purpose Processor (GCP) capacity would be taken into account but zIIP capacity wouldn’t be. With z/OS 2.5 zIIP is also taken into account.

What WLM Knows About

But this raises a question: How does WLM know how zIIP intensive a job will be – before it’s even started?

Well, WLM isn’t clairvoyant. It doesn’t know the proclivities of an individual job before it starts. In fact it doesn’t know anything about individual job names. It can’t say, for instance, “job FRED always burns a lot of GCP CPU”.

So let’s review what WLM actually does know:

  • It knows initiation delays – at the Service Class level. This shows up as MPL Delay.1
  • It knows the usual components of velocity – again, at the Service Class level. (For example GCP Delay and GCP Using.)
  • It knows system conditions. And now zIIP can be taken into account.
  • It knows – at the Service Class level – resource consumption by a typical job. And this now extends to zIIP.
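As a reminder of the general shape of the velocity calculation, here is a minimal, purely illustrative sketch (the names are not real RMF or WLM field names):

    # Minimal sketch: execution velocity from using and delay samples.
    def execution_velocity(using_samples, delay_samples):
        total = using_samples + delay_samples
        if total == 0:
            return None  # no samples, so velocity is undefined
        return 100.0 * using_samples / total

    # For example, 300 using samples and 100 delay samples give a velocity of 75.
    print(execution_velocity(300, 100))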

How Prevalent Is zIIP In Batch?

zIIP is becoming increasingly prevalent in the Batch Window, often in quite an intense manner. Examples of drivers include:

  • Java Batch
  • Db2 Utilities
  • A competitive sort product

When we2 look at customer systems we often see times of the night where zIIP usage is very high. (Often we’re not even focusing on Batch but see it out of the corner of our eye.)

(Actually this usage tends to be quite spiky. For example, Utilities windows tend to be of short duration but very intensive.)

So, it’s worth looking at the zIIP pool for the batch window to understand this.

(I’ll say, in passing, often coupling facility structures are intensively accessed in similar, sometimes contemporaneous, bursts. As well as GCP and memory.)

I’m labouring the point because this trend of zIIP intensiveness in parts of the batch window might be a bit of a surprise.

Conclusion

If we accept that WLM will now manage initiators’ placement (in system terms) and starting (in timing terms) with regard to zIIP, we probably should classify jobs into service classes accordingly.

It’s suggested zIIP jobs should be in different service classes from non-zIIP ones. With the possible exception of Utilities jobs I don’t think this is realistic. (Is Java batch business-wise different from the rest?) But if you can achieve it without much distortion of your batch architecture, WLM will take zIIP into account better in z/OS 2.5. One reason why you might not be able to do this is if the zIIP-using jobs are multi-step and only some of the steps are zIIP-intensive.


  1. Not to be confused with MPL Delay for WLM-Managed Db2 Stored Procedures Server address spaces, which is generally more serious. Metapoint: It pays to know what a service class is for. 

  2. Teamly “we”, not “Royal We”. :-) 

SRB And Shutdown

I’ve written several times about System Recovery Boost (SRB) so I’ll try to make this one a quick one.

For reference, the previous posts were Really Starting Something, SRB And SMF, and Third Time’s The Charm For SRB – Or Is it?

From that last one’s title it clearly wasn’t (the end of the matter). It’s worth reading the table with timestamps again.

Notice in particular the first interval – the last one before shutdown – is not boosted. In fact I note that fact in the post.

Some months on I think I now understand why – and I think it’s quite general.

To enable SRB at all you have to enable it in PARMLIB. But it is the default – so you’d have to actively disable it if the z15 (or now z16) support is installed. (One customer has told me they’ve actually done that.)

But enablement isn’t the same as actually invoking it:

  • For IPL you don’t have to do anything. You get 60 minutes’ boost automatically.
  • For shutdown you have to explicitly start the boost period – using the IEASDBS procedure.

What I think is happening is installations have SRB enabled but don’t invoke IEASDBS to initiate shutdown.

I would evolve shutdown operating procedures to include running IEASDBS. In general, I think SRB (and RPB, for that matter) would benefit from careful planning. So consider running a mini project when installing z15 or z16. If you’re already on z15 note there are enhancements in this area for z16. I also like that SRB / RPB is continuing to evolve. It’s not a “one and done”.

By the way there’s a nice Redpiece on SRB: Introducing IBM Z System Recovery Boost. It’s well worth a read.

In parting, I should confess I haven’t established how CPU intensive shutdown and IPL are, nor how parallel. Perhaps that’s something I should investigate in the future. If I draw worthwhile conclusions I might well write about them here.

Engineering – Part Six – Defined Capacity Capping Considered Harmful?

For quite a while now I’ve been able to do useful CPU analysis down at the individual logical processor level. In fact this post follows on from Engineering – Part Five – z14 IOPs – at a discreet distance.

I can’t believe I haven’t written about Defined Capacity Capping before – but apparently I haven’t.

As you probably know such capping generally works by introducing a “phantom weight”. This holds the capped LPAR down – by restricting it to below its normal share (of the GCP pool). Speaking of GCPs, this is a purely GCP mechanism and so I’ll keep it simple(r) by only discussing GCPs.

But have you ever wondered how RMF (or PR/SM for that matter) accounts for this phantom weight?

Well, I have and I recently got some insight by looking at engine-level GCP data. Processing at the interval and engine level yields some interesting insights.

But let me first review the data I’m using. There are three SMF record types I have to hand:

  • 70-1 (RMF CPU Activity)
  • 99-14 (Processor Topology)
  • 113 (HIS Counters)

I am working with a customer with 8 Production mainframes (a mixture of z14 and z15 multi-drawer models). Most of them have at least one z/OS LPAR that hits a Defined Capacity cap – generally early mornings across the week’s data they’ve sent.

None of these machines is terribly busy. And none of them are even close to having all physical cores characterised.

Vertical Weights

In most cases the LPARs only have Vertical High (VH) logical GCPs. I can calculate what the weight is for a VH as it’s a whole physical GCP’s worth of weight: Divide the total pool weight by the total number of physical processors in the pool. For example, if the LPARs’ weights for the pool add up to 1000 and there are 5 physical GCPs in the pool a physical GCP’s worth of weight is 200 – and so that’s the polar weight of a VH logical GCP. (And is directly observable as such.)
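Here’s that arithmetic as a minimal sketch, using the numbers from the example above:

    # A VH logical GCP's polar weight is one physical GCP's worth of weight.
    total_pool_weight = 1000  # sum of the LPARs' weights for the GCP pool
    physical_gcps = 5         # physical GCPs in the pool

    weight_per_physical_gcp = total_pool_weight / physical_gcps
    print(weight_per_physical_gcp)  # 200 - the polar weight of a VH logical GCP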

Now here’s how the logical processors are behaving:

  • When not capped, all the logical processors have a full processor’s weight (as expected).
  • When capped, weights move somewhat from higher-numbered logical GCPs to lower-numbered ones.

The consequence is that some of the higher-numbered ones become Vertical Lows (VLs) and occasionally a VH turns into a Vertical Medium (VM). What I’ve also observed is the remaining VHs get polar weights above a full engine’s weight – which they obviously can’t entirely use.

And we know all this from SMF 70 Subtype 1 records, summarised in each RMF interval at the logical processor level.

Logical Core Home Addresses

But what are the implications of Defined Capacity capping?

Obviously the LPAR’s access to GCP CPU is restricted – which is the intent. And, almost as obviously, some workloads are likely to be hit. You probably don’t need a lecture from me on the especial importance of having WLM set up right so the important work is protected under such circumstances. Actually, this post isn’t about that.

There are other consequences of being capped in this way. And this is really what this post is about.

When a logical processor changes polarisation PR/SM often reworks what are deemed “Home Addresses” for the logical processors:

  • For VH logical processors the logical processor is always dispatched on the same physical processor – which is its home address.
  • A VM logical processor isn’t entitled to a whole physical processor’s worth of weight. It has, potentially, to share with other logical processors. But it still has a home address. It’s just that there’s a looser correspondence between home address and where the VM is dispatched in the machine.
  • A VL logical processor has an even looser correspondence between its home address and where it is dispatched. (Indeed it has no entitlement to be dispatched at all.)

What I’ve observed – using SMF 99 Subtype 14 records – follows. But first I would encourage you to collect 99-14 as they are inexpensive. Also SMF 113, but we’ll come to that.

When SMF 70-1 says the LPAR is capped (and the weights shift, as previously described) the following happens: Some higher-numbered logical GCPs move home addresses – according to SMF 99-14. But, in my case, these are VL’s. So their home addresses are less meaningful.

In one case, and I don’t have an explanation for this, hitting the cap caused the whole LPAR to move drawers. And it moved back again when the cap was removed.

If the concept of a home address is less meaningful for a VL, why do we care that it’s moved? Actually, we don’t. We care about something else…

… From SMF 113 it’s observed that Cycles Per Instruction (CPI) deteriorates. Usually one measures this across all logical processors, or all logical processors in a pool. In the cases I’m describing these measures deteriorate. But there is some fine structure to this. In fact it’s not that fine…

… The logical processors that turned from VH to VL experience CPI values that move from the reasonable 3 or so to several hundred. This suggests to me these VL logical processors are being dispatched remote from where the data is. You could read that as “remote from where the rest of the LPAR is”. There might also be a second effect of being dispatched to cores with effectively empty local caches (Levels 1 and 2). Note: Cache contents won’t move with the logical processor as it gets redispatched somewhere else.
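For context, CPI is simply cycles divided by instructions for a logical processor over an interval. A minimal sketch (the inputs stand in for the SMF 113 basic counter set’s cycle and instruction counts; the numbers are invented):

    # Cycles Per Instruction for one logical processor over one interval.
    def cpi(cycles, instructions):
        return None if instructions == 0 else cycles / instructions

    # A healthy logical processor: around 3 cycles per instruction.
    print(cpi(cycles=3_000_000_000, instructions=1_000_000_000))  # 3.0

    # The VL pathology described above: CPI in the hundreds.
    print(cpi(cycles=3_000_000_000, instructions=10_000_000))     # 300.0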

So the CPI deterioration factor is real and can be significant when the LPAR is capped.

Conclusion

There are two main conclusions:

  • Defined Capacity actual capping can have consequences – in terms of Cycles Per Instruction (CPI).
  • There is value in using SMF 70-1, 99-14, and 113 to understand what happens when an LPAR is Defined Capacity capped. And especially analysing the data at the individual logical processor level.

By the way, I haven’t mentioned Group Capping. I would expect it to be similar – as the mechanism is.

Mainframe Performance Topics Podcast Episode 30 “Choices Choices”

It’s been a mighty long time since Marna and I got a podcast episode out – and actually we started planning this episode long ago. It’s finding good times to record that does it to us, as planning can be a bit more asynchronous.

Hopefully this delay has enabled some of you to catch up with the series. Generally the topics stand the test of time, not being awfully time critical.

And I was very pleased to be able to record a topic with Scott Ballentine. It’s about a presentation we wrote – which we synopsise. I think this synopsis could prove inspiring to some people – whether they be customers or product developers.

With luck this might well be the last episode we record where I have to worry about fan noise. Being able to dispense with noise reduction might help my voice along a little, too. 🙂

The episode can be found here. The whole series can, of course, be found here or from wherever you get your podcasts.

Episode 30 “Choices Choices” Show notes

This episode’s Topics topic is about choosing the right programming language for the job. We have a special guest joining us for the performance topic, Scott Ballentine.

Since our last episode, we were virtually at GSE UK, and IBM TechU. Martin also visited some Swedish customers.

What’s New

  • New news for the CustomPac removal date, which has been extended past January 2022. The reason was to accommodate the desired Data Set Merge capability in z/OSMF which customers needed. z/OSMF Software Management will deliver the support in PH42048. Once that support is there, ServerPac can exploit it. The new withdrawal date is planned to be announced in 2Q2022.

  • Check out the LinkedIn article on the IBM server change for FTPS users of software electronic delivery on April 30, 2021: from using TLS 1.0 and 1.1 to using TLS 1.2, with a dependency on AT-TLS.

    • If you are using HTTPS, you are not affected – and HTTPS is recommended.

Mainframe – Only in V2.5

  • Let’s note: z/OS 2.3 EOS September 2022, z/OS 2.4 not orderable since end of January 2022

  • This topic was looking at some new functions that are only in z/OS V2.5. We wouldn’t necessarily expect anything beyond this point to be rolled back into V2.4.

    • Data Set File System, planned to be available in 1Q 2022.

      • Allows access from z/OS UNIX to MVS sequential or partitioned data sets that have certain formats.

      • Must be cataloged. Data set names are case insensitive.

      • Popular use cases would be looking at the system log after it has been saved in SDSF, editing with vi, and data set transfers with sftp.

      • Also will be useful with Ansible and DevOps tooling.

      • Serialization and security are just as if the data set were being accessed via ISPF.

      • There are mapping rules that you’ll need to understand. The path will begin with /dsfs.

    • Dynamic Change Master Catalog, yes, without an IPL

      • Must have a valid new master catalog to switch to

      • What’s more, you can now put a comment on the command

      • Helpful if you wanted to remove IMBED or REPLICATE and haven’t been able to because it would have meant an outage.

    • RACF data base encryption has a statement of direction.

    • For scalability:

      • Increase of the z/OS memory limit above 4TB, to 16TB, with only 2GB frames above 4TB real. Good examples of exploiters are Java and zCX.

      • More concurrently “open” VSAM linear data sets. Db2 exploits this with APAR PH09189, and APAR PH33238 is suggested.

        • Each data set is represented by several internal z/OS data areas which reside in below the bar storage.

        • This support moves both VSAM and allocation data areas above the bar to reduce the storage usage in the below the bar storage area.

        • The support is optional. Control is via ALLOCxx’s SYSTEM SWBSTORAGE setting: SWA causes SWBs to be placed in 31-bit storage, as they have been in prior releases, while ATB causes SWBs to be eligible to be placed in 64-bit storage.

        • Can be changed dynamically and which option you are using can be displayed.

    • Notable user requirements included:

      • ISPF – updates in support of PDSE V2 member generations, and the SUBMIT command adds an optional parameter, SUBSYS.

        • Useful for directing jobs to the JES2 emergency subsystem
      • Access Method Services – IDCAMS – DELETE MASK has two new options TEST and EXCLUDE

        • TEST will return all the objects that would have been deleted if TEST wasn’t specified

        • EXCLUDE will allow a subset of objects that match the MASK to be excluded from those being deleted

        • Also, REPRO is enhanced to move its I/O buffers above the line to reduce the instances of out-of-storage (878) ABENDs

    • z/OS Encryption Readiness Technology (zERT)

      • z/OS v2.5 adds support for detecting and responding to weak or questionable connections.

      • Policy based enforcement during TCP/IP connection establishment

        • Extending the Communications Server Policy Agent with new rules and actions

        • Detect weak application encryption and take action

        • Notification through messages and take action with your automation

        • Auditing via SMF records

        • Immediate Termination of connections is available through policy

  • There’s a lot of other stuff rolled back to V2.4

Performance – What’s the Use? – with special guest Scott Ballentine

  • This discussion is a summary from a joint presentation on Usage Data and IFAUSAGE

  • Useful for developers and for customers

  • The topic is motivational because customers can get a lot of value out of this usage data, and understand the provenance of IFAUSAGE data.

  • IFAUSAGE is a macro that vendors – or anybody – can use to:

    • Show which products are used and how, including some numbers

    • Show names: Product Vendor, Name, ID, Version, Qualifier

    • Show numbers: Product TCB, Product SRB, FUNCTIONDATA

    • And let’s see how they turn into things you can use

  • The data is ostensibly for SCRT

    • Which is fed by SMF 70 and SMF 89

    • You might want to glean other value from IFAUSAGE

  • Scott talked about encoding via IFAUSAGE, which appears in SMF 30 and 89-1

    • SMF 89-1: Software Levels query, Db2 / MQ subsystems query

    • SMF 30: Topology (e.g. CICS connecting to Db2 or MQ), Some numbers (Connections’ CPU)

    • Both SMF 30 and 89

      • FUNCTIONDATA: You could count transactions, but we’re unsure of any IBM products using it. Db2 NO89 vs MQ always on.

      • Slice the data a little differently with 30 vs 89
  • Some of these examples might inspire developers to think about how they code IFAUSAGE

    • Are your software levels coded right?
    • Do you use Product Qualifier creatively?
    • Do you fill in any of the numbers?
  • We have given the presentation four times

    • Technical University, October 2021

    • GSE UK Virtual Conference, November 2021

    • Nordics mainframe technical day

    • And our own internal teams which was meant to be a dry run, but actually was after the other three

  • It’s a living presentation which we could give at other venues, including to development teams.

    • Living also means it continues to evolve.
  • The hope is that developers will delight customers by using IFAUSAGE right, and that customers will take advantage in the way shown with the reporting examples.

Topics – Choices, Choices

  • This topic is about how to choose a language for which purpose. Different languages were discussed for different needs.

  • Use Case: Serving HTML?

    • PHP / Apache on Mac localhost. Problem is to serve dynamically constructed HTML, which is used for Martin’s analysis.

    • PHP processes XML and can do file transfer and handle URL query strings. PHP 8 caused some rework. Fixes to sd2html for this.

    • Javascript / Node.js on Raspberry Pi. Good because there’s plenty of ecosystem. Node also seems to be a moving target.

  • Host consideration: Running on e.g. laptop?

    • Python: Good library support, for example CSV, Beautiful Soup, and XML. However, Python 3 is incompatible with Python 2, and Python 3.8 has the nice “walrus operator”. Tabs and spaces can be irritating.

    • Automation tools: Keyboard Maestro, Shortcuts. On iPhone / iPad and now on Mac OS as well.

    • Javascript in Drafts and OmniFocus. Cross-platform programming models.

    • AppleScript

  • Host consideration: Running on z/OS?

    • Assembler / DFSORT for high-volume data processing. Mapping macros shipped with many products.

    • REXX for everything else.

      • Martin uses it for orchestrating GDDM and SLR, to name two. As of z/OS 2.1 it can process SMF nicely.

      • Health checks. Marna finds REXX easy to use for Health Checks, with lots of good samples.

      • z/OSMF Workflows. Easy to run REXX from a Workflow.

  • Overall lesson: Choose the language that respects the problem at hand.

    • Orchestrates what you need to orchestrate, runs in the environment you need it to run in, has lots of samples, has sufficient linguistic expression, is sufficiently stable, and performs well enough.

    • In summary, just because you have a hammer not everything is a nail. Not every oyster contains a PERL.

Out and about

  • Both Martin and Marna will be at SHARE, 27 – 30 March, in Dallas.

    • Marna has 6 sessions, and highlights the BYOD z/OSMF labs.

    • Martin isn’t speaking, but will be busy.

On the blog

So it goes.

Third Time’s The Charm For SRB – Or Is it?

Passing reference to Blondes Have More Fun – Or Do They?

Yeah, I know, it’s a tortuous link. 🙂 And, nah, I never did own that album. 🙂

I first wrote about System Recovery Boost (SRB) and Recovery Process Boost (RPB) in SRB And SMF. Let me quote one passage from it:

It should also be noted that when a boost period starts the current RMF interval stops and a new one is started. Likewise when it ends that interval stops and a new one is started. So you will get “short interval” SMF records around the boost period.

I thought in this post I’d illustrate that. So I ran a query to show what happens around an IPL boost. I think it’s informative.

RMF Interval Start   Interval Minutes   Interval Seconds   IPL Time (UTC)   IPL Time (Local)   zIIP Boost   Note
May 9 07:58:51       5:02               302                May 2 07:13:08   May 2 08:13:08     No           1, 2, 3
(Down time: May 9 07:58:51 – 08:15:01)
May 9 08:19:50       9:03               543                May 9 07:15:01   May 9 08:15:01     Yes          4
May 9 08:28:54       15:00              900                May 9 07:15:01   May 9 08:15:01     Yes          5
May 9 08:43:55       15:00              900                May 9 07:15:01   May 9 08:15:01     Yes          5
May 9 08:58:55       15:00              900                May 9 07:15:01   May 9 08:15:01     Yes          5
May 9 09:13:55       1:31               91                 May 9 07:15:01   May 9 08:15:01     Yes          6
May 9 09:15:25       13:29              809                May 9 07:15:01   May 9 08:15:01     No           7
May 9 09:28:55       15:00              900                May 9 07:15:01   May 9 08:15:01     No           8

Some notes:

  1. UTC Offset is + 1 Hour i.e. British Summer Time.
  2. IPL time was a week before.
  3. For some reason there was no shutdown boost.
  4. This is a short interval and the first with the boost. And note RMF starts a few seconds after IPL.
  5. These are full-length intervals (900 seconds or 15 minutes) with the boost.
  6. This is a short final interval with the boost.
  7. This is a short interval without the boost – which I take to be a form of re-sync’ing.
  8. This is a return to full-length intervals.

So you can see the down time between RMF cutting its final record and the IPL. Also between the IPL and RMF starting. You can also see the short intervals around starting and stopping the boost period.

Here’s an experimental way of showing the short intervals and the regular (15 minute) intervals.

The blue intervals are within the boost period, the orange outside it.

I don’t know if the above is helpful, but I thought it worth a try.

I don’t know that this query forms the basis for my Production code, but it just might. And I remain convinced that zIIP boosts (and, to a lesser extent, speed boosts) are going to be a fact of life we are going to have to get used to.

Finally, I’ll also admit I’m still learning about how RMF intervals work – so this has been a useful exercise for me.

Of course, when I say “finally”, I only mean “finally for this post”. I’ve a sneaking suspicion I’ve more to learn. Ain’t that always the way? 🙂

Stickiness

Question: What’s brown and sticky?

Answer: A stick. 🙂

It’s not that kind of stickiness I’m talking about.

I’ve experimented with lots of technologies over the years – hardware, software, and services. Some of them have stuck and many of them haven’t.

I think it’s worth exploring what makes some technologies stick(y) and some not – based on personal experience, largely centered around personal automation.

So let’s look at some key elements, with examples where possible.

Value

The technology has to provide sufficient value at a sufficiently low cost. “Value” here doesn’t necessarily mean money; It has to make a big enough contribution to my life.

To be honest, value could include hobbying as opposed to utility. For example, Raspberry Pi gives me endless hours of fun.

But value, generally for me, is productivity, reliability, enhancement, and automation in general:

  • Productivity: Get more done.
  • Reliability: Do it with fewer errors than I would.
  • Enhancement: Do things I couldn’t do.
  • Automation: Take me out of the loop of doing the thing.

Completeness

If a technology is obviously missing key things I’ll be less likely to adopt it.

But there is value – to go with the irritation – in adopting something early. You have to look at the prospects for building out or refinement.

An example of this is Siri Shortcuts (née Workflow). It started out with much less function than it has now. But the rate of enhancement in the early days was breathtaking; I just knew they’d get there.

And the value in early adoption includes having a chance to understand the later, more complex, version. I learn incrementally. A good example of this might be the real and virtual storage aspects of z/OS.

Also, the sooner I adopt the earlier I get up the learning curve and get value.

I’m beta’ing a few of my favourite apps and I’d be a hopeless beta tester for new function if I hadn’t got extensive experience of the app already.

Usability And Immediacy

A first attempt at push-button automation was using an external numeric keypad to automate editing podcast audio with Audacity.

The trouble with this is that you have to remember which button on the keypad does what. I fashioned a keyboard template but it wasn’t very good. (How do you handle the keys in the middle of the block?)

When I heard about StreamDeck I was attracted to the fact each key had an image and text on it. That gives immediate information about what the key does. I didn’t rework my Audacity automation to use it – as I coincidentally moved to Ferrite on iPad for my audio editing needs. But I built lots of new stuff using it.

So StreamDeck has usability a numeric keypad doesn’t. It’s also better than hot key combinations – which I do also rely on.

Reliability

What percent of the time does something have to fail for you to consider it unreliable? 1%? 10%?

I guess it depends on the irritation or damage factor:

  • If your car fails to start 1% of the time that’s really bad.
  • If “Ahoy telephone, change my watch face” fails 10% of the time that’s irritating but not much more.

The latter case is true of certain kinds of automation. But others are rock solid.

And, to my mind, Shortcuts is not reliable enough yet – particularly if the user base includes devices that aren’t right up to date. Time will tell.

Setup Complexity

I don’t know whether I like more setup complexity or less. 🙂 Most people, though, would prefer less. But I like tailorability and extensibility.

A good balance, though, is being easy to get going with but having a high degree of extensibility or tailorability.

Conclusion

I’m probably more likely to try new technologies than most – in some domains. But in others I’m probably less likely to. Specifically, those domains I’m less interested in anyway.

The above headings summarise the essentials of stickiness – so I won’t repeat them here.

I will say the really sticky things for me are:

  • Drafts – where much of my text really does start (including this blog post).
  • OmniFocus – my task manager, without which a lot of stuff wouldn’t get done.
  • StreamDeck for kicking stuff off.
  • Keyboard Maestro for Mac automation.
  • Apple Watch
    • for health, audio playback, text input (yes really), and automation (a little).
  • Overcast – as my podcast player of choice.
  • iThoughts – for drawing tree diagrams (and, I suppose, mind mapping) 🙂

You might notice I haven’t put Shortcuts on the list. It almost makes it – but I find its usability questionable – and now there are so many alternatives.

There is an element of “triumph of hope over experience” about all this – but there is quite a lot of stickiness: Many things – as the above list shows – actually stick.

It’s perhaps cruel to note two services that have come unstuck – and I can say why in a way that is relevant to this post:

  • Remember The Milk was my first task manager but it didn’t really evolve much – and it needed to, to retain my loyalty.
  • Evernote was my first note taking app. They got a bit distracted – though some of their experiments were worthwhile. And again evolution wasn’t their forte.

I suppose these two illustrate another point: Nothing lasts forever; It’s possible my Early 2023 stickiness list will differ from my Early 2022 one.

One final thought: The attitude of a developer / supplier is tremendously important. It’s no surprise several of the sticky things have acquired stickiness with a very innovative and responsive attitude. I just hope I can display some of that in what I do.

Really Starting Something

This post is about gleaning start and stop information from SMF – which, to some extent, is not a conventional purpose.

But why do we care about when IPLs happen? Likewise middleware starts and stops? Or any other starts and stops?

I think, if you’ll pardon the pun, we should stop and think about this.

Reasons Why We Care

There are a number of reasons why we might care. Ones that come immediately to mind are:

  • Early Life behaviours
  • System Recovery Boost and Recovery Process Boost
  • PR/SM changes such as HiperDispatch Effects
  • Architectural context

There will, of course, be reasons I haven’t thought of. But these are enough for now.

So let’s examine each of these a little.

Early Life Behaviours

Take the example of a Db2 subsystem starting up.

At the very least its buffer pools are unpopulated and there are no threads to reuse. Over time the buffer pools will populate and settle down. Likewise the thread population will mature. When I’ve plotted storage usage by a “Db2 Engine” service class I’ve observed it growing, with the growth tapering off and the overall usage settling down. This usually takes days, and sometimes weeks.

(Parenthetically, how do you tell the difference between a memory leak and an address space’s maturation? It helps to know if the address space should be mature.)

Suppose we didn’t know we were in the “settling down” phase of a Db2 subsystem’s life. Examining the performance data, such as the buffer pool effectiveness, we might draw the wrong conclusions.

Conversely, take the example of a z/OS system that has been up for months. There is a thing called “a therapeutic IPL”. Though z/OS is very good at staying up and performing well for a very long time, an occasional IPL might be helpful.

I’d like to know if an IPL was “fresh” or if the z/OS LPAR had been up for months. This is probably less critical than the “early life of a Db2” case, though.

System Recovery Boost and Recovery Process Boost

With System Recovery Boost and Recovery Process Boost resource availability and consumption can change dramatically – at least for a short period of time.

In SRB And SMF I talked about early experience and sources of data for SRB. As I said I probably would, I’ve learnt a little more since then.

One thing I’ve observed is that if another z/OS system in the sysplex IPLs it can cause the other systems in the sysplex to experience a boost. I’ve seen time correlation of this effect. I can “hand wave” it as something like a recovery process when a z/OS system leaves a sysplex. Or perhaps as a Db2 Datasharing member disconnects from its structures.

Quite apart from catering for boosts, detecting and explaining them seems to me to be important. If you can detect systems IPLing that helps with the explanation.

PR/SM Changes

Suppose an LPAR is deactivated. It might only be a test LPAR. In fact that’s one of the most likely cases. It can affect the way PR/SM behaves with HiperDispatch. Actually that was true before HiperDispatch. But let me take an example:

  • The pool has 10 CPs.
  • LPAR A has weight 100 – 1 CP’s worth.
  • LPAR B has weight 200 – 2 CP’s worth.
  • LPAR C has weight 700 – 7 CP’s worth.

All 3 LPARs are activated and each CP’s worth of weight is 100 (1000 ÷ 10).

Now suppose LPAR B is deactivated. The total pool’s weight is now 800. Each CP’s weight is now 80 (800 ÷ 10). So LPAR A’s weight is 1.25 CP’s worth and LPAR C’s is 8.75 CP’s worth.
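Here’s a minimal sketch of that recalculation, using the weights from the example:

    # Entitlement (in physical CPs' worth) for each activated LPAR in a pool.
    def entitlements(weights, physical_cps):
        per_cp_weight = sum(weights.values()) / physical_cps
        return {lpar: w / per_cp_weight for lpar, w in weights.items()}

    # All three LPARs activated: each CP's worth of weight is 100.
    print(entitlements({"A": 100, "B": 200, "C": 700}, physical_cps=10))
    # {'A': 1.0, 'B': 2.0, 'C': 7.0}

    # LPAR B deactivated: each CP's worth of weight drops to 80.
    print(entitlements({"A": 100, "C": 700}, physical_cps=10))
    # {'A': 1.25, 'C': 8.75}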

Clearly HiperDispatch will assign Vertical High (VH), Vertical Medium (VM), and Vertical Low (VL) logical processors differently. In fact probably to the benefit of LPARs A and C – as maybe some VL’s become VM’s and maybe some VM’s become VH’s.

The point is PR/SM behaviour will change. So activation and deactivation of LPARs is worth detecting – if you want to understand CPU and PR/SM behaviour.

(Memory, on the other hand, doesn’t behave this way: Deactivate an LPAR and the memory isn’t reassigned to the remaining LPARs.)

Architectural Context

For a long time now – if a customer sends us SMF 30 records – we can see when CICS or IMS regions start and stop.

Architecturally (or maybe operationally) it matters whether a CICS region stops nightly, weekly, or only at IPL time. Most customers have a preference (many a strong preference) for not bringing CICS regions down each night. However, quite a few still have to. For some it’s allowing the Batch to run, for a few it’s so the CICS regions can pick up new versions of files.

Less importantly, but equally architecturally interesting, is the idea that middleware components that start and stop together are probably related – whether they are clones, part of the same technical mesh, or business-wise similar.

How To Detect Starts And Stops

In the above examples, some cases are LPAR (or z/OS system) level. Others are at the address space or subsystem level.

So let’s see how we can detect these starts and stops at each level.

System-Level

At the system level the best source of information is RMF SMF Type 70 Subtype 1.

For some time now 70-1 has given the IPL date and time for the record-cutting system (field SMF70_IPL_TIME, which is in UTC time). As I pointed out in SRB And SMF, you can see if this IPL (and the preceding shutdown) was boosted by SRB.

LPAR Activation and Deactivation can also, usually, be detected in 70-1. 70-1’s Logical Processor Data Section tells you, among other things, how many logical processors this LPAR has. If it transits from zero to more than zero it’s been activated. Similarly, if it transits from more than zero to zero it’s been deactivated. The word “usually” relates to the fact that the LPAR could be deactivated and then re-activated in an RMF interval. If that happens my code, at least, won’t notice the bouncing. This isn’t, of course the same as an IPL – where the LPAR would remain activated throughout.
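Here’s a minimal sketch of that zero / non-zero test. The (interval, count) pairs are invented; in practice the counts would come from the 70-1 Logical Processor Data Sections:

    # Detect LPAR activation / deactivation from per-interval logical
    # processor counts. Zero -> non-zero is an activation; non-zero -> zero
    # is a deactivation. A bounce within one interval won't be seen.
    def activation_events(counts_by_interval):
        events = []
        previous = None
        for interval, count in counts_by_interval:
            if previous == 0 and count > 0:
                events.append((interval, "activated"))
            elif previous is not None and previous > 0 and count == 0:
                events.append((interval, "deactivated"))
            previous = count
        return events

    sample = [("09:00", 4), ("09:15", 4), ("09:30", 0), ("09:45", 0), ("10:00", 4)]
    print(activation_events(sample))
    # [('09:30', 'deactivated'), ('10:00', 'activated')]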

The above reinforces my view you really want RMF SMF from all the z/OS systems in your estate, even the tiny ones. As that way you’ll see the SMF70_IPL_TIME values for them all.

Subsystem-Level

When I say “Subsystem Level” I’m really talking about address spaces. For that I would look to SMF 30.

But before I deal with subsystems I should note an alternative way of detecting IPLs: Reader Start Time in SMF 30 for the Master Scheduler Address Space is within seconds of an IPL. Close enough, I think. This is actually the method I used in code written before the 70-1 field became available.

For an address space, generally you can use its Reader Start Time for it coming up. (Being ready for work, though, could be a little later. This is also true, of course, for IPLs. And SMF won’t tell you when that is. Likewise for shutting down.) You could also use the Step- and Job-End timestamps in SMF 30 Subtypes 4 and 5 for when the address space comes down. In practice I use Interval records and ask of the data “is the address space still up?” until I get the final interval record for the address space instance.

When it comes to reporting on address space up and down times I group them by ones with the same start and stop times. That way I see the correlated or cloned address spaces. This is true for both similar address spaces (eg CICS regions) and dissimilar (such as adding Db2 subsystems into the mix).
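Here’s a minimal sketch of that grouping. The address space names and times are invented; in practice the start would be Reader Start Time and the stop the last interval seen:

    from collections import defaultdict

    # Group address spaces that share the same (start, stop) pair - a simple
    # way of spotting correlated or cloned address spaces.
    def group_by_lifetime(address_spaces):
        groups = defaultdict(list)
        for name, start, stop in address_spaces:
            groups[(start, stop)].append(name)
        return groups

    sample = [
        ("CICSA01", "06:00", "23:00"),
        ("CICSA02", "06:00", "23:00"),
        ("DB2ADBM1", "06:00", "23:00"),
        ("TESTJOB1", "09:30", "09:45"),
    ]
    for (start, stop), names in group_by_lifetime(sample).items():
        print(start, stop, names)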

Conclusion

As I hope I’ve shown you, there are lots of architectural and performance reasons why beginnings and endings are important to detect. I would say it’s not just about observation; It could be a basis for improvement.

As I also hope I’ve demonstrated, SMF documents such starts and stops very nicely – if you interrogate the data right. And a lot of my coding effort recently has been in spotting such changes and reporting them. If I work with you(r data) expect me to be discussing this. For all the above reasons.

SRB And SMF

I’ve just had my first brush with SMF from z15’s System Recovery Boost (SRB).

(Don’t blame me for the reuse of “SRB”.) 🙂

The point of this post is to share what I’ve discerned when processing this SMF data.

System Recovery Boost

To keep the explanation of what it is short, I’ll say there are two components of this:

  • Speed Boost – which enables general-purpose processors on sub-capacity machine models to run at full-capacity speed on LPARs being boosted
  • zIIP Boost – which enables general-purpose workloads to run on zIIP processors that are available to the boosted LPARs

And there are several different triggers for the boost period where these are available. They include:

  • Shutdown
  • IPL
  • Recovery Processes:
    • Sysplex partitioning
    • CF structure recovery
    • CF datasharing member recovery
    • HyperSwap

The above are termed Boost Classes.

If you want more details a good place to start is IBM System Recovery Boost Frequently Asked Questions

I’ve bolded the terms this post is going to use.

The above-mentioned boosts stand to speed up the boosted events – so they are good for both performance and availability.

RMF SMF Instrumentation For SRB

If you wanted to detect when an SRB boost had happened and the nature of it you would turn to SMF 70 Subtype 1 (CPU Activity).

There are two fields of interest here:

  • In the Product Section field SMF70FLA has extra bits giving you information about this system’s boosts.
  • In the Partition Data Section field SMF70_BoostInfo has bits giving you a little information about other LPARs on the machine’s boosts.

It should also be noted that when a boost period starts the current RMF interval stops and a new one is started. Likewise when it ends that interval stops and a new one is started. So you will get “short interval” SMF records around the boost period. (In this set of data there was another short interval before RMF resynchronised to 15 minute intervals.) Short intervals shouldn’t affect calculations – so long as you are taking into account the measured interval length.
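As a trivial example of what “taking into account the measured interval length” means, here’s a hedged sketch with invented numbers:

    # Utilisation for an interval, using the measured interval length rather
    # than assuming a full-length (say 900-second) interval.
    def busy_percent(cpu_seconds, interval_seconds, engines):
        return 100.0 * cpu_seconds / (interval_seconds * engines)

    # A full 15-minute interval and a 2-minute boost-related interval,
    # both 50% busy across 4 engines.
    print(busy_percent(cpu_seconds=1800, interval_seconds=900, engines=4))  # 50.0
    print(busy_percent(cpu_seconds=240,  interval_seconds=120, engines=4))  # 50.0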

After a false start – in which I decoded the right byte in the wrong section 🙂 – I gleaned the following information from SMF70FLA:

  • Both Speed Boost and zIIP Boost occurred.
  • The Boost Class was “Recovery Process” (Class 011 binary).
    • There is no further information in SMF 70-1 record as to which recovery process happened.

From SMF70_BoostInfo I get the following information:

  • Both Speed Boost and zIIP Boost occurred – for this LPAR.
  • No other LPAR on the machine received a boost. Not just in this record but in any of the situation’s SMF data.

The boost period was about 2 minutes long – judging by the interval length.

Further Investigations

I felt further investigation was necessary as the type of recovery process wasn’t yielded by SMF 70-1.

I checked SMF 30 Interval records for the timeframe. I drew a blank here because:

  • No step started in the appropriate timeframe.
  • None of the steps running in this timeframe looked like a cause for a boost.

I’m not surprised as the types of boost represented by Recovery Process really should show up strongly in SMF 30.

One other piece of evidence came to light: Another LPAR in the Sysplex (but not on the same machine) was deactivated. It seems reasonable to me that one or other of the Recovery Boost activities would take place in that event.

Conclusion

While I do think z15 SRB is a very nice set of performance enhancements, I do think you’re going to need to cater for it in your SMF processing for a number of reasons:

  • It’s going to affect intervals and their durations.
  • It’s going to cause things like speed changes on subcapacity GCPs and also zIIP behaviours.
  • A boosted LPAR might compete strongly with other (possibly Production) LPARs.
  • It’s going to happen, in all probability, every time you IPL an LPAR.

That last says it’s not “exotica”. SRB is the default behaviour – at least for IPL and Shutdown boost classes.

As I’ve indicated, SMF 70-1 tells most of the story but it’s not all of it.

There is one piece of advice I can give on that: Particularly for Recovery Process Boost, check the system log for messages. There are some you’re going to have to automate for anyway.

One final point: A while back I enhanced my “zIIP Performance And Capacity” presentation to cover SRB. I’ll probably refine the SRB piece as I gain more experience. Actually, even this blog post could turn into a slide or two.

Clippy? Not THAT Clippy!

A recent episode of the Mac Power Users Podcast was a round up of clipboard managers – on iOS and Mac OS. You can find it here.

There was a subsequent newsgroup discussion here which I’ve been participating in.

There are lots of things I want from a clipboard manager. They include:

  1. It to keep a history.
  2. It to enable the text (and, I suppose, images) to be readily processed.
  3. It to sync between devices.

Needs 1 and 3 are readily met by a lot of clipboard managers – as the podcast episode illustrated. Item 2, though, has always intrigued me.

This post deals with that topic – with an experiment using two clipboard managers: Paste (with Shortcuts) and Keyboard Maestro.

I write in Markdown – a lot. It’s a nice plain text format with lots of flexibility and a simple syntax. So it’s natural my two examples are Markdown-centric.

They’re not identical cases – with the Paste / Shortcuts example being a simpler example.

In both cases, though, the key feature is the use of the clipboard history to “fill in the blanks”. How they approach this is interesting – and both take a fairly similar approach.

Paste and Shortcuts – A Simple Case

Paste is an independently-developed app that runs on both Mac OS and iPad OS / iOS, with the copied items being sync’ed between all your devices.

It maintains a clipboard history – which this experiment will use.

Shortcuts started out on iOS as Workflow. Apple bought the company and made it a key component of automation on iOS, iPad OS, and (with Monterey) Mac OS. In principle it’s simple to use.

So here’s a simple example. It takes the last two clipboard entries and makes a Markdown link out of them. The link consists of two elements:

  • In square brackets the text that appears.
  • In round brackets the URL for the link.
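For example (an invented link, not one from the post), [Example site](https://example.com) is a complete Markdown link: the text “Example site” in square brackets, followed by the URL in round brackets.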

The following shortcut has 4 steps / actions:

  1. Retrieve the last clipboard item (Position 1).
  2. Retrieve the previous clipboard item (Position 2).
  3. Create text from these two – using a template that incorporates the square and round brackets.
  4. Copy this text to the clipboard.

You compose the shortcut by dragging and dropping the actions.

Here is the shortcut.

There’s a problem in the above in seeing which clipboard item is in which position in the template. On Mac OS clicking on the ambiguous item leads to a dialog:

Click on Reveal and the source of the variable is revealed – with a blue box round it:

Obviously, copying to the clipboard in the right order is important and the above shows how Shortcuts isn’t that helpful here. I suppose one could detect a URL and swap the two clipboard items round as necessary. But that’s perhaps a refinement too far.

I actually developed this shortcut on Mac OS – but I might have been better off doing it on iPad OS. I don’t find the Mac OS Shortcuts app a nice place to develop shortcuts. (Sometimes – but not this time – it’s advisable to develop on the platform the actions are specific to.)

Keyboard Maestro – A Slightly More Complex Case

Keyboard Maestro isn’t really a clipboard manager – but it has lots of clipboard manipulation features. It only runs on Mac OS and is a very powerful automation tool, which makes it ideal for the purposes of this discussion.

In a similar way to Shortcuts, you compose what is called a macro out of multiple actions – using drag and drop.

Here’s the macro:

The first action fills out the template with the last two clipboard items – copying the result back to the clipboard. It’s more direct than the Shortcuts example. Plus, it’s clearer which clipboard item is plugged in where. (The tokens %PastClipboard%0% and %PastClipboard%1% are the latest clipboard item and the one before it – and I like the clarity of the naming.)

The second action activates the Drafts application – which the Shortcuts example didn’t do.

Then it pastes the templated text into the current draft. Again the Shortcuts example didn’t do that.

This, for me, is a more useful version of the “compose a Markdown link” automation. The only downside for me is that it doesn’t run on iPad OS or iOS. But then I do most of my writing on a Mac anyway.

Conclusion

It’s possible, with these two apps at least, to get way beyond just accessing the last thing you copied to the clipboard. (The built-in clipboard capabilities of the operating system won’t get you previous clipboard items.)

Both Shortcuts and Keyboard Maestro are automation tools – so experimenting with them yields a useful conclusion: There can be more value in using the clipboard when you combine it with automation.

It should be noted that you needn’t use the clipboard at all if you automate the acquisition of the data. This is true of both Shortcuts and Keyboard Maestro. Both are perfectly capable of populating variables and using them.

However, when it comes to selecting text and using it in a template, user interaction can be handy. And sometimes that needs a clipboard – as the user gathers data from various sources.

This pair of experiments illustrates that approach.

One final note: I haven’t shared editable shortcuts or Keyboard Maestro macros as

  1. These are so very simple you could more readily create them yourself.
  2. You’re going to want to edit them beyond recognition – unless they exactly fit your need.

The point is to encourage you to experiment.