Another Slice Of PI

This post follows on from my 2019 post A Slice Of PI. Again, not Raspberry Pi but Performance Index PI. 🙂

I’ve become sensitised to how common a particular Workload Manager (WLM) problem is since I created some handy graphs.

Some Graphs Recent Customers Have Seen

When I examine a customer’s WLM Service Definition, part of what I do is look at the SYSTEM Workload, and then successive importances.

(SYSTEM Workload is mainly SYSTEM and SYSSTC Service Classes, of course. And if these are showing a drop off in velocity you can imagine what is happening to less important work. This is when I consider Delay For CPU samples – to figure out how much stress the system is under. But I digress.)

By “successive importances” I mean:

  1. Graph importance 1 Performance Index (PI) by Service Class by hour.
  2. Graph importance 2 Performance Index (PI) by Service Class by hour.

And so on.

Sometimes there is little work at eg Importance 1, so I might stop after Importance 2 or I might not. (Service definitions vary in what is at each importance level – and I graph CPU for each importance level to understand this.)

The above approach gives me some orientation. And it has shown up a phenomenon that is more common than I had supposed: Percentile goals where the service class period ends up with a PI of 0.5.

The Top And Bottom Buckets

Recall the following from A Slice Of PI: For a Percentile Goal, when an individual transaction ends, its response time relative to the goal is used to increment the transaction count in one of 14 buckets. There are two buckets of especial interest:

  • Bucket 1 – where transactions whose response times are no more than 50% of the goal time are counted.
  • Bucket 14 – where transactions whose response times are at least 400% of the goal time are counted.

These buckets feed into the Performance Index (PI) calculation. Let’s deal with Bucket 14 first:

Imagine a goal of “85% in 15ms”. If fewer than 85% of transactions end with response times short enough to land them in Buckets 1 to 13, the PI is 4. We don’t have much clue as to quite how long the more than 15% in Bucket 14 took, but we know they all took at least 4 × 15 = 60ms. So we don’t know how much to adjust the goal by. (Of course, we might want to tune the transactions or the environment instead – and we don’t know how much to speed things up by.)

A note on terminology: I’m going to use the term goal percent for the “85%” in this example. I’m going to use the term goal time for the “15ms” part.
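For concreteness, here’s a sketch of how a transaction might land in one of the 14 buckets. The boundary values below are my understanding of the RMF (SMF 72-3) response time distribution scheme – treat them as an assumption and check them against the RMF documentation before relying on them:

```python
# Upper boundaries of Buckets 1 to 13, as multiples of the goal time.
# Bucket 14 catches everything beyond 4.0 x goal.
# (Assumed boundary values - verify against the RMF documentation.)
BOUNDARIES = [0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 2.0, 4.0]

def bucket_for(response_time_ms, goal_time_ms):
    """Return the 1-based bucket number for an ended transaction."""
    ratio = response_time_ms / goal_time_ms
    for bucket, upper in enumerate(BOUNDARIES, start=1):
        if ratio <= upper:
            return bucket
    return 14  # at least 400% of the goal time

# With our "85% in 15ms" goal:
print(bucket_for(7.5, 15))   # exactly 50% of goal -> 1
print(bucket_for(60, 15))    # exactly 400% of goal -> 13
print(bucket_for(61, 15))    # beyond 400% of goal -> 14
```

The point of the sketch is simply that each ended transaction contributes one count, and all the detail within a bucket – especially the open-ended top and bottom ones – is lost.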

The Bottom Bucket

Bucket 14 was the “warm up act” for Bucket 1.

With our “85% in 15ms” goal a transaction ending in Bucket 1 has a response time of no more than 0.5 × 15 = 7.5ms. Again, we don’t know how close to the bucket boundary the transaction response times tend to be. Here are two possibilities (and there are, of course, myriad others):

This first graph shows a transaction distribution where the transactions have response times tending towards way shorter than 50% of the goal time.

This second graph shows a transaction distribution where the transactions have response times tending towards just short of the 50%-of-goal-time mark.

Both could have a PI of 0.5.
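Here is a sketch of how the PI might fall out of the bucket counts for a percentile goal. This is my reading of the scheme described above – not the exact WLM algorithm – and the boundary values are the same assumed ones as before:

```python
# Assumed upper boundaries of Buckets 1 to 13, as multiples of goal time.
BOUNDARIES = [0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 2.0, 4.0]

def performance_index(bucket_counts, goal_percent):
    """Sketch: PI for a percentile goal from 14 bucket counts (Bucket 1 first)."""
    total = sum(bucket_counts)
    cumulative = 0
    for count, boundary in zip(bucket_counts, BOUNDARIES):
        cumulative += count
        # PI is the boundary of the bucket where we first achieve the
        # goal percent - so it floors at 0.5 and tops out at 4.0.
        if 100.0 * cumulative / total >= goal_percent:
            return boundary
    return 4.0  # goal percent not achieved even at 400% of goal time

# "85% in 15ms" with every transaction in Bucket 1:
print(performance_index([100] + [0] * 13, 85))  # 0.5
```

Both distributions in the graphs above would feed this calculation identical Bucket 1 counts, which is exactly why a PI of 0.5 tells you so little.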

So what should we adjust the goal time to? I think we should be prepared to be iterative about this:

  • Set the goal time to somewhat less than the current goal time. Maybe as aggressively as 50% of the current goal time – as that is safe.
  • Remeasure and adjust the goal time – whether up or down.
  • Iterate

I think we have to get used to iteration: Sometimes, from the performance data, the specialist doesn’t know the final value but does know the direction of travel. This is one of those cases, as is Velocity.

In most cases we should be done in a very few iterations. So don’t be unimpressed by a specialist having the maturity to recommend iteration.

An alternative might be to tighten the goal percent, say to 95%. This has a couple of issues:

  1. It tells us nothing about the distribution.
  2. It is more or less problematic, depending on the homogeneity of the work. At the extreme, if all transactions have the same measured response time adjusting the percent is a blunt instrument.

Problem 2 is, of course, more serious than Problem 1.

Of Really Short Response Times

Really short response times are interesting: Things being really fast is generally a good thing, but not always.

Consider the case of JDBC work where Autocommit is in effect. Here every single trivial SQL statement leads to a Db2 Commit and hence a transaction ending. This can lead to extremely short response times. More to the point, it’s probably very inefficient.

(You can observe this sort of thing with Db2 DDF Analysis Tool and Db2 Accounting Trace (SMF 101).)

But transactions have become faster. It’s not uncommon for goals such as our example (85% in 15 ms) to be lax.

Faster against the expectations of WLM goal setters. In the DDF case it’s quite likely such people won’t have been told what to expect – either in terms of performance or how to classify the work. Or, fundamentally, what the work is.

I see a lot of DDF Service Classes with 15ms for the goal time. For many years you couldn’t set it lower than that. Relatively recently, though, the lower limit was changed to 1ms. I’m not suggesting a transaction response time of 1ms is that common, but single digit is increasingly so.

So this 1ms limit stands to fix a lot of goal setting problems – both for percentile and average response time goal types.

The analogy that comes to mind is 64-Bit: We don’t – for now – need all 64 bits of addressability. But we certainly needed many more than 31. We might not need 1ms right now but we needed considerably less than 15ms. (I would guess some technological improvement meant this became feasible; I really should ask someone.)

Why Is A PI Of 0.5 A Problem?

You would think that doing better than goal would be a good thing, right?

Well, not necessarily:

  1. Not what was contracted for
  2. Not protective

Between the two of them there is a Service Delivery hazard: When things slow down an overly lax goal won’t stop this work from slowing down. And this is where Service Level Expectation – what we’ve delivered so far – clashes with Service Level Agreement – what the users are entitled to.

I often like, when we have a little time in a customer workshop, to ask where the goal came from. Perhaps from a contract, whether internal or external. Or maybe it’s the originally measured response time. Or maybe it’s just a “Finger in the air”. This all feeds into how appropriate the goal now is, and what to do about it.

Walking This Back A Tiny Bit

It’s not hopeless to figure out what’s going on if everything ends in Bucket 1, with a PI of 0.5. You can – again from Workload Activity Report (SMF 72-3) data – calculate the average response time. If the average is well below the “50% of goal time” mark that tells you something. Equally “not much below” tells you something else.

So, I think you should calculate the average, even though it’s not that useful for goal setting.
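The check above can be sketched very simply. The field names and thresholds here are illustrative, not actual SMF 72-3 field names – the real inputs would be the period’s total response time and ended-transaction count:

```python
def average_vs_half_goal(total_response_ms, ended_count, goal_time_ms):
    """Sketch: compare the average response time with 50% of the goal time."""
    average = total_response_ms / ended_count
    half_goal = 0.5 * goal_time_ms
    # "Well below" here is an arbitrary illustrative threshold.
    if average < 0.5 * half_goal:
        verdict = "well below 50% of goal - goal looks very lax"
    elif average < half_goal:
        verdict = "not much below 50% of goal - goal only somewhat lax"
    else:
        verdict = "average at or above 50% of goal"
    return average, verdict

# "85% in 15ms" goal; 10,000 transactions totalling 20 seconds:
print(average_vs_half_goal(20_000, 10_000, 15))  # 2.0ms, well below
```

Either way, the average is a hint about where in Bucket 1 the transactions tend to sit – which the bucket counts alone can’t tell you.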


You probably know I’m going to say this next sentence before I say it: Revisit your WLM Service Definitions – both goal values and classification rules – on a regular basis.

And, now that I have PI graphs for each importance level, I’m seeing a lot of “PI = 0.5” situations. Hence my concentrating on the bottom bucket.

One final thing: With Db2 DDF Analysis Tool I am able to do my own bucketing of DDF response times. I can use whatever boundaries for buckets I want, and as many as I want. This helps me – when working with customers – to do better than the “14 buckets”. Or it would if I weren’t lazy and only had 10 buckets. 🙂 I can do the same with CPU time, and often do.

Hopefully this post has given you food for thought about your percentile goals.

One Service Definition To Rule Them All?

This is one of those posts where I have to be careful about what I say – to not throw a customer under the bus with over-specificity. So, bear with me on that score.

Actually it’s two customers, not one. But they have the same challenge: Managing Workload Manager (WLM) service definitions for multiple sysplexes.

These two customers have come to different conclusions, for now. I say “for now” because, in at least one case, the whole design is being reconsidered.

Some Basics

Even for those of us who know them well, the basics are worth restating:

  • A sysplex can have one and only one service definition.
  • A service definition cannot be shared between sysplexes, but it can be copied between them. These are separate installations and activations.
  • In a sysplex different systems might not run work in all the service classes.
  • The ISPF WLM application has been used since the beginning to edit service definitions.
  • You can print the service definition from the ISPF WLM application.
  • Relatively recently the z/OSMF WLM application became available – as a more modern (and more strategic) way of editing service definitions.
  • You can export and import the service definition as XML – in both the ISPF and z/OSMF applications.

What I’ve Been Doing Recently With WLM Service Definitions

For many years I relied entirely on RMF Workload Activity SMF records (SMF 72-3) when tuning WLM implementations. This simply wasn’t (good) enough, so a fair number of years ago I started taking WLM service definitions in XML form from customers when working with them.

Why wasn’t RMF SMF good enough? Let’s review what it buys you:

  • It tells you how service class periods behave, relative to their goals and with varying workload levels.
  • It allows you to examine report class data, which often is a good substitute for SMF 30 address space information.

So I’ve done a lot of good work with SMF 72-3, much of which has surfaced as blog posts and in conference presentations. So have many others.

But it doesn’t tell you what work is in each service class, nor how it got there. Likewise with report classes. Admittedly, SMF 30 will tell you an address space’s service and report classes, but it won’t tell you why. Likewise, SMF 101 for DDF work will give you a service class (field QWACWLME) but it won’t tell you why, and it won’t tell you anything about report classes. And SMF won’t tell you about history, always a fascinating topic.

To understand the structure of the service definition you need the XML (or the print, but I don’t like the print version). So that’s what I’ve been looking at, increasingly keenly.

A Tale Of Two Customers

  • Customer A has multiple sysplexes – on two machines. They sent me multiple XML service definitions – as they have one per sysplex.
  • Customer B has multiple sysplexes – on more than two machines. They sent me a single XML service definition – so they are propagating a single service definition to multiple sysplexes.

As it happens, in both customers, there are application similarities between some of their sysplexes. In Customer B there are different application styles between the LPARs in a single sysplex – and this fact will become important later in this discussion.

So, which is right? The rest of this post is dedicated to discussing the pros and cons of each approach.

Multiple Service Definitions Versus A Single Propagated One

The first thing to say is if you have an approach that works now think carefully before changing it. It’ll be a lot of work. “A lot of work” is code for “opportunity for error” rather than “I’m too lazy”.

If you have one service definition for each sysplex you might have to make the same change multiple times. But this would have to be an “egregious” change – where it’s applicable to all the sysplexes. An example of this would be where you’d misclassified IRLM address spaces and now needed to move them all into SYSSTC (where they belong). There’s nothing controversial or specific about this one, and not much “it depends” to be had.

For changes that are only relevant or appropriate to one sysplex this is fine. For example, changing the duration of DDFHI Period 1 for a Production sysplex.

But if you had a DDFHI in each of several sysplexes this could become confusing – if each ended up with a different specification. Confusing but manageable.

If you had one service definition for a group of sysplexes you’d make a change in one sysplex and somehow propagate that change around. You’d use literally the same XML file (assuming you go the XML route), export it from the first sysplex, and import it into the rest. The mechanism, though, is less important than the implications.

Any change has to be “one size fits all”. Fine for the IRLM example above, but maybe not so good for eg a CICS Region Service Class velocity change. In the “not so good” case you’d have to evaluate the change for all the sysplexes and only if it were good (enough) for all of them make the change.

But, you know, “one size fits all” is a problem even within a sysplex: Customer B has different applications – at quite a coarse-grained level – in different LPARs within the same sysplex:

  • The common componentry in the LPARs – e.g. Db2 or MQ – probably warrants the same goals.
  • The application-specific, or transaction management, componentry – e.g. Broker, CICS, IMS, Batch – probably doesn’t.

There are some ways of being “differential” within a service definition. They include:

  • Treating one transaction manager differently from another.
  • Classification rules that are based on system.
  • Classification rules that are based on sysplex.

The latter – classification rules that are based on sysplex – is something I addressed very recently in my sd2html WLM Service Definition XML Formatter open source project. If you run it there’s a “Service Definition Statistics” table near the top of the HTML it produces. If there are classification rules based on sysplex name the specific sysplexes are listed towards the bottom of the table. This will save me time when looking at a customer’s service definition. You might use it, for example, to check whether such rules are still in place – when you thought they’d been removed. The specific use of the “Sysplex Name” qualifier will appear in either the Classification Groups or Classification Rules table, depending.

(I could use a good XML comparer, by the way. Because today I can’t easily tell the differences between Service Definition XML files – and I don’t think I should be teaching sd2html to do it for me.)
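A very minimal sketch of the sort of comparison I mean follows – this is not sd2html, just the Python standard library. It flattens each service definition XML file into (path, text, attributes) tuples and reports what appears in one file but not the other. Element ordering and repeated siblings are handled only crudely; a real comparer would need to do much better:

```python
import xml.etree.ElementTree as ET

def flatten(element, path=""):
    """Yield (path, text, attributes) for an element and its descendants."""
    here = f"{path}/{element.tag}"
    yield (here, (element.text or "").strip(),
           tuple(sorted(element.attrib.items())))
    for child in element:
        yield from flatten(child, here)

def xml_diff(file_a, file_b):
    """Return (only-in-a, only-in-b) tuples for two XML files."""
    a = set(flatten(ET.parse(file_a).getroot()))
    b = set(flatten(ET.parse(file_b).getroot()))
    return sorted(a - b), sorted(b - a)
```

Crude as it is, this would be enough to spot, say, a goal value or classification rule present in one sysplex’s service definition but not another’s.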


There is no conclusion, or at least no general firm conclusion: you really have to think it through, based on your own circumstances. Most notably two things:

  • Whether “one size fits all” works for you.
  • Whether operational procedures in WLM service definition management are effective.

But, still, it’s an interesting question. And it’s not hypothetical:

  • Plenty of customers have separate Sysprog, Development, Pre-Production, and Production sysplexes. In fact I’d encourage that.
  • Quite a few customers have separate Production sysplexes, especially outsourcers.

And this is why I find “Architecture” such fun. And why WLM is anything but simple – if you do it right.

Mainframe Performance Topics Podcast Episode 29 “Hello Trello”

This was a fun episode to make, not least because it featured as a guest Miroslava Barahona Rossi. She’s one of my mentees from Brasil and she’s very good.

We made it over a period of a few weeks, fitting around unusually busy “day job” schedules for Marna and me.

And this one is one of the longest we’ve done.

Anyhow, we hope you enjoy it. And that “commute length” can be meaningful to more of you very soon.

Episode 29 “Hello Trello” long show notes.

This episode is about how to be a better z/OS installation specialist, z/OS capture ratios, and a discussion on using Trello. We have a special guest joining us for the performance topic, Miroslava Barahona Rossi.

Follow up to Waze topic in Episode 8 in 2016

  • Apple Maps Adds Accident, Hazard, and Speed Check Reporting using the iPhone, CarPlay, and Siri to iOS.

What’s New

  • Check out the LinkedIn article on the IBM server changing for FTPS users for software electronic delivery on April 30, 2021, from using TLS 1.0 and 1.1 to using TLS 1.2, with a dependency on AT-TLS.

  • If you are using HTTPS, you are not affected! This is recommended.

Mainframe – Being a Better Installation Specialist

  • This section was modeled after Martin’s “How to be a better Performance Specialist” presentation. It’s a personal view of some ideas to apply to the discipline.

  • “Better” here means lessons learned, not competing with people. “Installation Specialist” might just as well have used “System Programmer” or another term.

  • There’s definitely a reason to be optimistic about being a person that does that Installation Specialist or System Programmer type of work:

  • There will always be a need for someone to know how systems are put together, how they are configured, how they are upgraded, and how they are serviced…how to just get things to run.

    • And other people that will want them to run faster and have all resources they need for whatever is thrown at them. Performance Specialists and Installation Specialists complement each other.
  • Consider the new function adoption needs too. There are so many “new” functions that are just waiting to be used.

    • We could put new functions into two categories:

      1. Make life easier: Health Checker, z/OSMF, SMP/E for service retrieval

      2. Enable other kinds of workload: Python, Jupyter notebooks, Docker containers

    • A good Installation Specialist would shine in identifying and enabling new function.

      • Now more than ever, with Continuous Delivery across the entire z/OS stack.

      • Beyond “change the process of upgrading”, into “driving new function usage”.

      • Although we know that some customers are just trying to stay current. Merely upgrading could bring new function activation out of the box, like System Recovery Boost (SRB) on z15.

    • A really good installation specialist would take information they have and use it in different ways.

      • Looking at the SMP/E CSIs with a simple program in C, for example, to find fixes installed between two dates.

      • Associating that with an incident that just happened, by narrowing it down to a specific element.

      • Using z/OSMF to do a cross-GLOBAL zone query capability, for seeing if a PE was present within the entire enterprise quickly.

      • Knowing what the right tool is for the right need.

    • Knowing parmlib really well. Removing unused members and statements. Getting away from hard-coding defaults – which can be hard, but sometimes can be easier (because some components tell you if you are using defaults).

      • Using the DISPLAY command to immediately find necessary information.

      • Knowing the z/OS UNIX health check that can compare active values with the hardened parmlib member in use.

    • Researching End Of Service got immensely easier with z/OSMF.

    • Looking into the history of the systems, with evidence of merging two shops.

      • Could leave around two “sets” of parmlibs, proclibs. Which might be hard to track, depending on what you have such as ISPF statistics or comments. Change management systems can help.

      • Might see LPAR names and CICS region naming conventions

    • Might modern tools such as z/OSMF provide an opportunity to rework the way things are done?

      • Yes, but often more function might be needed to completely replace an existing tool…or not.
    • You can’t be better by only doing things yourself, no one can know everything. You’ve got to work with others who are specialists in their own area.

      • Performance folks often have to work with System Programmers, for instance. Storage, Security, Networking, Applications – the list goes on.

      • Examples are with zBNA, memory and zIIP controls for zCX, and estate planning.

    • Use your co-workers to learn from. And teach them what you know too. And in forums like IBM-MAIN, conferences, user groups.

    • Last but not least, learn how to teach yourself. Know where to find answers (and that doesn’t mean asking people!). Learn how to try out something on a sandbox.

Performance – Capture Ratio

  • Our special guest is Miroslava Barahona Rossi, a technical expert who works with large Brasilian customers.

  • Capture Ratio – a ratio of workload CPU to system-level CPU as a percentage. Or, excluding operating system work from productive work.

    • RMF SMF 70-1 versus SMF 72-3
      • 70-1 is CPU at system and machine level
      • 72-3 is workload / service class / report class level
      • Not in an RMF report
  • Why isn’t the capture ratio 100%?

    • There wouldn’t be a fair way to attribute some kinds of CPU. For example I/O Interrupt handling.
  • Why do we care about capture ratio?

    • Commercial considerations when billing for uncaptured cycles. You might worry something is wrong if the capture ratio is low.

    • Might be an opportunity for tuning if below, say, 80%

  • What is a reasonable value?

    • Usually 80 – 90. Seeing more like 85 – 95 these days. It has improved because more of the I/O-related CPU is captured.

    • People worry about low capture ratios.

    • Also work is less I/O intensive, for example, because we buffer better

  • zIIP generally higher than GCP

  • Do we calculate blended GCP and zIIP? Yes, but also zIIP separately from GCP.

  • Why might a capture ratio be low?

    • Common: Low utilisation, Paging, High I/O rate.

    • Less common: Inefficient ACS routines, Fragmented storage pools, Account code verification, Affinity processing, Long internal queues, SLIP processing, GTF

  • Experiment correlating capture ratio with myriad things

    • One customer set of data with z13, where capture ratio varied significantly.

      • In spreadsheet calculated correlation between capture ratio and various other metrics. Used =CORREL(range, range) Excel function.

      • Good correlation is > 85%

      • Eliminate potential causes, one by one:

        • Paging, SIIS poor correlation

        • Low utilisation strong correlation

      • It has nothing much to do with machine generation. The same customer – from z9 to z13 – always had a low capture ratio.

        • It got a little bit better with newer z/OS releases

        • Workload mix? Batch versus transactional

      • All the other potential causes eliminated

      • Turned out to be LPAR complexity

        • 10 LPARs on 3 engines. Logical: Physical was 7:1, which was rather extreme.

        • Nothing much can be done about it – could merge LPARs. Architectural decision.

    • Lesson: worth doing it with your own data. Experiment with excluding data and various potential correlations.

  • Correlation is not causation. Look for the real mechanism, and eliminate causes one by one. Probably start with paging and low utilisation

  • Other kinds of Capture Ratio

    • Coupling Facility CPU: Always 100%; at low traffic, CPU per request is inflated.

    • For CICS, SMF 30 versus SMF 110 Monitor Trace: Difference is management of the region on behalf of the transactions.

    • Think of a CICS region as running a small operating system. Not a scalable thing to record SMF 110 so generally this capture ratio is not tracked.

  • Summary

    • Don’t be upset if you get a capture ratio substantially lower than 100%. That’s normal.

    • Understand your normal. Be aware of the relationship of your normal to everybody else’s. But, be careful when making that comparison as it is very workload dependent.

    • Understand your data and causes. See if you can find a way of improving it. Keep track of the capture ratio over the months and years.
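The two calculations discussed in this topic – the capture ratio itself, and the =CORREL(range, range) experiment – can be sketched as follows. The hourly data values are invented purely for illustration:

```python
# Capture ratio: workload-level (SMF 72-3) CPU summed across service
# classes, as a percentage of system-level (SMF 70-1) CPU.
def capture_ratio(service_class_cpu_seconds, system_cpu_seconds):
    return 100.0 * sum(service_class_cpu_seconds) / system_cpu_seconds

# Pearson correlation, as Excel's =CORREL(range, range) computes it.
def correl(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Invented hourly data: capture ratio alongside CPU utilisation.
hourly_cr   = [78.0, 82.0, 85.0, 88.0, 90.0, 91.0]
hourly_util = [35.0, 50.0, 60.0, 75.0, 85.0, 92.0]

print(capture_ratio([500, 250, 100], 1000))  # 85.0
print(correl(hourly_cr, hourly_util))        # strong positive correlation
```

With real data you would run the correlation against each candidate cause in turn – paging, utilisation, SIIS, and so on – eliminating them one by one, as described above.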

Topics – Hello Trello

  • Trello is based on the Kanban idea: boards, lists, and cards. Cards contain the data in paragraphs, checklist, pictures, etc.

  • Can move cards between lists, by dragging.

  • Templates are handy, and are used in creating our podcast.

  • Power-ups add function. A popular one is Butler. Paying for them might be a key consideration.

  • Multiplatform plus web. Provided by Atlassian, which you might know from Confluence wiki software.

    • Atlassian also make Jira, which is an Agile Project Management Tool.
  • Why are we talking about Trello?

    • We moved to it for high level podcast planning

      • One list per episode. List’s picture is what becomes the cover art.

      • Each card is a topic, except first one in the list is our checklist.

    • Template used for hatching new future episodes.

      • But we still outline the topic with iThoughts
    • We move cards around sometimes between episodes.

      • Recently more than ever, as we switched Episode 28 and 29, due to the z/OS V2.5 Preview Announce.

      • Right now planning two episodes at once.

  • Marna uses it to organise daily work, with personal workload management and calendaring. But it is not a meetings calendar.

    • Probably, with Jira, it will be more useful than ever. We’ll see.
  • Martin uses it for team engagements, with four lists: Potential, Active, Dormant, Completed.

    • Engagement moves between lists as it progresses

    • Debate between one board per engagement and one for everything. Went with one board for everything because otherwise micromanagement sets in.

    • Github projects, which are somewhat dormant now, because of…

    • Interoperability

      • Github issues vs OmniFocus tasks vs Trello Cards. There is a Trello power up for Github, for issues, branches, commits, and pull requests mapped into cards.

      • However, it is quite fragile, as we are not sure changes in state are reliably reflected.

    • Three-legged stool is Martin’s problem, as he uses three tools to keep work in sync. Fragility in automation would be anybody’s problem.

      • iOS Shortcuts is a well-built-out model. For example, it can create Trello cards and retrieve lists and cards.

        • Might be a way to keep the 3 above in sync
    • IFTTT is used by Marna for automation, and Martin uses automation that sends a Notification when someone updates one of the team’s cards.

      • Martin uses Pushcut on iOS – as we mentioned in Episode 27 Topics

      • Trello provides an IFTTT Trigger, or Pushcut provides a Notification service, which can kick off a shortcut also.

      • Encountered some issues: each list needed its own IFTTT applet. Can’t duplicate applets in IFTTT so it’s a pain to track multiple Trello lists, even within a single board.

    • Automation might be a better alternative to Power-ups, as you can build them yourself.

  • Reflections:

    • Marna likes Trello. She uses it to be more productive, but would like a couple more functions, which might be added as it becomes more popular.

    • Martin likes Trello too, but with reservations.

      • Dragging cards around seems a little silly. There are more compact and richer ways of representing such things.

      • A bit of a waste of screen real estate, as cards aren’t that compact. Especially as lists are all in one row. It would be nice to be able to fill more of the screen – with two rows or a more flexible layout.

In closing

  • GSE UK Virtual Conference will be 2 – 12 November 2021.

On the blog

So It Goes

Pi As A Protocol Converter

I wrote about Automation with my Raspberry 3B in Raspberry Pi As An Automation Platform. It’s become a permanent fixture in my office and I’ve given it another task. This blog post is about that task.

Lots of things use JSON (JavaScript Object Notation) for communication via HTTP. Unfortunately they don’t all speak the same dialect. Actually:

  1. They do; it’s JSON pure and simple. Though some JSON processors are a little bit picky about things like quotes.
  2. It’s just as well there is the flexibility to express a diverse range of semantics.

This post is about an experiment to convert one form of JSON to another. When I say “experiment” it’s actually something I have in Production – as this post was born from solving a practical problem. I would view it as more a template to borrow from and massively tailor.

The Overall Problem

I have a number of GitHub repositories. With GitHub you can raise an Issue – to ask a question, log a bug, or suggest an enhancement. When that happens to one of my repositories I want to create a task in my task manager – OmniFocus. And I want to do it as automatically as possible.

There isn’t an API to do this directly, so I have to do it via a Shortcuts shortcut (sic) on iOS. To cause the shortcut to fire I use the most excellent PushCut app. PushCut can kick off a shortcut on receipt of a webhook (a custom URL) invocation.

Originally I used an interface between GitHub and IFTTT to cause IFTTT to invoke this webhook. This proved unreliable.

The overall problem, then, is to cause a new GitHub issue to invoke a PushCut webhook with the correct parameters.

The Technical Solution

I emphasised “with the correct parameters” because that’s where this gets interesting:

You can set GitHub up – on a repository-by-repository basis – to invoke a webhook when a new issue is raised. This webhook delivers a comprehensive JSON object.

PushCut webhooks expect JSON – but in a different format to what GitHub provides. And neither of these is tweakable enough to get the job done.

The solution is to create a “protocol converter”, which transforms the JSON from the GitHub format into the PushCut format. This I did with a Raspberry Pi. (I have several already so this was completely free for me to do.)

Implementation consisted of several steps:

  1. Install Apache web server and PHP on the Pi.
  2. Make that web server accessible from the Internet. (I’m not keen on this but I think it’s OK in this case – and it is necessary.)
  3. Write a script.
  4. Install it in the /var/www/html/ directory on the Pi.
  5. Set up the GitHub webhook to invoke the webhook at the address of the script on the Pi.

Only the PHP script is interesting. You can find how to do the other steps on the web, so I won’t discuss them here.

PHP Sample Code

The following is just the PHP piece – with the eventual shortcut being a sideshow (so I haven’t included it).


<?php
$secret = "<mysecret>";
$json = file_get_contents('php://input');
$data = json_decode($json);

if ($data->action == "opened") {
  $issue = $data->issue->number;
  $repository = $data->repository->name;
  $title = $data->issue->title;
  $url = $data->issue->html_url;

  // The Pushcut webhook URL. (The base URL before $secret didn't survive
  // into this listing; supply the Pushcut API base address here.)
  $pushcutURL = "" . $secret . "/notifications/New%20GitHub%20Issue%20Via%20Raspberry%20Pi";

  // The JSON data.
  $JSON = array(
    'title' => 'New Issue for ' . $repository,
    'text' => "$issue $title",
    'input' => "$repository $issue $url $title",
  );

  $context = stream_context_create(array(
    'http' => array(
      'method' => 'POST',
      'header' => "Content-Type: application/json\r\n",
      'content' => json_encode($JSON)
    )
  ));

  $response = file_get_contents($pushcutURL, FALSE, $context);
}


But let me explain the more general pieces of the code.

  • Before you could even use it for connecting GitHub to PushCut you would need to replace <mysecret> with your own personal PushCut secret, of course.
  • $json = file_get_contents('php://input'); stores in a variable the JSON sent with the webhook. Let’s call this the “inbound JSON”.
  • The JSON gets decoded into a PHP data structure with $data = json_decode($json);.
  • The rest of the code only gets executed if $data->action is “opened” – as this code is only handling Open events for issues.
  • The line $pushcutURL = "" . $secret . "/notifications/New%20GitHub%20Issue%20Via%20Raspberry%20Pi"; is composing the URL for the PushCut webhook. In particular note the notification name “New GitHub Issue Via Raspberry Pi” is percent encoded.
  • The outbound JSON has to be created using elements of the inbound JSON, plus some things PushCut wants – such as a title to display in a notification. In particular the value “input” is set to contain the repository name, the issue number, the original Issue’s URL, and the issue’s title. All except the last are single-word entities. If you are adapting this idea you need to make up your own convention.
  • The $context = and $response = lines are where the PushCut webhook is actually invoked.

As I said, treat the above as a template, with the general idea being that the PHP code can translate the JSON it’s invoked with into a form another service can use, and then call that service.
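The translation step can be sketched in Python, too. The field names match the PHP above; the function name and the sample payload are my own invention:

```python
import json

def translate_github_to_pushcut(inbound_json):
    """Translate a GitHub issue webhook payload into a PushCut-style payload.

    Returns None for events other than an issue being opened."""
    data = json.loads(inbound_json)
    if data.get("action") != "opened":
        return None
    issue = data["issue"]["number"]
    repository = data["repository"]["name"]
    title = data["issue"]["title"]
    url = data["issue"]["html_url"]
    return {
        "title": "New Issue for " + repository,
        "text": f"{issue} {title}",
        # Convention from the post: repository, issue number, and URL are
        # single-word entities; the free-form title goes last.
        "input": f"{repository} {issue} {url} {title}",
    }

sample = json.dumps({
    "action": "opened",
    "issue": {"number": 42, "title": "It broke", "html_url": "https://example.com/i/42"},
    "repository": {"name": "filterCSV"},
})
print(translate_github_to_pushcut(sample)["input"])
```

The same shape works for any webhook-to-webhook translation: decode, pick fields, re-encode.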


It was very straightforward to write a JSON converter in PHP. You could do this for any JSON conversion – which is actually why I thought it worthwhile to write it up.

I would also note you could do exactly the same in other software stacks, in particular Node.js. I will leave that one as an exercise for the interested reader. I don’t know whether that would be faster or easier for most people.

On the question of “faster” my need was “low volume” so I didn’t much care about speed. It was plenty fast enough for my needs – being almost instant – and very reliable.

One other thought: My example is JSON but it needn’t be. There need not even be an inbound or outbound payload. The idea of using a web server on a Pi to do translation is what I wanted to get across – with a little, not terribly difficult, sample code.

Mainframe Performance Topics Podcast Episode 28 “The Preview That We Do Anew”

(Originally posted 2 March, 2021.)

It’s unusual for us to publish a podcast episode with a specific deadline in mind. But we thought the z/OS 2.5 Preview announcement was something we could contribute to. So, here we are.

I also wanted to talk about some Open Source projects I’ve been contributing to. So that’s in there.

And it was nice to have Nick on to talk about zCX.

Lengthwise, it’s a “bumper edition”… 🙂

Episode 28 “The Preview That We Do Anew” long show notes.

  • This episode is about several of the z/OS V2.5 new functions, which were recently announced, for both the Mainframe and Performance topics. Our Topics topic is on Martin’s Open Source tool filterCSV.

  • We have a guest for Performance: Nick Matsakis, z/OS Development, IBM Poughkeepsie.

  • Many of the enhancements you’ll see in the z/OS V2.5 Preview were provided on earlier z/OS releases via Continuous Delivery PTFs. The APARs are provided in the announcement.

What’s New

Mainframe – Selected z/OS V2.5 enhancements

  • We’ve divided up the Mainframe V2.5 items into two sections: installation and non-installation.

z/OS V2.5 Installation enhancements.

  • IBM will have z/OS installable with z/OSMF, in a portable software instance format!

  • z/OS V2.4 will not be installable with z/OSMF, and z/OS V2.4 driving system requirements remain the same.

  • z/OS V2.5 will be installable via z/OSMF, so that is a big driving system change.

    • However, there is a small window when z/OS V2.4 and z/OS V2.5 are concurrently orderable, in which z/OS V2.5 will have the same driving system requirements as z/OS V2.4. That overlapping window – when z/OS V2.5 is planned to be available in both the old (ISPF CustomPac Dialog) and new (z/OSMF) formats – is September 2021 through January 2022.

    • After that window, be aware! When z/OS V2.5 is the only z/OS orderable release – at that time – all IBM ServerPacs will have to be installed with z/OSMF.

    • All means CICS, Db2, IMS, MQ, and z/OS and all the program products.

    • To be prepared today for this change:

      • Get z/OSMF up and running on your driving system.

      • Learn z/OSMF Software Management (which is very intuitive) and try to install a portable software instance from this website.

    • This is a big step forward in the z/OS installation strategy that IBM and all the leading software vendors have been working years on.

      • John Eells came to this very podcast in Episode 9 to talk about it.
    • CICS, Db2, and IMS are already installable with a z/OSMF ServerPac. You can try those out right now.

    • CBPDO will remain an option, instead of ServerPac. But it is much harder to install.

      • ServerPac is much easier, and a z/OSMF ServerPac is easiest of all.

z/OS V2.5 Non-installation enhancements.

  • Notification of availability of TCP/IP extended services

    • For many operational tasks and applications that depend on z/OS TCP/IP communication services the current message is insufficient.

    • A new ENF event is intended to enable applications with dependencies on TCP/IP extended services to initialise faster.

  • Predictive Failure Analysis (PFA) has more checks

    • For above the bar private storage exhaustion, JES2 resource exhaustion, and performance degradation of key address spaces.
  • Workload Manager (WLM) batch initiator management takes into account availability of zIIP capacity

    • Works most effectively when customer has separate service classes for mostly-zIIP and mostly-GCP jobs

    • Catalog and IDCAMS enhancements

      • Catalog Address Space (CAS) restart functions are enhanced to allow you to change the Master Catalog without IPL

      • IDCAMS DELETE mask now takes TEST and EXCLUDE: TEST shows what would be deleted using the mask; EXCLUDE does further filtering – beyond the mask.

      • IDCAMS REPRO moves I/O buffers above the line. This will help avoid 878 “Insufficient Virtual Storage” ABENDs.
        We think this might allow more buffers, and multitasking in one address space.

    • New RMF Concept for CF data gathering

      • There is a new option, not the default, to optimize CF hardware data collection to one system. Remember SMF 74.4 has two types of data: system specific, and common to all systems.

      • This is designed to reduce overhead on n-1 systems.

  • RMF has been restructured, but all the functions are still intact. z/OS V2.5 RMF is still a priced feature.

    • A new z/OS V2.5 base element called “Data Gatherer” provides basic data gathering and is available to all, whether you’ve bought RMF or not. It will cut some SMF records.

    • There is a new z/OS V2.5 priced feature called “Advanced Data Gatherer” which all RMF users are entitled to.

    • Marna is mentioning this because the restructure has brought about some one-time customization changes you’ll need to make – to parmlib, APF, and linklist.

  • More, quite diverse, RACF health checks – for PassTickets, subsystem address spaces being active, and sysplex configuration.

Performance – z/OS V2.5 zCX enhancements.

  • Our special guest is Nick Matsakis, who is a performance specialist in z/OS Development, and has worked on several components in the BCP (GRS, XCF/XES, …). Martin and Nick have known each other for many years, recalling Nick’s assignment in Hursley, UK.

  • zCX is a base element new in z/OS V2.4, and requires a z14. It allows you to run Linux on Z Docker container applications on z/OS.

  • zCX is important for co-locating Linux on Z containers with z/OS. You can think of zCX instances as appliances – each one a z/OS address space.

  • Popular use cases can be found here and in the Redbook here. Another helpful source is Ready for the Cloud with IBM zCX.

    • Nick mentions the use cases of adding microservices to existing z/OS applications being served by a zCX container, and the MQ Concentrator for reducing z/OS CPU costs by running it on zCX. Another is Aspera, which is good for streaming-type workloads.
  • zIIP eligibility enhancements

    • Context switching reduction was delivered, so you can now typically expect about 95% offload to zIIP.
  • Memory enhancements

    • Originally it was all 4K fixed pages. New enhancements include support for 1 MB and 2 GB large pages (still fixed) for backing guests.

      • Increases efficiency of memory management, so better performance is expected – mainly from TLB miss reduction.

      • In house, Nick saw improvements from 0.25% up to about 6-12%, depending on what you are running.

    • Note need to set LFAREA as discussed in Episode 26.

      • LFAREA as of z/OS V2.3 is the maximum number of fixed 1M pages allowed on the system. 2GB hasn’t changed.

      • zCX configuration allows you to say which page sizes you’d like to try. Plan for using 2GB.

    • Guest memory is planned to be configured up to 1 TB.

      • zCX uses fixed storage so the practical limit may be lower. The limit used to be much lower, at about 100 GB.

      • Now we support up to 1000 containers in a zCX address space. Capacity is increasing.

  • Another relief is in Disk space limits

    • The number of data and swap disks per appliance is planned to be increased to as many as 245. This is intended to enable a single zCX to address more data at one time.

    • Point is you can run more and larger containers.

  • Instrumentation enhanced

    • Monitor and log zCX resource usage of the root disk, guest memory, swap disk, and data disks in the server’s job log.

    • zCX resource shortage z/OS alerts are proactive alerts that are sent to the z/OS system log (SYSLOG) or operations log (OPERLOG) to improve monitoring and automated operations. The server periodically monitors used memory, root disk space, user data disk space, and swap space in the zCX instance, and issues messages to the zCX joblog and operator console when the usage rises to 50%, 70%, and 85% utilization. When usage returns below 50%, an informational message is issued.

    • But still nothing in SMF to look inside a zCX address space

      • There is Docker-specific instrumentation that can provide that for you.
  • SIMD (or Vector)

    • SIMD is a performance feature, and can be used for analytics.

    • Some containers don’t check if they are running on hardware where SIMD is available.

  • Note that most of what’s in the z/OS 2.5 Preview for zCX is rolled back to z/OS 2.4 with APARs.

  • From this, we can conclude zCX wasn’t a “one and done”.

    • z/OS 2.5 might be a good time to try it. There is a 90-day trial period, as there is a cost for it. But, why wait for 2.5?
  • Nick’s presentation (with Mike Fitzpatrick) can be downloaded here.

Topics – filterCSV and tree manipulation

  • Trees are made up of nodes, each with zero to many children. A leaf node has zero children; a non-leaf node has one or more.

    • Navigation can be recursive or iterative, which makes it nice for programming.
  • Mindmapping leads to trees. Thinking of z/OS: Sysplex -> System -> Db2 -> Connected CICS leads to trees. Also, in Db2 DDF Analysis Tool we show DDF connections as a tree.

  • Structurally, each node is a data structure with fields such as readable names. Each node has pointers to its children and maybe its parent. This gives it its “topology”, and tree levels.

  • iThoughts is a mind mapping tool, and displays a mind map as a tree. Nodes can have colours and shapes, and many other attributes besides.

    • iThoughts runs on Windows, iOS, iPadOS and macOS.

    • Exports and imports CSV files, with a tree topology and also node attributes, such as shape, colour, text, notes.

    • Has very little automation of its own. But crucially you can mangle the CSV file outside of iThoughts, which is what filterCSV does.

  • filterCSV is an open source Python program that manipulates iThoughts CSV files.

    • It could address the automation problem, as it does the mangling automatically.

    • An example: automatically colours the blobs based on patterns (regular expressions).

      • Colouring CICS regions according to naming conventions
  • filterCSV started simple, and Martin has kept adding function – most recently find and replace. As it’s an open source project, contributions are welcomed.
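To give a flavour of that sort of tree mangling, here is a minimal Python sketch of colouring nodes by regular expression. This is not filterCSV’s actual code; the node structure and the CICS region naming convention are made up for illustration:

```python
import re

class Node:
    """A tree node with text, an optional colour, and zero to many children."""
    def __init__(self, text, children=None):
        self.text = text
        self.colour = None
        self.children = children or []

def colour_tree(node, rules):
    """Recursively colour each node whose text matches a pattern.

    rules is a list of (regular expression, colour) pairs; first match wins."""
    for pattern, colour in rules:
        if re.search(pattern, node.text):
            node.colour = colour
            break
    for child in node.children:
        colour_tree(child, rules)

# Hypothetical naming convention: CICSP* is production, CICST* is test.
tree = Node("Sysplex", [Node("CICSP001"), Node("CICST001")])
colour_tree(tree, [(r"^CICSP", "red"), (r"^CICST", "green")])
```

The recursion mirrors the “navigation can be recursive” point above: one function visits the whole tree, however deep.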

On the blog

So It Goes

SMF 70-1 – Where Some More Of The Wild Things Are

(First posted February 21, 2021)

As I recall, the last time I wrote about SMF 70-1 records in detail was Engineering – Part Two – Non-Integer Weights Are A Thing. Even if it weren’t, no matter – as I’d like you to take a look at it. The reason is to reacquaint you with ERBSCAN and ERBSHOW – two invaluable tools when understanding the detailed structure and contents of an SMF record. (Really, an RMF SMF record.) And it does introduce you to the concept of a Logical Processor Data Section.

This post is another dive into detailed record structure. (The first attempt at the last sentence had the word “derailed”; that might tell me something.) 🙂

In most cases a system cuts a single SMF 70 Subtype 1 record per interval. But this post is not about those cases.

The Structure Of A 70-1 Record

SMF 70-1 is one of the more complex record subtypes – and one of the most valuable.

Here is a synopsis of the layout:

What is in blue(ish) relates to the system cutting the record. The other colours are for other LPARs.

At its simplest, a single 70-1 record represents all the LPARs on the machine. But it’s not always that simple.

Let me point out some key features.

  • The CPU Data Sections are 1 per processor for the system that cut the record. In this example there are three – so this is a 3-way.
  • zIIPs and GCPs are treated the same, but they are individually identifiable as zIIP or GCP.
  • There is one Partition Data Section per logical partition on the machine, plus 1 called “*PHYSCAL”.
  • There is one Logical Processor Data Section per logical processor, plus 1 per physical processor.

The colour coding is useful here. Let’s divide it into two cases:

  • The processors for the cutting LPAR.
  • The processors for the other LPARs.

For what we’ll call “this LPAR”, there are CPU Data Sections for each processor, plus a Partition Data Section, Logical Core Data Sections, and Logical Processor Data Sections.

For each of what we’ll call “other LPARs” there are just the Partition Data Section and its Logical Processor Data Sections.

You’ll notice that the blue Partition Data Section and its Logical Processor Data Sections are the first in their respective categories. I’ve always seen it to be the case that this LPAR’s sections come first. I assume PR/SM returns them in that sequence – though I don’t know if this is an architectural requirement.

The relationship between Partition Data Sections and the corresponding Logical Processor Data Sections is straightforward: Each Partition Data Section points to the first Logical Processor Data Section for that LPAR and has a count of the number of such sections. The pointer here is an index into the set of Logical Processor Data Sections, where the first has an index of 0. (ERBSHOW calls it “#1”.)

(A deactivated LPAR has an (irrelevant) index and a count of 0 – and that’s how my code detects them.)
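That index-and-count navigation can be sketched in Python – modelling the sections as plain dictionaries rather than the real SMF 70-1 mappings:

```python
def lpar_sections(partition_sections, logical_processor_sections):
    """Pair each Partition Data Section with its Logical Processor Data
    Sections, via the 0-based index and count the partition section carries."""
    result = {}
    for part in partition_sections:
        if part["count"] == 0:
            continue  # deactivated LPAR: irrelevant index, count of 0
        start = part["index"]
        result[part["name"]] = logical_processor_sections[start:start + part["count"]]
    return result

# Made-up example: this LPAR's sections come first, as observed in practice.
parts = [
    {"name": "PROD1", "index": 0, "count": 3},
    {"name": "TEST1", "index": 3, "count": 1},
    {"name": "OLD1", "index": 0, "count": 0},  # deactivated
]
lps = ["lp0", "lp1", "lp2", "lp3"]
print(lpar_sections(parts, lps))
```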

So far so good, and quite complex.

How Do You Get Multiple 70-1 Records In An Interval?

Obviously each system cuts at least one record per interval – if 70-1 is enabled. So this is not about that.

In recent years the number of physical processors in a machine and logical processors per LPAR have both increased. I regard these as technological trends, driven mainly by capacity. At the same time there is an architectural trend towards more LPARs per machine.

Here are the sizes of the relevant sections – as of z/OS 2.4:

  • CPU Data Section: 92 bytes.
  • Partition Data Section: 80 bytes.
  • Logical Processor Data Section: 88 bytes.
  • Logical Core Data Section: 16 bytes.

These might not seem like large numbers but you can probably see where this is heading.

An SMF record can be up to about 32KB in size. You can only fit a few hundred Logical Processor Data Sections into 32KB, and that number might be significantly truncated if this LPAR has a lot of processors.

All of this was easy with machines with few logical processors (and still is).

But let’s take the case of a 100-way LPAR (whatever we think of that.) Its own sections are (92 + 88 + 16) x 100 or 19.6KB plus some other sections. So at least 20KB. And that’s before we consider sections for other LPARs.

Now let’s ignore this LPAR and consider the case of 50 1-way LPARs. There the PR/SM related sections add up to (80 + 88) x 50 = 8.4KB. Of course it’s extremely unlikely many would be 1-way LPARs, so the numbers are realistically much higher than that.

By the way, for a logical processor to count in any of this it just has to be defined. It might well have zero Online Time. It might well be a Parked Vertical Low. It doesn’t matter. The Logical Processor Data Sections are still there.

So, to exceed the capacity of a 32KB 70-1 SMF record we just have to have a lot of logical processors across all the LPARs in the machine, whether in this system or other LPARs. And an exacerbating factor is if these logical processors are across lots of LPARs.
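The arithmetic above can be sketched as follows – using the section sizes quoted earlier, and ignoring record headers and the other section types:

```python
# Section sizes in bytes, as of z/OS 2.4 (from the list above).
CPU_DATA = 92
PARTITION_DATA = 80
LOGICAL_PROCESSOR_DATA = 88
LOGICAL_CORE_DATA = 16
RECORD_LIMIT = 32 * 1024  # an SMF record maxes out at about 32KB

def this_lpar_bytes(n_processors):
    # The cutting LPAR gets CPU, Logical Processor, and Logical Core Data
    # Sections per processor (its Partition Data Section is ignored here).
    return (CPU_DATA + LOGICAL_PROCESSOR_DATA + LOGICAL_CORE_DATA) * n_processors

def other_lpar_bytes(n_lpars, n_processors_each):
    # Other LPARs contribute one Partition Data Section each, plus one
    # Logical Processor Data Section per logical processor.
    return (PARTITION_DATA + LOGICAL_PROCESSOR_DATA * n_processors_each) * n_lpars

print(this_lpar_bytes(100))     # the 100-way LPAR example: 19600 bytes
print(other_lpar_bytes(50, 1))  # the fifty 1-way LPARs example: 8400 bytes
```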

What Does RMF Do If The Data Won’t Fit One Record?

I’ve seen a lot of SMF 70-1 records in my time, and spent a lot of time with ERBSCAN and ERBSHOW examining them at the byte (and sometimes bit) level.

I do know RMF takes great care to adjust how it lays out the records.

Firstly, to state the obvious, RMF doesn’t throw away data; all the sections exist in some record in the sequence.

Secondly, RMF keeps each LPAR’s sections together. So the Partition Data Section and its related Logical Processor Data Sections are all in the same record. This is obviously the right thing to do, otherwise the index and count for the Logical Processor Data Sections could break.

Thirdly, and this is something I hadn’t figured out before, only one record in the sequence contains the CPU Data Sections. (I think also the Logical Core Data Sections.)

How Should I Handle The Multi-Record Case?

Let me assume you’re actually going to have to decide how to deal with this.

There are two basic strategies:

  1. Assemble the records in the sequence into one record in memory.
  2. Handle each record separately.

Our code, rightly in my opinion, uses Strategy 2. Strategy 1 has some issues:

  • Collecting information from multiple records, and timing the processing of the composite data.
  • Fixing up things like the index of the Logical Processor Data Sections.

Probably some tools do this, but it’s fiddly.

So we process each LPAR separately, thanks to all the information being in one record. And so we can process each record separately.

Reality Check

If you have only one 70-1 record per interval per cutting system none of the above is necessary to know. But I think it’s interesting.

If you rely on some tooling to process the records – and most sensible people do – you probably don’t care about their structure. Certainly, the RMF Postprocessor gets this right for you in the CPU Activity Report (and Partition Data Report sub-report).

So, I’ve probably lost most of my audience at this point. 🙂 If not, you’re on my wavelength – which isn’t crowded. 🙂 (This is the second “on my wavelength” joke in my arsenal, the other being open to misinterpretation.)

I like to get down into the physical records for a number of reasons, not least of which are:

  • When things break I need to fix them.
  • It cements my understanding of how what they describe works.

Oh, and it’s fun, too.

Final Thoughts

This post was inspired by a situation that required yet more adjusting of our code. Sometimes life’s that way. In particular, a number of LPARs were missing – because our Assembler code threw away any record with no CPU Data Sections. (This is inherited code but it’s quite possibly a problem I introduced some time in the past 20 years.)

I should point out that – for simplicity – I’ve ignored IFLs, (now very rare) zAAPs, and ICFs. They are treated exactly the same as GCPs and zIIPs. Of course the record-cutting LPAR won’t have IFLs or ICFs.

I have a quite old presentation “Much Ado About CPU”. Maybe I should write one with “Part Two” tacked on. Or maybe “Renewed” – if it’s not such a radical departure. But then I’ve done quite a bit of presentation writing on the general topic of CPU over recent years.

Raspberry Pi As An Automation Platform

(First posted 14 February, 2021)

Recently I bought a touch screen and attached it to one of my older Raspberry Pis (a 3B). In fact the Pi is attached to the back of the touch screen and has some very short cables. This is only a 7 inch diagonal screen but it’s more than enough for the experiment I’m going to describe.

Some of you will recognise I’ve used similar things – Stream Decks and Metagrid – in the past. Most recently I showed a Metagrid screenshot in Automating Microsoft Excel Some More.

So it will be no surprise that I’m experimenting with push-button automation again. But this time I’m experimenting with something that is a little more outside the Apple ecosystem. (Yes, Stream Deck can be used with Windows PCs but that isn’t how I use it.)

While both Stream Deck and Metagrid are commercially available push-button automation products, I wanted to see what I could do on my own. I got far enough that I think I have something worth sharing with other people who are into automation.

What I Built

The following isn’t going to be the prettiest user interface in the world but it certainly gets the job done:

Here are what the buttons in the code sample do (for me):

  • The top row of buttons allows me to turn the landing lights on and off.
  • The middle row does the same but for my home office.
  • The bottom row has two dissimilar functions:
    • One invokes a “Hello World” Keyboard Maestro macro on my Mac.
    • The other reboots the Pi itself.

This is quite a diverse set of functions and I want to show you how they were built.

By the way the screen grab was done with the PrtSc (“Print Screen”) key and transferred to my iPad using Secure Shellfish.

I used this article to figure out how to auto start the Python code when the Pi boots. It doesn’t get me into “kiosk mode” but then I didn’t really want it to.

Python Tkinter User Interface

What you see on the screen is a very simple Python program using the Tkinter graphical user interface library.

The following is the code I wrote. If you just copy and paste it it won’t run. There are two modifications you’d need to make:

  • You need to supply your IFTTT maker key – enclosed in quotes.
  • You need to supply the URL to your Keyboard Maestro macro – for each macro.

If you don’t have IFTTT you could set IFTTTbuttonSpecs to an empty list. Similarly, if you don’t have any externally callable Keyboard Maestro macros (or externally callable URLs) you would want to make URLButtonSpecs an empty list.

You can, of course, rearrange buttons by changing their row and column numbers.

#!/usr/bin/env python3
import tkinter as tk
import tkinter.font as tkf
from tkinter import messagebox
import urllib.request
import urllib.parse
import os

class Application(tk.Frame):
    def __init__(self, master=None):
        tk.Frame.__init__(self, master)
        self.IFTTTkey = "<Insert your IFTTT Key Here>"
        self.grid()
        self.createWidgets()

    def createWidgets(self):
        self.bigFont = tkf.Font(family="Helvetica", size=32)
        IFTTTbuttonSpecs = [
            ("Landing", True, "Landing\nLight On", 0, 0),
            ("Landing", False, "Landing\nLight Off", 0, 1),
            ("Office", True, "Office\nLight On", 1, 0),
            ("Office", False, "Office\nLight Off", 1, 1),
        ]

        URLButtonSpecs = [
            ("Say Hello\nKM", "<Insert your Keyboard Maestro macro's URL here>", 2, 0),
        ]

        localCommandButtonSpecs = [
            ("Reboot\nPi", "sudo reboot", 2, 1),
        ]

        buttons = []

        # IFTTT Buttons
        for (lightName, lightState, buttonLabel, buttonRow, buttonColumn) in IFTTTbuttonSpecs:
            # Create a button
            button = tk.Button(
                self,
                text=buttonLabel,
                font=self.bigFont,
                command=lambda lightName1=lightName, lightState1=lightState: self.light(
                    lightName1, lightState1
                ),
            )
            button.grid(row=buttonRow, column=buttonColumn)
            buttons.append(button)

        # URL Buttons
        for (buttonLabel, url, buttonRow, buttonColumn) in URLButtonSpecs:
            # Create a button
            button = tk.Button(
                self,
                text=buttonLabel,
                font=self.bigFont,
                command=lambda url1=url: self.doURL(url1),
            )
            button.grid(row=buttonRow, column=buttonColumn)
            buttons.append(button)

        # Local command buttons
        for (buttonLabel, cmd, buttonRow, buttonColumn) in localCommandButtonSpecs:
            # Create a button
            button = tk.Button(
                self,
                text=buttonLabel,
                font=self.bigFont,
                command=lambda cmd1=cmd: self.doLocalCommand(cmd1),
            )
            button.grid(row=buttonRow, column=buttonColumn)
            buttons.append(button)

    def light(self, room, on):
        # Compose the IFTTT Maker webhook URL for the applet's trigger event.
        if on:
            url = (
                "https://maker.ifttt.com/trigger/"
                + urllib.parse.quote("Turn " + room + " Light On")
                + "/with/key/"
                + self.IFTTTkey
            )
        else:
            url = (
                "https://maker.ifttt.com/trigger/"
                + urllib.parse.quote("Turn " + room + " Light Off")
                + "/with/key/"
                + self.IFTTTkey
            )
        opening = urllib.request.urlopen(url)
        data = opening.read()

    def doLocalCommand(self, cmd):
        os.system(cmd)

    def doURL(self, url):
        opening = urllib.request.urlopen(url)
        data = opening.read()

app = Application()
app.master.title("Control Panel")
app.mainloop()

I’ve structured the above code to be extensible. You could easily change any of the three types of action, or indeed add your own.

Hue Light Bulbs And IFTTT

Philips Hue light bulbs are smart bulbs that you can turn on and off with automation. There are others, too, but these are the ones I happen to have in the house, along with a hub. I usually control them with Siri on one of the HomePods in the house or Alexa on various Amazon Echo / Show devices.

IFTTT is a web-based automation system. You create applets with two components:

  1. A trigger.
  2. An action.

When the trigger is fired the action happens. In my experiment a webhook URL can be set up to trigger the Hue Bulb action. For each of the four buttons I have an applet. Two bulbs x on and off.

I would observe a number of things I don’t much like, though none of them stopped me for long:

  • The latency is a few seconds – but then I usually don’t need a light to come on or go off quicker than that.
  • You can’t parameterise the applet to the extent I would like, more or less forcing me to create one applet per button.
  • You can’t clone an IFTTT applet. So you have to create them by hand.

Still, as I said, it works well enough for me. And I will be keeping these buttons.

Remotely Invoking Keyboard Maestro

This one is a little more sketchy, but only in terms of what I’ll do with it. You’ll notice I have “Hello World”. The sorts of things I might get it to do are:

  • Opening all the apps I need to write a blog post. Or to edit a certain presentation.
  • Rearranging the windows on my screen.

Keyboard Maestro is incredibly flexible in what it allows you to do.

To be able to call a macro you need to know two things:

  1. Its UUID.
  2. The bonjour name (or IP address) of the Mac running Keyboard Maestro.

You also need to have enabled the Web server in the Web Server tab of Keyboard Maestro’s Preferences dialog.

To construct the URL you need to assemble the pieces something like this:

http://<server name>:4490/action.html?macro=<macro UUID>

The UUID can be obtained while editing the macro using menu item “Copy UUID” under “Copy As” from the “Edit” menu.

It’s a little complicated but it runs quickly and can do a lot in terms of controlling a Mac.
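Putting the pieces together in Python (the port number 4490 comes from the URL template above; the server name and macro UUID below are, of course, made up):

```python
import urllib.request

# Assumed values - substitute your own Mac's Bonjour name and macro UUID.
server = "mymac.local"
macro_uuid = "12345678-ABCD-ABCD-ABCD-123456789012"

# Keyboard Maestro's web server listens on port 4490.
url = f"http://{server}:4490/action.html?macro={macro_uuid}"
print(url)
# urllib.request.urlopen(url)  # would fire the macro, if the Mac is reachable
```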

Rebooting The Raspberry Pi

This one is the simplest of all – and the quickest. Python has the os.system() function. You pass a command string to it and it executes the command. In my case the command was sudo reboot.

It’s not surprising this is quick to kick off – as this is a local command invocation.

After I copied the Python code into this blog post I decided I want a companion button to shut down the Pi – for cleaning purposes. This would be trivial to add.


This is quite a good use of a semi-redundant Raspberry Pi – even if I spent more on the touch screen than I did on the Pi in the first place. And it was ever thus. 🙂

The diversity of the functions is deliberate. I’m sure many people can think of other types of things to kick off from a push button interface on a Raspberry Pi with a touch screen. Have at it! Specifically, feel free to take the Python code and improve on it – and tell me how I’m doing it all wrong. 🙂 Have fun!

I, for one, intend to keep experimenting with this. And somebody makes a 15” touch screen for the Pi… 🙂

Coupling Facility Structure Performance – A Multi-System View

It’s been quite a while since I last wrote about Coupling Facility performance. Indeed it’s a long time since I presented on it – so I might have to update my Parallel Sysplex Performance presentation soon.

(For reference, that last post on CF Performance was Maskerade in early 2018.)

In the past I’ve talked about how a single system’s service time to a single structure behaves with increasing load. This graphing has been pretty useful. Here’s an example.

This is from a system we’ll call SYS1. It is ICA-SR connected. This means a real cable, over less than 150m distance. It’s to a single structure in Coupling Facility CF – DFHXQLS_POOLM02, which is a list structure. Actually a CICS Temporary Storage sharing pool – “POOLM02”.

From this graph we can see that the service time for a request stays pretty constant at around 7.5μs. Also that the Coupling Facility CPU time per request is almost all of it.

I have another stock graph, actually a pair of them, which show a shift average view of all the systems’ performance with a single structure. This is pretty nice, too.

Here’s the Rate Graph across the entire sysplex.

Here we see SYS1 and its counterparts in the Sysplex – SYS2, SYS3, and SYS4.

(Note to self: They really are numbered that way.)

We can see that in general the traffic is mostly from SYS1 and SYS2, and almost none from SYS3. I would call that architecturally significant.

We can also see that there is no asynchronous traffic to this structure from any LPAR.

And here’s the Service Time graph.

You can see that the two IC-Peer-connected LPARs have better service times than the two ICA-SR-connected LPARs. This is reasonable given that IC Peer links are simulated by PR/SM and so are unaffected by the speed of light or distance. Again, the statement has to be qualified with “in general”.

But the graphs you’ve seen so far leave a lot of questions unanswered.

So, for a long time I’ve wanted to do something that combined the two approaches: Performance With Increasing Load, and Differences Between Systems.

I wanted to get beyond the single-system view of scalability. I usually put a number of systems’ scalability graphs on a single slide, but:

  • The graphs end up smaller than I would like.
  • This doesn’t scale beyond four systems.

The static multi-system graphing is fine but it really doesn’t tell the full story.

Well, now I have it in my kitbag. I’m sharing a new approach with you – because I think you’ll find it interesting and useful.

The New Approach

How about plotting all the systems’ service times versus rates on one graph? It sounds obvious – now I mention it.
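Behind such a graph is a simple data-shaping step: group per-interval observations into one (rate, service time) series per system, ready to plot as one line per system on a single chart. A minimal sketch, with made-up numbers:

```python
from collections import defaultdict

# Hypothetical per-interval observations:
# (system, requests per second, service time in microseconds)
observations = [
    ("SYS1", 1000, 7.5), ("SYS1", 2000, 7.6),
    ("SYS2", 1500, 5.0), ("SYS2", 2500, 5.2),
]

# One series per system: x values are request rates, y values service times.
series = defaultdict(list)
for system, rate, service_time in observations:
    series[system].append((rate, service_time))

for system in sorted(series):
    xs, ys = zip(*sorted(series[system]))
    print(system, xs, ys)
```

Each series then becomes one line on the combined service-time-versus-rate chart.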

Well, let’s see how it works out. Here’s a nice example:

Again we have the same four systems and the same CF structure. Here’s what I conclude when I look at this:

  • SYS2 and SYS4 have consistently better service times – across the entire operating range – than SYS1 and SYS3. This shows the same IC Peer vs ICA-SR dynamic as we saw before.
  • SYS3 service times are worse than those of the other 3 – and again we see its rate top out considerably lower than those of the other 3.
  • SYS2 service times are always worse than SYS4’s. They happen to share the same machine and SYS2 is a much bigger LPAR than SYS4, actually spanning more than 1 drawer. That might have something to do with it.


Coupling Facility service times and traffic remain key aspects of tuning Parallel Sysplex implementations. The approach of “understand what happens with load” also remains valid.

The new piece – combining the service times for all LPARs sharing a structure on one graph – looks like the best way of summarising such behaviours so far.

Of course this graph will evolve. I can already think of two things to do to it:

  • Add the link types into the series legend.
  • Avoid showing systems that don’t have any traffic to the structure (and maybe indicating that in the title).

But, for now, I want to get more experience with using this graph. For example, an even more recent customer has all systems connected to each coupling facility by ICA-SR links. The graphs for that one show similar curves for each system – which is unsurprising. But maybe in that case I would see a difference if the links were of different lengths.

And, as always, if I learn something interesting I’ll let you know.

More On Samples

This post follows on from A Note On Velocity from 2015. Follows on at a respectful distance, I’d say – since it’s been 5 years.

In that post I wrote “But those ideas are for another day or, more likely, another year (it being December now).” This is that other day / year – as this post reports on some of those “left on the table” aspects. For one, I do now project what happens if we include (or exclude) I/O samples.

In a recent customer engagement I did some work on WLM samples for a Batch service. This service class has 2 periods, the first period having an incredibly short 75 service units duration.

  • Period 1 is Importance 4, with a reasonable velocity.
  • Period 2 is Discretionary.

Almost everything ends in Period 2 – so almost all batch work in this shop is running Discretionary i.e. bottom dog without a goal.

As I said in A Note On Velocity, RMF reports attained velocity from Using and Delay samples, and these come directly from WLM. Importantly, this means you can calculate velocity without having to sum all the buckets of Using and Delay samples. If you're calculating velocity from the raw RMF SMF fields (as our code does) you won't, for example, add in I/O Using and I/O Delay samples when you shouldn't. I'll call this calculation, using the overall Using and Delay buckets, the Headline Velocity Calculation.
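The formula itself is simple; the subtlety is which samples you feed it. A minimal sketch in Python (the parameter names are illustrative, not actual SMF 72-3 field names):

```python
def headline_velocity(using_samples: int, delay_samples: int) -> float:
    """Attained velocity: Using / (Using + Delay), as a percentage.

    using_samples and delay_samples are the overall Using and Delay
    totals, not a sum over the individual sample buckets.
    """
    total = using_samples + delay_samples
    if total == 0:
        return 0.0  # no samples, no meaningful velocity
    return 100.0 * using_samples / total

# e.g. 300 Using samples and 700 Delay samples give a velocity of 30
print(headline_velocity(300, 700))  # 30.0
```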

I thought this would be useful for figuring out if I/O Priority Management is enabled. In fact there’s a flag for that – at the system level – but if you do the calculation by totting up the buckets you get sensible numbers for both cases: Enabled and Disabled.

I/O Priority Management can be enabled or disabled at the service class level. I don't see a definitive flag in RMF for this at the service class level, but presumably if the headline calculation doesn't match the tally of the individual buckets with I/O samples included then the service class is not subject to I/O Priority Management. And the converse would be true.
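That comparison could be sketched as follows – a hypothetical helper with illustrative inputs, not anything RMF gives you directly:

```python
def io_priority_management_likely(headline_using, headline_delay,
                                  bucket_using_no_io, bucket_delay_no_io,
                                  io_using, io_delay, tolerance=0.5):
    """Guess whether a service class is subject to I/O Priority Management.

    Compare the headline velocity with two bucket tallies: one excluding
    I/O samples and one including them. Whichever tally reproduces the
    headline (within tolerance, in percentage points) tells you whether
    I/O samples are part of the calculation.
    """
    def velocity(using, delay):
        total = using + delay
        return 100.0 * using / total if total else 0.0

    headline = velocity(headline_using, headline_delay)
    without_io = velocity(bucket_using_no_io, bucket_delay_no_io)
    with_io = velocity(bucket_using_no_io + io_using,
                       bucket_delay_no_io + io_delay)
    if abs(headline - with_io) <= tolerance:
        return True   # headline matches the with-I/O tally: enabled
    if abs(headline - without_io) <= tolerance:
        return False  # headline matches the non-I/O tally: disabled
    return None       # neither matches: something else is going on
```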

Batch Samples

For Batch, the headline calculation is matched by totting up the buckets for Using and Delay, if you include QMPL in the Delay samples tally – because this represents Initiator Delay. This is sensible to include in the velocity calculation: WLM-managed initiators are, as the name suggests, managed according to goal attainment, so a delay in being initiated really ought to be part of the calculation.
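In code terms, the batch tally might look like this (again with illustrative names; the point is simply that QMPL joins the Delay side of the fraction):

```python
def batch_velocity_from_buckets(using_buckets, delay_buckets, qmpl_samples):
    """Tot up the individual Using and Delay buckets for a batch service
    class period, counting QMPL (Initiator Delay) as Delay.

    If this matches the headline velocity, the tally is right.
    """
    using = sum(using_buckets)
    delay = sum(delay_buckets) + qmpl_samples
    total = using + delay
    return 100.0 * using / total if total else 0.0

# Using buckets (e.g. CPU, zIIP, I/O), Delay buckets, plus QMPL:
# 400 Using / (400 Using + 600 Delay) = 40%
print(batch_velocity_from_buckets([200, 50, 150], [300, 100], 200))  # 40.0
```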

Equally, though, with JES-managed initiators you could get a delay waiting for an initiator. And WLM isn’t going to do anything about that.

(By the way, SMF 30 – at the address space / job level – has explicit times fields for a job getting started. The most relevant one is SMF30SQT.)

I was reminded in this study that samples where the work is eligible to run on a zIIP but where it actually runs on a GCP are included in Using GCP samples. If you do the maths it works. It’s not really surprising.

This is also a good time to remind you samples aren’t time, except for CPU – which is measured and converted to samples.

An example of where this is relevant is when zIIP speed is different from GCP speed. There are two cases for this:

  • With subcapacity GCPs – where the zIIPs are faster than GCPs.
  • With zIIPs running SMT-2 – where zIIP speed is slower than when SMT is not enabled. (It might still be faster than a GCP but it might not be.)

Here, it becomes interesting to think about how you get all the sample types approximately equivalent. I would expect that, in the “zIIPs are a different speed from GCPs” case, there might need to be some use of the (R723NFFI) conversion factor. I wouldn’t, though, expect the effective speed of SMT-2 zIIPs to be part of the conversion.

But perhaps I’m overthinking this and perhaps a raw zIIP second is treated the same as a raw GCP second. And both are, of course, different to Using I/O.
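To make the “CPU is measured and converted to samples” point concrete: WLM samples state every 250ms, so one CPU second corresponds to four samples. A sketch, with the zIIP-speed normalisation question deliberately left out:

```python
WLM_SAMPLE_INTERVAL_SECONDS = 0.25  # WLM samples state every 250ms

def cpu_seconds_to_samples(cpu_seconds: float) -> float:
    """Convert measured CPU time to its equivalent in samples.

    Unlike the state-sampled buckets (I/O Using, the various Delays),
    Using CPU is derived from measured CPU time. This sketch ignores
    any speed normalisation between zIIPs and GCPs.
    """
    return cpu_seconds / WLM_SAMPLE_INTERVAL_SECONDS

print(cpu_seconds_to_samples(30.0))  # 30 CPU seconds -> 120.0 samples
```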

Sample Frequency And Sampleable Units

WLM samples Performance Blocks (PBs). These might be 1 per address space or there might be many. CICS regions would be an example of where there are many.

I’m told PBs in a CICS region are not the same as MXT (maximum number of tasks) but could approach it if the workload in the region built up enough. This is different from what I thought.

I tried to calculate MXT from sample counts divided by the sampling interval and didn’t get a sensible estimate. Which is why I asked a few friends. You can imagine that a method of calculating MXT not requiring CICS-specific instrumentation would’ve been valuable.
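For the record, the estimate I attempted worked roughly like this – dividing total samples by the number of WLM sampling passes in the RMF interval to get an average concurrent PB count (which, as noted, didn’t turn out to be a sensible MXT estimate):

```python
def average_sampled_pbs(total_samples, rmf_interval_seconds,
                        wlm_sample_interval=0.25):
    """Average concurrently sampled Performance Blocks over an RMF interval.

    total_samples is all the samples for the service class in the
    interval. Dividing by the number of WLM sampling passes gives the
    average concurrent PB count - which need not approach MXT.
    """
    passes = rmf_interval_seconds / wlm_sample_interval
    return total_samples / passes

# A 900s RMF interval means 3600 sampling passes;
# 72000 samples then suggest 20 PBs sampled on average
print(average_sampled_pbs(72000, 900))  # 20.0
```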


One thing I should note in this post is that – in my experience – sampling is exact. That is to say, if you add up the samples in the buckets correctly you get exactly the headline number. Exactness is valuable in that it gives you confidence in your inferences. Inexactness could still leave you wondering.

Most people don’t get into the raw SMF fields but if you do:

  • You can go beyond what eg RMF reports give you.
  • You get a much better feel for how the data (and the reality it describes) actually works.

But, as with the CICS MXT case, you can get unexpected results. I hope you (and I) learn from those.

Automating Microsoft Excel Some More

As I said in Automating Microsoft Excel, I thought I might write some more about automating Excel.

Recall I wrote about it because finding snippets of code to do what you want is difficult. So if I can add to that meagre stockpile on the web, I’m going to.

That other post was about automating graph manipulation. This post is about another aspect of automating Excel.

The Problem I Wanted To Solve

Recently I’ve had several instances where I’ve created a CSV (Comma-Separated Value) file I wanted to import into Excel. That bit’s easy. What made these instances different (and harder) was that I wanted to import them into a bunch of sheets. Think “15 sheets”.

This is a difficult problem because you have to:

  1. Figure out where the break points are. I’m thinking a row with only a single cell as a good start. (I can make my CSV file look like that.)
  2. Load each chunk into a separate new sheet.
  3. Name that sheet according to the value in that single cell.
  4. (Probably) delete any blank rows, or any that are just a cell with (underlining) “=” or “-” values.

I haven’t solved that problem. When I do I’ll be really happy. I expect to in 2021.
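In the meantime, the chunking part (steps 1, 3 and 4) can at least be prototyped outside Excel. A Python sketch, assuming – as in step 1 – that a row with a single non-empty cell marks a break point and names the new sheet:

```python
import csv

def split_csv_into_chunks(path):
    """Split a CSV file into named chunks, one per intended sheet.

    A row with exactly one non-empty cell starts a new chunk and
    supplies its name (step 3). Blank rows and "underlining" rows
    made only of "=" or "-" characters are dropped (step 4).
    """
    chunks = {}
    current = None
    with open(path, newline="") as f:
        for row in csv.reader(f):
            cells = [c for c in row if c.strip()]
            if not cells:
                continue  # blank row: drop it
            if all(set(c.strip()) <= {"=", "-"} for c in cells):
                continue  # underlining row: drop it
            if len(cells) == 1:
                current = cells[0]      # break point: new sheet name
                chunks[current] = []
            elif current is not None:
                chunks[current].append(row)
    return chunks
```

Loading each chunk into its own sheet (step 2) is then the Excel automation part – the bit still to be cracked.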

The Problem I Actually Solved

Suppose you have 15 sheets. There are two things I want to do, given that:

  • Rapidly move to the first or last sheet.
  • Move the current sheet left or right or to the start or end.

The first is about navigation when the sheets are in good shape. The second is about getting them that way. (When I manually split a large CSV file the resulting sheets tend not to be in the sequence I want them in.)

As noted in the previous post I’m using the Metagrid app on a USB-attached iPad. Here is what my Metagrid page for Excel currently looks like:

In the blue box are the buttons that kick off the AppleScript scripts in this post. As an aside, note how much space there is around the buttons. One thing I like about Metagrid is you can spread out and not cram everything into a small number of spots.

The Scripts

I’m not going to claim my AppleScript is necessarily the best in the world – but it gets the job done. Unfortunately that’s what AppleScript is like. But if you are able to improve on these I’m all ears – or rather, eyes.

Move To First Sheet

tell application "Microsoft Excel"
	select worksheet 1 of active workbook
end tell

Move To Last Sheet

tell application "Microsoft Excel"
	select worksheet (entry index of last sheet) of active workbook
end tell

Move Sheet To Start

tell application "Microsoft Excel"
	set mySheet to active sheet
	move mySheet to before sheet 1
end tell

Move Sheet To End

tell application "Microsoft Excel"
	set mySheet to active sheet
	set lastSheet to (entry index of last sheet)
	move mySheet to after sheet lastSheet
end tell

Move Sheet Left

tell application "Microsoft Excel"
	set mySheet to active sheet
	set previousSheet to (entry index of active sheet) - 1
	-- Guard against already being the first sheet
	if previousSheet ≥ 1 then
		move mySheet to before sheet previousSheet
	end if
end tell

Move Sheet Right

tell application "Microsoft Excel"
	set mySheet to active sheet
	set nextSheet to (entry index of active sheet) + 1
	-- Guard against already being the last sheet
	if nextSheet ≤ (entry index of last sheet) then
		move mySheet to after sheet nextSheet
	end if
end tell


Those snippets of AppleScript look pretty simple. However, each took quite a while to get right. But now they save me time on a frequent basis. And they might save you time.

They are all Mac-based but the model is similar to that in VBA. If you’re a Windows person you can probably replicate them quite readily with VBA.

And perhaps I will get that “all singing, all dancing” Import-A-CSV-Into-Multiple-Sheets automation working. If I do you’ll hear – or rather read – about it here.