SAP And Db2 Correlation IDs

Every so often I get to work with a SAP customer. I’m pleased to say I’ve worked with four in recent years. I work with them sufficiently infrequently that my DDF SMF 101 Analysis code has evolved somewhat in the meantime.

The customer situation I’m working with now is a good case in point. And so I want to share a few things from it. There is no “throwing anyone under the bus” but I think what I’ve learnt is interesting. I’m sure it’s not everything there is to learn about SAP, so I won’t pretend it is.

The Structure Of SAP Db2 Correlation IDs

In Db2 a correlation ID (or corrid) is a 12-character name. Decoding it takes some care. For example:

  • For a batch job up to the first 8 characters are the job name.
  • For a CICS transaction characters 5 to 8 are the transaction ID.

In this set of data the correlation ID is interesting and useful:

  • The first three characters are the Db2 Datasharing group name (or SAP application name).
  • The next three are “DIA” or “BTC” – usually. Occasionally we get something else in these 3 positions.
  • Characters 7 to 9 are a number – but encoded in EBCDIC so you can read them.

I wouldn’t say that all SAP implementations are like this, but there will be something similar about them – and that’s good enough. We can do useful work with this.
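
As a concrete illustration, here’s how I might split such a correlation ID in Python. This is my own sketch – the fixed offsets are an assumption based on this particular set of data, not anything from SAP documentation:

```python
def split_sap_corrid(corrid):
    """Split a 12-character SAP-style Db2 correlation ID into its three
    parts: group/application name, work type, and numeric suffix.
    The offsets are an assumption based on this set of data."""
    corrid = corrid.rstrip()      # trailing blanks pad the field to 12 bytes
    group = corrid[0:3]           # Datasharing group / SAP application name
    worktype = corrid[3:6]        # usually "DIA" or "BTC"
    suffix = corrid[6:]           # readable digits (already EBCDIC-decoded)
    return group, worktype, suffix

print(split_sap_corrid("XYZBTC083   "))
```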

Exploring – Using Correlation IDs

Gaul might indeed be divided into three parts. (“Gallia est omnis divisa in partes tres”). So let’s take the three parts of the SAP Correlation ID:

Db2 Datasharing Group Name / Application Name

To be honest, this one isn’t exciting – unless the Datasharing Group Name is different from the SAP Application Name. This is because:

  • Each SAP application has one and only one (or zero) Datasharing Groups.
  • Accounting Trace already contains the Datasharing Group Name.

In my DDF SMF 101 Analysis code I’m largely ignoring this part of the Correlation ID, therefore.

BTC Versus DIA

The vast majority of the records have “BTC” or “DIA” in them, and this post will ignore the very few others. Note the phrase “have … in them”: I chose my words carefully, because these strings might not be at offsets 3 to 5. Here’s a technique that makes that not matter.

I could use exact equality in DFSORT, meaning the match has to happen at a specific position. However, DFSORT also supports substring search.

Here is the syntax for an exact match condition:
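
It would be something like this sketch. The SYMNAMES symbol CORRTYPE is my own invention, and the starting position is illustrative – in reality it would be wherever the correlation ID sits in the record:

```
* SYMNAMES: map positions 4 to 6 (offsets 3 to 5) of the correlation ID
CORRTYPE,4,3,CH
* Control statement: the match must happen at exactly that position
  INCLUDE COND=(CORRTYPE,EQ,C'BTC')
```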


Here I’ve had to define an extra symbol, remapping the ID field to positions 4 to 6 (offsets 3 to 5). That’s a symbol I don’t really want, and it isn’t flexible enough.

Here’s how it would look using a substring search condition:
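
Something like this sketch – where CORRID is assumed to be a symbol mapping the whole 12-byte correlation ID field:

```
  INCLUDE COND=(CORRID,SS,EQ,C'BTC')
```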


This is much better as I don’t need an extra symbol definition and the string could be anywhere in the 12 bytes of the CORRID field.

If we can distinguish between Batch (“BTC”) and Dialog (“DIA”) we can do useful things. We can show commits and CPU by time of day – by Batch versus Dialog. We could do Time Of Day anyway, without this distinction. (My DDF SMF 101 Analysis code can go down to 100th of a second granularity – because that’s the SMF time stamp granularity – so I regularly summarise by time of day.) But this distinction allows us to see a Batch Window, or times when Batch is prevalent. If we are trying to understand the operating regime, such distinctions can be handy.
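
The sort of summarisation I mean can be sketched in a few lines of Python. This is a toy, not my actual analysis code, and the record tuples are invented:

```python
from collections import defaultdict

# (hour, corrid, cpu_seconds) tuples standing in for SMF 101 records
records = [
    (2, "XYZBTC083", 1.50),
    (2, "XYZBTC091", 0.75),
    (10, "XYZDIA007", 0.20),
    (10, "XYZDIA012", 0.30),
]

cpu_by_hour_and_type = defaultdict(float)
for hour, corrid, cpu in records:
    # substring search, not a fixed offset - mirroring the DFSORT technique
    worktype = "BTC" if "BTC" in corrid else "DIA"
    cpu_by_hour_and_type[(hour, worktype)] += cpu

for (hour, worktype), cpu in sorted(cpu_by_hour_and_type.items()):
    print(f"{hour:02d}:00 {worktype} {cpu:.2f}")
```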

Numeric Suffix

This is the tricky one. Let’s take an example: “XYZBTC083”

We’re talking about the “083” part. It looks like a batch job identifier within a suite. But it isn’t. For a start, such a naming convention would not survive in a busy shop. So what is it?

There are a few clues:

  • “XYZBTC083” occurs throughout the set of data, day and night. So it’s not a finite-runtime batch job.
  • In the (QWHS) Standard Header the Logical Unit Of Work ID fields for “XYZBTC083” change.
  • The “083” is one value in a contiguous range of suffixes.

What we really have here are SAP Application Server processes, each with their own threads. These threads appear to get renewed every so often. Work (somehow) runs in these processes and, when it goes to Db2, it uses these threads. It’s probably controllable when these threads get terminated and replaced – but I don’t see compelling evidence in the data for that control.

This “083” suffix is interesting: In one SAP application I see a range of “XYZDIA00” – “XYZDIA49”. Then I see “XYZBTC50” – “XYZBTC89”. So, in this example, that’s 50 Dialog processes and 40 Batch processes. So that’s some architectural information right there. What I don’t know is whether lowering the number of processes is an effective workload throttle, nor whether there are other controls in the SAP Application Server layer on threads into Db2. I do know – in other DDF applications – it’s better to queue in the middle tier (or client application) than queue too much in Db2.
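
Deriving that architectural information is really just counting distinct correlation IDs per work type. A sketch, with fabricated IDs matching the example ranges:

```python
# Fabricated correlation IDs: 50 Dialog processes, 40 Batch processes
corrids = {f"XYZDIA{n:02d}" for n in range(0, 50)} | \
          {f"XYZBTC{n:02d}" for n in range(50, 90)}

processes = {}
for corrid in sorted(corrids):
    worktype = corrid[3:6]            # "DIA" or "BTC" in this data
    processes.setdefault(worktype, set()).add(corrid)

for worktype, ids in sorted(processes.items()):
    print(worktype, len(ids), min(ids), "-", max(ids))
```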

IP Addresses And Client Software

Every SMF 101 record has an IP Address (or LU Name). In this case I see a consistent set of a small number of IP addresses. These I consider to be the Application Servers. I also see Linux on X86 64-Bit (“Linux/X8664”) as the client software. I also see its level.

So we’re building up a sense of the application landscape, albeit rudimentary. In this case client machines. (Often middle tier machines, if we’re considering the more general DDF case rather than just SAP.)

Towards Applications With QMDAAPPL

When a client connects to Db2 via DDF it can pass certain identifying strings. One of these shows up in SMF 101 in a 20-byte field – QMDAAPPL.

SAP sets this string, so it’s possible to see quite a high degree of fine detail in what’s coming over the wire. It’s early days in my exploration of this – with my DDF SMF 101 Analysis code – but here are two things I’ve noticed, looking at two SAP applications:

  • Each application has a very few QMDAAPPL values that burn the bulk of the CPU.
  • Each application has a distinctly different (though probably not totally disjoint) set of QMDAAPPL values.
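
The first observation is just a top-N CPU tally by QMDAAPPL. A sketch, with invented values rather than real SAP names:

```python
from collections import Counter

# (qmdaappl, cpu_seconds) pairs from hypothetical SMF 101 records
samples = [
    ("SAPLXY01", 40.0), ("SAPLXY01", 35.0),
    ("SAPLAB02", 20.0), ("SAPLZZ09", 1.0), ("SAPLQQ17", 0.5),
]

cpu = Counter()
for appl, secs in samples:
    cpu[appl] += secs

# The top few values typically burn the bulk of the CPU
for appl, secs in cpu.most_common(3):
    print(appl, secs)
```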

I’ve looked up a few of the names on the web. I’ve seen enough to convince me I could tell what the purpose of a given SAP application is, just from these names. Expect that as a future “stunt”. 🙂


I think I’ve shown you can do useful work – with Db2 Accounting Trace (SMF 101) – in understanding SAP accessing Db2 via DDF.

SAP is different from many other types of DDF work – and you’ve seen evidence of that in this long post.

One final point: SAP work comes in short commits / transactions – which makes it especially difficult for WLM to manage. In this set of data, for instance, there is relatively little scope for period aging. We have to consider other mechanisms – such as

  • Using the Correlation ID structure to separate Batch from Dialog.
  • Using DDF Profiles to manage inbound work.
  • (Shudder) using WLM resource groups.

And, as I mentioned above,

  • Using SAP’s own mechanisms for managing work.

I’ve learnt a fair bit from this customer situation, building as it does on previous ones. Yes, I’m still learning at pace. One day I might even feel competent. 🙂

And it inspires me even more to consider releasing my DDF SMF 101 Analysis code. Stay tuned!

Automating Microsoft Excel

(This post is about automating Excel on Mac OS. If you’re not a Mac user this probably won’t interest you.)

My Process

From the creation of a CSV file onwards, automation is key:

Creating a CSV file is done in one of two ways:

  • A program on z/OS.
  • A piece of JavaScript code I have that turns HTML tables into CSV.

So that’s not a problem. Where the problem starts is automating Excel, which is what this post is about.

Here is the sequence I generally go through with Excel:

  1. Excel ingests the CSV file.
  2. Then it moves onto the business of creating a chart.
  3. Then resizing the chart.
  4. Then changing some of the attributes of the chart.
  5. Finally exporting the chart in the graphical format that I can use in a presentation.

I orchestrate a lot of things with the Keyboard Maestro automation platform. It’s well worth the money, even if it is your own money. I might kick off a Keyboard Maestro macro in one of several ways:

(As an aside on the “hot key” combination, I’ve repurposed the Caps Lock key to pop up a menu of automations – using Karabiner Elements and a Keyboard Maestro conflict palette.)

AppleScript And Excel

AppleScript is the prevalent automation language on Mac OS. I have to say the AppleScript support in Excel is very obtuse. So most of the value in this post is a few snippets of AppleScript that people trying to use the data model might find useful.

One tip: If you look at the VBA data model for Excel the AppleScript support is very similar to it, language differences apart.

So here are some snippets of code you might find useful.

Changing A Chart’s Type

To change the currently selected (active) chart’s type to a line chart you would code

tell application "Microsoft Excel"
    set newChart to active chart
    tell newChart
        set chart type to line chart
    end tell
end tell

Note the name “line chart” is not a string. It is literally line chart. I think this is confusing. Other chart types I’ve automated have been

  • xyscatter
  • column stacked
  • column stacked 100

This last is where the y axis stops at 100%.

Editing The Title Of A Chart

The following example does a number of things:

  1. Prompts you for a title for the current (active) chart.
  2. Sets the chart title to the text you returned.
  3. Sets the font size of the title to 24 points.

Here is the code:

tell application "Microsoft Excel"
    set newChart to active chart
    tell chart title of newChart
        set theResponse to display dialog "Edit Chart Title" default answer chart title text as string buttons {"Cancel", "Continue"} default button "Continue"
        set chart title text to text returned of theResponse
        tell its font object
            set font size to 24
        end tell
    end tell
end tell

Setting The Dimensions Of A Chart

I always want the dimensions of a chart to be the same – and suitable for including in a PowerPoint presentation. I have two scripts for setting the dimensions of the active chart:

  • Single Width
  • Double Width

Only the single width one is right for putting in a presentation:

tell application "Microsoft Excel"
    set ch to chart area object of active chart
    set height of ch to 567
    set width of ch to 850.5
end tell

The double width one came in very handy recently: It was a great way to zoom in on a time line:

tell application "Microsoft Excel"
    set ch to chart area object of active chart
    set height of ch to 567
    set width of ch to 1701
end tell

I’ve set these two up on a pair of buttons in Metagrid – so I can readily swap between the two sizes.

In Conclusion

You might wonder why I’ve created a blog post that is essentially code snippets. Here’s why: It took a lot of fiddling, experimentation and web searching to come up with these snippets of code. That they were hard to come up with or find says something.

This post will be findable from Duck Duck Go etc. I hope it saves people some time and frustration.

If you’re not into automation, I hope I’m beginning to warm you to it.

I have a lot of Keyboard Maestro macros for Excel that I have yet to convert to pure AppleScript – and these are mainly fiddling with menus. (Keyboard Maestro is very good at that but it is slower and less reliable to automate that way.) As I convert some more I might well post additional code snippets.

mdpre Markdown Preprocessor


A few days ago I released md2pptx, a Markdown to PowerPoint converter, on GitHub as an open source project. You can find it here.

Now I’m releasing a companion program – mdpre. This is a preprocessor for Markdown. You can find it here.

I would’ve released them together and why I didn’t is a small story at my own expense: We in IBM have a system for requesting permission to open source software. I put in the cases for both mdpre and md2pptx at the same time. I got the permission email for md2pptx but didn’t spot the one for mdpre. I should have checked the status of the mdpre case but, life being too short, I didn’t get there for another week. Meanwhile I open sourced md2pptx.

All this by way of saying the pairing of mdpre and md2pptx is a strong one: Almost all my presentations are run through mdpre and the resulting Markdown through md2pptx. And one of the features of mdpre is a key reason why I do that.

My Workflow

When I begin a customer study I use a Keyboard Maestro macro that:

  • Creates a folder for the study within the customer’s folder in Documents.
  • Creates subfolders of this one for each presentation within the study.
  • Creates files with the .mdp extension in these folders for each presentation.
  • Pulls in some stock graphics and include files into the relevant subfolders.
  • Creates a make file in each subfolder.

That saves me a lot of time and it’s very consistent. I’m showing you this as it might be a helpful model to follow if you use md2pptx and mdpre.

The Killer Feature For Me

But what is the key feature I mentioned just now?

It’s the ability to convert CSV data into a Markdown table.

For example, you might code
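
Something along these lines. (This is from memory of mdpre’s =csv and =endcsv directives, so treat the exact spelling as provisional; the real documentation is in the repository.)

```
=csv
"A",1
"B",2
=endcsv
```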


And you get something like this:

A 1
B 2

The default for alignment in Markdown is left. But I don’t always want that, so I invented =colalign:

=colalign l r

This aligns the first column left and the second column right – using the Markdown :- and -: syntax.

There is one further nuance – which would be ignored by a normal Markdown processor: =colwidth:

=colalign l r
=colwidth 2 1

Here the first column is twice the width of the second – but only when rendered on a slide in md2pptx.

Other Notable Features

I less frequently need a few other things:

  • Inclusion of files. The time I use this is when I want to include a CSV file but don’t want to paste it into the main document.
  • Conditional processing. I used this in a couple of my conference presentations.
  • Variables. Again, useful in conference presentations – to create consistency.

As with most things I do, if I get fed up enough I automate it. So I expect there will be additional features added to mdpre over time.

Talking of which, testing, bug reports and even contributions welcome.

md2pptx Markdown To PowerPoint Converter

It’s been a long time since I started writing md2pptx. But finally I’ve open sourced it.

A Problem Statement Of Sorts

I wrote md2pptx because I got tired of four things:

  • The process of embedding graphics in PowerPoint.
  • The fact that pasting a picture into PowerPoint made the resulting presentation file huge.
  • The location of a manually added picture on a page is likely to be inconsistent.
  • Presentations become a hodge podge of inconsistent text styles.

The last one is really the result of success. Take a presentation of mine like “Parallel Sysplex Performance Topics”. It has evolved over at least 10 years. It probably started out in Open Office and ended up in PowerPoint. I know for a fact that each time I “refurbished” it I introduced styling inconsistencies, particularly if I swiped slides from someone else. “Much Ado About CPU” saw that problem in spades.

It also occurred to me that what today is a presentation might tomorrow need to be a document. Or a presentation guide, which is really both a document and slides. If only I could – from the same source – produce both a presentation and a text document.

How Do I Write Documents?

I don’t often write documents but when I do I abjure Word in favour of Markdown. And this is significant.

I rather like a text-based format and, if you forced me to, I’d write in HTML. Thankfully nobody has forced me to write HTML, though I’m very familiar with it. So Markdown (or rather MultiMarkdown) is the text-based format I have chosen.

I write in a wide variety of applications – across Mac, iPad and iPhone. Precisely which needn’t concern us here. What matters is that a text-based format allows me to write anywhere – and I wanted to extend that to slides.

md2pptx and python-pptx

A while back I discovered a Python package called python-pptx. You can write programs to drive python-pptx to make PowerPoint presentations. Specifically the XML flavour – with file extension “pptx”. Hence the name. It’s actually quite easy – if you’re reasonably adept at Python – to use python-pptx in a program.

So I wrote such a program called md2pptx. I started writing it a few years ago – and I’ve used it in every engagement since. I’ve also refurbished a number of presentations by converting them to Markdown, extracting the graphics, and rebuilding with md2pptx. As I’ve used md2pptx I’ve enhanced it.

md2pptx takes a subset of MultiMarkdown (a superset of “vanilla” Markdown) and builds the presentation, embedding the graphics as appropriate.

If you know Markdown you know how to write in a way that md2pptx can convert.

Everything you would code for md2pptx is valid Markdown so you can turn it into a document – if you need to.
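
To give a flavour of the idea – and this is a toy sketch only, nothing like md2pptx’s real parser – turning headings into slide titles and list items into bullets might start like this:

```python
def outline(markdown_text):
    """Very crude: level-3 headings become slide titles,
    top-level list items become their bullets."""
    slides = []
    for line in markdown_text.splitlines():
        if line.startswith("### "):
            slides.append((line[4:], []))
        elif line.startswith("* ") and slides:
            slides[-1][1].append(line[2:])
    return slides

md = """### First Slide
* Point one
* Point two
### Second Slide
* Only point
"""
print(outline(md))
```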

There are some nice wrinkles, such as the automatic creation of a Table Of Contents slide, turning Taskpaper-formatted tasks into task list slides at the end. Likewise a glossary slide. It even supports Critic Markup format – for reviewing presentations.

How This Fits Into My Life

I built a load of automation around preparing for an engagement. I have a Keyboard Maestro macro that creates a new folder for the study (as a subfolder of the one I use for the customer in question). It then creates subfolders – one for each presentation I’d use in a workshop. Inside these folders it writes a stub Markdown file and a make file.

As I develop the presentation I edit it in BBEdit and – on the command line – type make to build the presentation. md2pptx runs very fast, always well under a second to build any presentation I’ve ever used it with. This includes embedding graphics.

I also wrote some automation to cause PowerPoint to exit any slideshow, close the presentation, and reload it. This is a piece of AppleScript I suppose I could share.

Open Sourcing md2pptx

Without giving away any commercial secrets, I can say the process of open sourcing in IBM can be quite straightforward: You submit a proposal, covering points such as licensing, expected IBM effort, commercial considerations. In my case I’ve found getting approval quick – because there’s nothing contentious about it.

So md2pptx is on GitHub. You can find it here. As with my other projects, I invite contributions, issues, testing. The only things I would say are:

  • Don’t expect it to be a highly robust, full function Markdown parser.
  • If you think there is some feature you’d like md2pptx to have consider these two questions:
    • How would you express the semantic in Markdown?
    • How would you express the feature in PowerPoint?

Anyhow, I hope you enjoy using it – if you do. I know I do – and I’m pleased to share it.

What About The Others?

I just produced a new chart, which I think is worth sharing with you.

I produced it for one specific use case, but I think it has slightly wider applicability.

The Trouble With Outsourcers

Quite often we get performance data from outsourcers, whether IBM or some other organisation. Generally they’re running LPARs from more than one of their customers on the one machine.

We have a nice chart for a machine. It shows all the LPARs’ CPU stacked up – with each LPAR a different series. This is fine in a one-company context. But sometimes we are working with the outsourcer and one of their customers. We wouldn’t want to show them the outsourcer’s other customers’ LPARs. But we would want to show them how busy the machine is.

It’s reasonable to show them how busy the machine is because, of course, it affects the performance they’re seeing. And we might well get into LPAR design issues. (A tricky one is the weights because adjusting them is a “robbing Peter to pay Paul” situation – and with a multi customer machine that’s obviously political.)

So here is a new chart, that neatly solves the problem. It’s a real one, though there has had to be a little obfuscation of the names.

In this case CPU2 is a Production LPAR and CPU3 is a Development LPAR. The grey is all the other LPARs’ use of the GCP pool. It’s clearly substantial.
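
The arithmetic behind the grey series is trivial but worth stating. A sketch, with invented numbers:

```python
# GCP pool busy and the two LPARs we're allowed to show, in engines
pool_busy = 5.8
cpu2_prod = 2.1   # Production LPAR
cpu3_dev = 0.7    # Development LPAR

# The grey series: everything else in the pool, unnamed
all_others = pool_busy - cpu2_prod - cpu3_dev
print(f"All other LPARs: {all_others:.1f} engines")
```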

The pool itself isn’t hugely busy – but then this was not said to be a problem day.

But There’s More

Even in the one-company case this chart is useful. Suppose a customer sends us data from what they consider their biggest LPARs. It would be good to show:

  • The LPARs they sent us data for are indeed the bulk of the CPU.

Or, alternatively:

  • We’re missing a big trick, as the LPARs they sent data for don’t use the bulk of the CPU.

One Final Plea

I’ve said this many times, but probably not written it in a blog post. Always report processor pools separately. Everything in this post has been for a single machine’s GCP pool. To mix GCPs with, say, zIIPs makes no sense at all.

Engineering – Part Five – z14 IOPs

I previously wrote about SMT in Born With A Measuring Spoon In Its Mouth in 2016 – before z14 was announced. I also wrote about it again in 2016 in SMT – Some Actual Graphs. It’s been a year since z15 was announced so enough time has passed for me to want to write about SMT once more.

But actually there isn’t any real SMT news.

But there’s something I thought I’d written about before, but I hadn’t: With z14, IOPs are always enabled for SMT. Actually one of them isn’t, but the rest are. So, in SMF 78–3 you get an odd number of IOPs – and therefore an odd number of IOP Initiative Queue and Utilization Data Sections. One is not SMT-enabled and the rest are.

So, if you have 10 IOP cores you have 19 IOP sections.
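
The arithmetic: the one non-SMT core contributes one section and every SMT-enabled core contributes two. A sketch:

```python
def iop_sections(cores):
    """z14-style: one IOP core is not SMT-enabled (1 section),
    the rest are (2 sections each)."""
    return 1 + 2 * (cores - 1)

def iop_cores(sections):
    """Invert: recover the core count from the section count."""
    return (sections + 1) // 2

print(iop_sections(10), iop_cores(19))
```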

It would be interesting to see how they behave. So I took data from a two-drawer z14. (It’s an M02 hardware model, with a software designation 507, with 7 GCPs, 4 zIIPs, and 5 ICFs. It has lots of LPARs.)

So, I used the 78–3 data to plot two metrics:

  • Processed I/O Interrupts per second
  • IOP Busy %

Here is the graph, with IOP Busy on the right-hand axis and I/O Interrupts on the left.

The numbers are interesting but there is no clear pattern:

  • The I/O Interrupt rate varies wildly – and I suspect it has something to do with the devices and channels the IOP is handling.
  • The IOP Busy % doesn’t necessarily correspond to the I/O Interrupt rate.

Probably the more important and useful metric is the IOP Busy number.

When I say “no clear pattern” I mean it would be difficult to say something like “IOP 4 is busier because of its position in the machine”.

I do think it’s worth keeping an eye on IOP Busy %. This particular set of data shows very low IOP utilisations – which is a good thing.

For a 2-drawer z14, 10 IOPs is the standard number but you can buy more. For z13 it was 12 and for z15 it’s 8. There’s a clear trend here. I do think that having SMT as standard on IOPs will have contributed to the possibility of reducing the number of standard IOPs. Obviously them getting a little bit faster with each generation helps, but you have to balance that against other processor types also getting faster. Another factor might be the historical trend towards more memory in a machine and fewer I/Os, relatively speaking.

My code knows that it’s standard for a 2-drawer z14 to have 10 IOPs. It has to calculate – especially from z14 onwards – the number of IOPs as this isn’t recorded. SMT is part of that calculation. So I report standard IOPs and additional IOPs – though I haven’t seen a case of the latter yet.

And this is in the “Engineering” series of blog posts as we’re dealing with individual processors, even if they are IOPs.

filterCSV – 4 Months On

Back in May (2020) I published filterCSV, An Open Source Preprocessor For Mind Mapping Software.

To recap a little, the premise was very simple: I wanted to create a tool that could automate colouring nodes in a mind map, based on simple filtering rules. The format I chose was iThoughts’ CSV file format. (It could both import and export in this format.) Hence the name “filterCSV”.

I chose that format for three reasons:

  • I use iThoughts a lot – and colouring nodes that match patterns is a common thing for me to do.
  • The format is a rather nice text format, with lots of semantics available.
  • Python has good tools for ingesting and emitting CSV files. Likewise for processing arrays – which is, obviously, what CSV can be converted to.

So I built filterCSV and last time I wrote about it I had extended the CSV -> filterCSV -> CSV cycle to

  • Ingest flat text, Markdown and XML
  • Emit HTML, OPML, XML, Freemind

So, it had become a slightly more general tree manipulator and converter.

What Happened Next?

I’ve done a lot of work on filterCSV. I’ll attempt to break it down into categories.


You can now import OPML.


You can now export as tab- or space-indented text.

You can now export in GraphViz Directed Graph format, which means you can get a tree as a picture, outside of a mind-mapping application.

Tree Manipulation Functions

You can sort a node’s children ascending, and you can reverse their order. The latter means you can sort them descending. Imagine a tree with a Db2 subsystem and the CICS regions that attach to it as its children. You’d want the CICS regions sorted, I think. (Possibly by name, possibly by Service Class or Report Class.)

Sometimes it makes sense for the children of a node to be merged into the node. Now they can be, and each is preceded by an asterisk – to form a Markdown bulleted list item. (iThoughts can handle some Markdown in its nodes.) I think we might use this in our podcast show notes.

You can now select nodes by their level in the tree. You can also use none as a selector – to deselect all nodes in the tree. (Before you had all as a selector – to allow you to set attributes of all nodes.) You might use none with nc (next colour) to skip a colour in the iThoughts palette.

Here’s an example:

'^A1$' nc
none nc
'A1A' nc

Where the first command says ‘for nodes whose text is “A1” colour with the first colour in the standard iThoughts colour palette’. The second says ‘do not use the second colour in the palette’. The third command says ‘for nodes with “A1A” in their text use the third colour in the palette’.
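
The palette-cycling behaviour can be modelled in a few lines of Python. This is my own toy model of the semantics, not filterCSV’s actual code, and the palette colours are invented:

```python
import re

palette = ["red", "orange", "yellow"]        # stand-in for iThoughts' palette

def apply_commands(nodes, commands):
    """nodes: {node text: colour}. Each 'nc' command takes the next
    palette colour; a selector of 'none' consumes a colour unused."""
    colour_index = 0
    for selector, _ in commands:             # every command here is 'nc'
        colour = palette[colour_index]
        colour_index += 1
        if selector == "none":
            continue                         # skip this colour
        for text in nodes:
            if re.search(selector, text):
                nodes[text] = colour
    return nodes

nodes = {"A1": None, "A1A": None}
commands = [("^A1$", "nc"), ("none", "nc"), ("A1A", "nc")]
print(apply_commands(nodes, commands))
```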

New Node Attributes

iThoughts, as well as colour and shape, has three attributes of a node that filterCSV now supports:

  • Icon – where you can prefix a node with one of about 100 icons. For example, ticks and single-digit number icons.
  • Progress – which is the percent complete that a task is. Some people use iThoughts for task management.
  • Priority – which can range from 1 to 5.

As with colour and shape, you can set these attributes for selected nodes, with the selection following a rule. And, again, you can combine them. For example a tick node and 100% completion. You can also reset them, for example with noprogress.


Invoking filterCSV with no commands produces some help. This help points to the GitHub repository and the readme.

You can now (through Stream 3) read commands from a file. If you do you can introduce comments with // . Those continue until the end of the line. You can also use blank lines.
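
Stripping those comments is straightforward. A sketch of the idea, not filterCSV’s actual code:

```python
def strip_commands(lines):
    """Remove // comments (which run to end of line) and blank lines."""
    cleaned = []
    for line in lines:
        line = line.split("//", 1)[0].strip()
        if line:
            cleaned.append(line)
    return cleaned

raw = [
    "'^A1$' nc   // colour A1 nodes",
    "",
    "// a whole-line comment",
    "none nc",
]
print(strip_commands(raw))
```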

To use Stream 3 for input you might invoke filterCSV with something like

filterCSV < input.csv > output.opml 3< command-file

So, you can see filterCSV (now at 1.10) has come on in leaps and bounds over the past few months. Most of the improvements were because I personally needed them, but one of them – indented output – was in response to a question from someone in a newsgroup.

And I’ve plenty more ideas of things I want to do with filterCSV. To reiterate, it’s an open source project so you could contribute. filterCSV is available from here.

And it’s interesting to me how the original concrete idea – colouring iThoughts nodes – has turned into the rather more abstract – ingesting trees and emitting them in various formats with lots of manipulations. I like this and probably should deploy the maxim “abstraction takes time and experience”.

Mainframe Performance Topics Podcast Episode 26 “Sounding Board”

In Episode 25 I said it had been a long time since we had recorded anything. That was true for Episode 25, but it certainly wasn’t true for Episode 26. What is true is that it’s taken us a long time from start to finish on this episode, and ever so much has happened along the way.

But we ploughed on and our reward is an Episode 26 whose contents I really like.

On to Episode 27!

Here are the unexpurgated show notes. (The ones in the podcast itself have a length limitation; I’m not sure Marna and I do, though.) 🙂

Episode 26 “Sounding Board”

Here are the show notes for Episode 26 “Sounding Board”. The show is called this because it relates to our Topics topic, and because we recorded the episode partly in the Poughkeepsie recording studio where Martin sounded zen, and partly at home.

Where we have been

  • Marna has been in Fort Worth for SHARE back in February

  • Martin has been to Las Vegas for “Fast Start”, for technical sales training, and he got out into the desert to Valley Of Fire State Park

  • Then, in April he “visited” Nordics customers to talk about
    • zIIP Capacity and Performance
    • So You Don’t Think You’re An Architect?
  • But he didn’t get to go there for real. Because, of course, the world was upended by both Covid and Black Lives Matter.

Follow up

  • Chapter markers, discussed in Episode 16. Marna finally found an Android app that shows them – Podcast Addict. Martin investigated that app, and noted it is available on iOS too.

What’s New – a couple of interesting APARs


    • When you run a workflow step that invokes a job you can automatically save the job output in a location of your choosing (a z/OS UNIX file directory).

    • The output is in the same format as you’d see in SDSF. This means users can have an automatic permanent record of the work that was done in a workflow.

    • PTF Numbers are UI68359 for 2.3 and UI68360 for 2.4

  • APAR OA56774 (since 2.2) Provides new function to prevent a runaway sysplex application from monopolizing a disproportionate share of CF resources

    • This APAR has a dependency on CFLEVEL 24.

    • This case is pretty rare, but is important when you have it.

    • Not based on CF CPU consumption. Is based on deteriorating service times to other structures – which you could measure with SMF 74–4 Coupling Facility Activity data.

Mainframe – z15 FIXCATs

  • Important to cover as there are many questions about them.

  • IBM.Device.Server.z15–8561.RequiredService

    • Absolute minimum needed to run on a z15

    • Unfortunately some of the PTFs in that list have been involved in messy PE chains

    • If that happens, involve IBM Service (Bypass PE or ++APAR)

    • Usually the intent is to keep these PTFs to a minimum – and keep the number of PTFs relatively constant.

      • CORRECTION: System Recovery Boost for z15 GA1 is in Required, not Exploitation category, as the recording states!
  • IBM.Device.Server.z15–8561.Exploitation

    • Needed for optional functions, and you can decide when you want to use them.

    • This PTF list could grow – if we add new functions

  • IBM.Device.Server.z15–8561.RecommendedService

    • This is more confusing. Usually these fix a defect that has been found but hasn’t risen to the level of required. We might’ve detected it in testing, or a customer might have.

    • Over time this category probably will grow, as field experience increases

    • Might want to run an SMP/E REPORT MISSINGFIX to see what’s in this FIXCAT. Might install some, all, or none of the fixes. Might want to be more selective. Based on how much change you want to encounter, versus what problems are fixed

  • By the way there are other FIXCATs you might want to be interested in for z15, e.g. IBM.Function.SYSPLEXDataSharing

Performance – DFSORT And Large Memory

  • A very special guest joins us, Dave Betten, former DFSORT performance lead.

  • Follows on from Elpida’s item in Episode 10 “234U” in 2017, and continues the “Managing Large Memory” theme.

  • Number of things to track these days:
    • Often track Average Free
    • Also need to track Minimum Free
    • Fixed frames – Especially Db2, and now with z/OS 2.4 zCX
    • Large frames – Again Db2 but also Java Heap
  • In z/OS 2.2
    • OPT controls simplified
      • Thresholds set to Auto
      • Default values changed
      • 64GB versus %
  • In z/OS 2.3
    • LFAREA
      • Not reserved anymore but is a maximum
      • BTW the LFAREA value is in SMF 71
      • Dave reminded us of what’s in SMF 71
  • Dave talked about DFSORT memory controls
    • DFSORT has historically been an aggressive user of memory
    • Installation defaults can be used to control that
    • But the EXPOLD parameter needs special care – because of what constitutes “old pages”, which aren’t actually unused.
    • DFSORT Tuning Guide, especially Chapter 3
  • Dave talked about how handy rustling up RMF Overview Reports can be, with several Overview conditions related to memory.

  • Most of the information in this topic is relevant to LPARs of all sizes

Topics – Update on recording techniques

  • Last talked about this in Episode 12, December 2017

  • Planning for podcast – still using iThoughts for outlining the episode (though its prime purpose is mind mapping; Martin (ab)uses it for depicting various topologies).

  • Recording of podcast – still using Skype to collaborate
    • Record locally on each side, but now Marna’s side is in the new Poughkeepsie recording studio!
    • Martin has moved to Piezo and then Audio Hijack
      • Recording in stereo, with a microphone stand to minimise bumps
      • Has to slow the computer’s fan speed, and has an external cooling pad
      • Also he hides behind pillows to minimise the noise and improve audio quality.
    • For a guest, it’s different. We can’t record in stereo. Guests might not have recording software. But still use Skype (unless in Poughkeepsie).
  • Production

    • Martin’s editing

      • Moved from Audacity on Mac to Ferrite on iPadOS
      • Moved to iPad so he can edit anywhere, except where there is noise. Apple Pencil helps with precision.
      • Then, throw away the remote side – in stereo terms.
      • Then, perform noise reduction, still not perfect.
  • Publishing

    • Marna’s publishing: Uploading the audio, publishing show notes, still the same as before.

Customer requirements

  • – Insert Usual Disclaimer Here – these are only our thoughts.

  • RFE 139477 “Please include the CPU Time Limit for a Job/Step in SMF Type 30”

    • The CPU Time Limit in effect for a JobStep is not currently written to SMF Type30 at the end of the step.

      While a job is running this information is available in the Address Space Control Block (ASCBJSTL) and can be displayed or even modified by tools such as OMEGAMON.

      However the information is not retained after the JobStep completes. This information would be very useful after the fact to see the CPU time limit in effect for a JobStep.

      This enhancement request is to include the information in ASCBJSTL in the SMF Type30 Subtype 4 record written at the end of the JobStep.

      An additional consideration would be how to best deal with the Job CPU time Limit (as specified on the JOB statement) and whether this can also be catered for in the RFE

    • Business justification: Our site got caught out by a Test job being submitted overnight with TIME=1440 and consuming over 6 hours CPU before it was cancelled. We would like to be able to prevent similar issues in future by having the CPU Time Limit data available in SMF.

    • Our comments:

      • After the fact
        • The RFE was calling for “after the fact”, i.e. when the step has ended. Might also like the source of the limit.

        • End of step looks useful. Could run query comparing to actual CPU time, then track to see if ABEND is on the horizon

      • “As it happens”

        • Would like on the SMF Interval as well as Step End records, maybe with tools to dynamically change the parameters.

        • May not need the SMF information if vendor and IBM tools already do it today, making it perhaps not a high enough priority for SMF

        • And the source of the parameters might not be readily available in control blocks so this might not even be feasible.

On the blog

Contacting Us

You can reach Marna on Twitter as mwalle and by email.

You can reach Martin on Twitter as martinpacker and by email.

Or you can leave a comment below. So it goes…

Engineering – Part Four – Activating And Deactivating LPARs Causes HiperDispatch Adjustments

(This post follows on from Engineering – Part Two – Non-Integer Weights Are A Thing, rather than Engineering – Part Three – Whack A zIIP).

I was wondering why my HiperDispatch calculations weren’t working. As usual, I started with the assumption my code was broken. My code consists of two main parts:

  • Code to build a database from the raw SMF.
  • Code to report against that database.

(When I say “my code” I usually say “I stand on the shoulders of giants” but after all these years I should probably take responsibility for it.) 🙂

Given that split the process of debugging is the following:

  1. Check the reporting code is doing the right thing with what it finds in the database.
  2. Check the database accurately captures what was in the SMF records.

Only when those two checks have passed should I suspect the data.

Building the database itself consists of two sub-stages:

  1. Building log tables from the raw records.
  2. Summarising those log tables into summary tables. For example, averaging over an hour.

If there is an error in database build it is often incorrect summarisation.
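The second sub-stage can be sketched like this – a minimal, hypothetical example only (the real database schema and field names aren't shown here):

```python
# A minimal sketch of the second sub-stage: summarising log records into
# hourly averages. The (hour, value) record shape is hypothetical, not
# the real database layout.
from collections import defaultdict

def summarise_hourly(log_records):
    """log_records: iterable of (hour, value) pairs from the log tables."""
    sums = defaultdict(lambda: [0.0, 0])
    for hour, value in log_records:
        entry = sums[hour]
        entry[0] += value
        entry[1] += 1
    # Average each hour's values; a classic summarisation bug is
    # summing without dividing by the record count.
    return {hour: total / count for hour, (total, count) in sums.items()}

records = [(10, 4.0), (10, 6.0), (11, 3.0)]
print(summarise_hourly(records))  # → {10: 5.0, 11: 3.0}
```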

In this case the database accurately reports what’s in the SMF data. So it’s the reality that’s wrong. 🙂

A Very Brief Summary Of HiperDispatch

Actually this is a small subset of what HiperDispatch is doing, sufficient for the point of this post.

With HiperDispatch the PR/SM weights for an LPAR are distributed unevenly (and I’m going to simplify to a single pool):

  1. If the LPAR’s overall weight allows it, some number of logical processors receive “full engine” weights. These are called Vertical Highs (or VH’s for short). For small LPARs there could well be none of these.
  2. The remainder of the LPAR’s weight is distributed over one or two Vertical Mediums (or VM’s for short).
  3. Any remaining online logical processors receive no weight and are called Vertical Lows (or VL’s for short).

Enigma And Variations

It’s easy to calculate what a full engine’s weight for a pool is: Divide the sum of the LPARs’ weights for the pool by the number of shared physical processors. You would expect a VH logical processor to have precisely this weight.
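The calculation, and the three-step distribution above, can be sketched like so. This is a simplified sketch, not PR/SM's actual algorithm – in particular the one-versus-two VM rule here is my assumption:

```python
# A simplified sketch (not PR/SM's actual algorithm) of distributing an
# LPAR's pool weight over its online logical processors per the three
# steps above. The exact one-versus-two VM rule is an assumption.
def vertical_polarity(lpar_weight, online_lps, pool_weight, physical_cps):
    full_engine = pool_weight / physical_cps                   # weight of one full engine
    n_vh = min(int(lpar_weight // full_engine), online_lps)    # Vertical Highs
    remainder = lpar_weight - n_vh * full_engine               # spread over 1-2 VMs
    if remainder > 0:
        # Assume two VMs when more than half an engine remains (and LPs allow)
        n_vm = min(2 if remainder > full_engine / 2 else 1, online_lps - n_vh)
    else:
        n_vm = 0
    n_vl = online_lps - n_vh - n_vm                            # VLs receive no weight
    return n_vh, n_vm, n_vl, full_engine

# 12 online LPs, LPAR weight 410, in a pool totalling 1378 over 20 physicals
print(vertical_polarity(410, 12, 1378, 20))  # → (5, 2, 5, 68.9)
```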

But what could cause the result of this calculation to vary? The maths is simple but the real-world behaviours are interesting:

  • The number of physical processors could vary. For example, On-Off Capacity On Demand could add processors and later take them away.
  • The total of the weights for the LPARs in the pool could vary.

The latter is what happened in this case: the customer deactivated two LPARs on a machine – to free up capacity for other LPARs to handle a workload surge. Later on they reactivated the LPARs, IPLing them. I'm not 100% certain, but it seems pretty clear that it's deactivation and activation that take an LPAR's weight out of – and bring it back into – the equation; IPLing itself doesn't affect the weights.

These were two very small LPARs with 2–3% of the overall pool’s weights each. But they caused the above calculation to yield varying results:

  • The “full engine” weight varied – decreasing when the LPARs were down and increasing when they were up.
  • There was some movement of logical processors between VH and VM categories.
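A small worked example shows both effects. The figures here are illustrative, not the customer's actual weights:

```python
# Illustrative numbers only: a 20-way pool whose LPAR weights total 1000,
# with two small LPARs of weight 25 each (2.5% of the pool, much like the
# case above) being deactivated.
physical_cps = 20
pool_weight_up = 1000
pool_weight_down = 1000 - 25 - 25        # the two LPARs' weights leave the equation

full_engine_up = pool_weight_up / physical_cps      # 50.0 while they're up
full_engine_down = pool_weight_down / physical_cps  # 47.5 while they're down

# A remaining LPAR with weight 100 has exactly 2 full engines' worth when
# the small LPARs are up; when they're down a remainder appears - so a
# Vertical Medium appears where before there were just 2 VHs.
vh_up, rem_up = int(100 // full_engine_up), 100 - 2 * full_engine_up          # 2, 0.0
vh_down, rem_down = int(100 // full_engine_down), 100 - 2 * full_engine_down  # 2, 5.0
print(full_engine_up, full_engine_down, rem_up, rem_down)
```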

The effects were small. Sometimes a larger effect is easier to debug than a smaller one. For one, it’s less likely to be a subtle rounding or accuracy error.

The conversion of VH’s to VM’s (and back) has a “real world” effect: a VH logical processor is always dispatched on the same physical processor. The same is not true for a VM: while there is a strong preference for redispatch on the same physical, it’s not guaranteed. And this matters because cache effectiveness is reduced when a logical processor moves to a different physical processor.

So, one recommendation ought to be: If you are going to deactivate an LPAR recalculate the weights for the remaining ones. Likewise, when activating, recalculate the weights. In reality this is more a “playbook” thing where activation and deactivation is automated, with weight adjustments built in to the automation. Having said that, this is a “counsel of perfection” as not all scenarios can be predicted in advance.

What I Learnt And What I Need To Do

As for my code, it contains a mixture of static reports and dynamic ones. The latter are essentially graphs or the makings of – such as CSV files.

Assuming I’ve done my job right – and I do take great care over this – the dynamic reports can handle changes through time. So no problem there.

What’s more difficult is the static reporting. For example, one of my key reports is a shift-level view of the LPAR layout of a machine. In the case I’ve described, it had a hard time getting things right: the weights for individual LPARs’ VH processors went wrong. (The weight of a full processor worked in this case – but only because the total pool weight and number of physical engines didn’t change. That isn’t always the case.)

To improve the static reporting I could report ranges of values – but that gets hard to consume and, besides, just tells you things vary but not when and how. The answer lies somewhere in the region of knowing when the static report is wrong and then turning to a dynamic view.

In particular, I need to augment my pool-level time-of-day graphs with a stack of the LPARs’ weights. This would help in at least two ways:

  • It would show when weights were adjusted – perhaps shifting from one LPAR to another.
  • It would show when LPARs were activated and de-activated.

A design consideration is whether the weights should stack up to 100%. I’ve come to the conclusion they shouldn’t – so I can see when the overall pool’s weight changes. That reveals more structure – and I’m all for not throwing away structure.

Here’s what such a graph might look like:

In this spreadsheet-driven mockup I’ve ensured the “now you see them now you don’t” LPARs are at the top of the stack.

I don’t know when I will get to this in Production code. As now is a particularly busy time with customer studies I probably should add it to my to-do list. But I’ll probably do it now anyway… 🙂

Head Scratching Time

In this set of data there was another phenomenon that confused me.

One LPAR had twelve GCPs online. In some intervals something slightly odd was happening. Here’s an example, from a single interval:

  • Logical Processors 0–4 had polar weights (from SMF70POW as calculated in Engineering – Part Two – Non-Integer Weights Are A Thing) of 68.9. (In fact there was a very slight variation between them.)
  • Logical Processor 5 had a polar weight of 52.9.
  • Logical Processor 6 had a polar weight of 12.6.
  • Logical Processors 7 to 11 had polar weights of 0.

If you tot up the polar weights you get 410 – which checks out as it’s the LPAR’s weight in the GCP pool (obtained from other fields in the SMF 70 record).
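The tot-up is easy to check (taking 68.9 for each of LPs 0–4 and ignoring the very slight variation between them):

```python
# The interval's polar weights as listed above: five LPs at ~68.9,
# one at 52.9, one at 12.6, and five at zero.
polar_weights = [68.9] * 5 + [52.9, 12.6] + [0.0] * 5
total = sum(polar_weights)
print(round(total, 1))  # → 410.0, the LPAR's weight in the GCP pool
```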

Obviously Logical Processors 0, 1, 2, 3, and 4 are Vertical High (VH) processors – and bits 0,1 of SMF70POF are indeed “11”.

But that leaves two logical processors – 5 and 6 with non-zero, non-VH weights. But they don’t have the same weight. This is not supposed to be the case.

Examining their SMF70POF fields I see:

  • Logical Processor 5 has bits 0,1 set to “10” – which means Vertical Medium (VM).
  • Logical Processor 6 has bits 0,1 set to “01” – which means Vertical Low (VL).

But if Logical Processor 6 is a VL it should have no vertical weight at all.

Well, there is another bit in SMF70POF – Bit 2. The description for that is “Polarization indication changed during interval”. (I would’ve stuck a “the” in there but never mind.)

This bit was set on for LP 6. So the LP became a Vertical Low at some point in the interval, having been something else (indeterminable) at some other point(s). I would surmise VL was its state at the end of the interval.
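Decoding these flags can be sketched like this. IBM bit numbering puts bit 0 at the most significant end of the byte; the label for the “00” combination is my placeholder, not taken from the record documentation:

```python
# A sketch of decoding the SMF70POF polarity flags described above.
# Bit 0 is the most significant bit (IBM numbering); bits 0,1 give the
# vertical polarity and bit 2 flags a change during the interval.
def decode_smf70pof(flags):
    polarity = {0b11: "VH", 0b10: "VM", 0b01: "VL", 0b00: "??"}[(flags >> 6) & 0b11]
    changed = bool(flags & 0x20)  # bit 2: polarization changed during interval
    return polarity, changed

print(decode_smf70pof(0b01100000))  # → ('VL', True), like Logical Processor 6
```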

So, how does this explain it having a small but non-zero weight? It turns out SMF70POW is an accumulation of sampled polar weight values, which is why (as I explained in Part Two) you divide by the number of samples (SMF70DSA) to get the average polar weight. So, some of the interval it was a VM, accumulating. And some of the interval it was a VL, not accumulating.
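The arithmetic can be sketched with made-up sample counts (SMF70DSA of 240 and the VM-phase weight are illustrative, not from the data):

```python
# Illustrative figures: SMF70POW accumulates the sampled polar weight,
# so dividing by SMF70DSA (the sample count) gives the average. An LP
# that was a VM for part of the interval and a VL (weight 0) for the
# rest ends up with a small, non-zero average - like LP 6's 12.6.
smf70dsa = 240                 # samples in the interval (hypothetical)
vm_samples = 57                # samples taken while the LP was a VM
vm_weight = 53.0               # its polar weight while a VM (hypothetical)

smf70pow = vm_samples * vm_weight + (smf70dsa - vm_samples) * 0.0
average_polar_weight = smf70pow / smf70dsa
print(round(average_polar_weight, 1))  # → 12.6
```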

Mystery solved. And Bit 2 of SMF70POF is something I’ll pay more attention to in the future. (Bits 0 and 1 already feature heavily in our analysis.)

This shifting between a VM and a VL could well be caused by the total pool weight changing – as described near the beginning of this post.


The moral of the tale is that if something looks strange in your reporting you might – if you dig deep enough – see some finer structure (than if you just ignore it or rely on someone else to sort it out).

The other, more technical point, is that if almost anything changes in PR/SM terms – it can affect how HiperDispatch behaves and that could cause RMF SMF 70–1 data to behave oddly.

The words “rely on someone else to sort it out” don’t really work for me: The code’s mine now, I am my own escalation route, and the giants whose shoulders I stand on are long since retired. And, above all, this is still fun.