Periodicity

When I examine Workload Manager for a customer, a key area is goal setting. This has a number of aspects:

  1. How work gets classified – to which Service Class and Report Class by what rules.
  2. What the goals are for each Service Class Period.
  3. What the period boundaries should be.

This post focuses on Aspect 3: Period Boundaries.

What Is A Service Class Period?

When a transaction executes it accumulates service. Generally this is CPU time, especially with modern Service Definition Coefficients.

For transactions that support it you can define multiple Service Class Periods. Each period – except the last – has a duration.

Some transaction types, most notably CICS, only have a single period. For them the discussion of period durations is moot.

The z/OS component that monitors service consumption is System Resources Manager (SRM). SRM predates Workload Manager (WLM) by decades. (It’s important not to see WLM as replacing SRM but rather as supervising it. WLM replaces human-written controls for SRM.) Periodically SRM checks work’s consumption of resources. If the transaction has exceeded the relevant period duration the transaction moves to the next period.

A transaction using more service than its current period’s duration doesn’t directly trigger the period switch; SRM detects it with some latency, so a (generally slight) overshoot of the duration is normal.

The purpose of multiple periods is, of course, to give good service to light consumers of service and to progressively slow down heavier consumers.

Note: A common mistake is to think that transactions fall through into later periods because of their elapsed time. They don’t; it’s about service. Granted, a long-running transaction might be long running because of the CPU it’s burning. But that’s not the same thing as saying it’s the elapsed time that drove it to later periods.
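
To make the mechanism concrete, here is a minimal sketch of service-based period switching in Python. The durations and the example service figures are made up purely for illustration; real SRM behaviour is, of course, more involved.

# Minimal sketch of service-based period switching. Durations and figures
# are made up for illustration; real SRM behaviour is more involved.

PERIOD_DURATIONS = [500, 10_000]  # service units for Periods 1 and 2; the last period has no duration

def current_period(accumulated_service):
    """Return the period a transaction with this much accumulated service belongs in."""
    boundary = 0
    for period, duration in enumerate(PERIOD_DURATIONS, start=1):
        boundary += duration
        if accumulated_service <= boundary:
            return period
    return len(PERIOD_DURATIONS) + 1  # fell through to the last period

# Elapsed time plays no part: 12,000 service units lands in Period 3,
# however quickly or slowly the transaction consumed them.
print(current_period(300))     # -> 1
print(current_period(12_000))  # -> 3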

Two Examples Of A New Graph

Here are two example graphs from the same customer. They are new in our code base, though Service Class Period ending proportions are something we’ve talked to customers about for many years. I’m pleased we have these as I think they will tell some interesting new stories. You’ll get hints of what I think those stories might be, based on the two examples from my “guinea pig” customer.

Each graph plots transaction ending rates for each period of the Service Class across a day. In the heading is information about the period boundaries and how many service units the transactions ending in each period consumed on average. I feel the usefulness of the latter will emerge with more experience – and I might write about it then. (And graph headings are one place where my code has a high tendency to evolve, based on experiences with customers.)

Though the two examples are DDF I don’t intend to talk much about Db2 DDF Analysis Tool – except to say, used right, it would bring some clarity to the two examples.

DDFMED – A Conventional-Looking Service Class

This Service Class looks like many Production transaction service classes – with the classic “double hump” shape. I consider that an interesting – if extremely common – architectural fact. There’s something about this workload that looks reminiscent of, say, CICS transactions.

Quite a high proportion of the transactions end in Period 2 and a fair proportion in Period 3. Those in Period 3 are, on average, very heavy indeed – consuming an average of 162K service units. (This being DDF, the transaction ends when the work is committed – which might not be the end of the transaction from the client’s point of view.)

It seems to me the period boundaries are reasonable in this case, but see “Conclusion” below.

DDFLOW

This Service Class looks quite different:

  • The transaction rate is more or less constant – with two spikes, twelve hours apart. I consider both the constant transaction rate and the twelve-hourly spikes to be interesting architectural facts.
  • Almost all transactions end in Period 1. In fact well within Period 1. The very few Period 3 transactions are extremely long.

Despite the name “DDF Low” I think we have something very regular and well controlled here. I say “despite” as, generally, less well understood / sponsored work tends to be thought of as “low”.

Conclusion

I will comment that, when it comes to goal setting, business considerations play a big part. For example, some of the effects we might see at the technical level could be precisely what is needed. Or precisely what is not needed. So I tend not to walk in with recommendations for things like transaction goals – but I might walk out with them. Contrast this with what I call my “Model Policy” – which I discussed in Analysing A WLM Policy – Part 1 in 2013. Core bits of that are as close to non-negotiable as I get.

However, it is – as I think this post shows – very useful in discussions of period durations to know the proportions of transactions for a Service Class that end in each period. If everything falls through into Period 2, for example, Period 1’s duration is probably too short. And not just the proportions but the transaction rates across, say, the day.

One other thing, which I’ll leave as a question: What happens if you slow down a transaction that, say, holds lots of locks?

Four Coupling Facilities

This isn’t following on from Three Billboards? 🙂 but rather Shared Coupling Facility CPU And DYNDISP from 2016. I’m not sure it adds much that’s new but this set of customer data was an opportunity too good to miss…

… It enables me to graph most of the Coupling Facility LPAR types you’re likely to see.

I won’t repeat the contents of the 2016 post but I will repeat one thing: There are two views of Coupling Facility CPU:

  • SMF 70 Partition
  • SMF 74–4 Coupling Facility

That 2016 post talked at length about the latter. This post is more about the former.

Different Coupling Facility LPAR Types

There are different kinds of coupling facility LPAR, and this customer has several of them:

  • Dedicated
  • Shared – without Dynamic Dispatch (DYNDISP=NO)
  • Shared – with Dynamic Dispatch (DYNDISP=YES)
  • Shared – with DYNDISP=THIN

The latter two are similar but, in essence, Thin Interrupts (DYNDISP=THIN) shortens the time a CF spends polling for work, releasing the physical CPU sooner. This is good for other ICF LPARs, but maybe not so good for the LPAR with DYNDISP=THIN.

While this customer’s data doesn’t exemplify all four types it is a useful set of data for illustrating some dynamics.

About This Customer’s Mainframe Estate

I’m only going to describe the relevant bits of the customer’s mainframe estate – and I’m going to remove the names.

There are four machines, each running a mixture of LPARs in sysplexes and monoplexes. The sysplex we were most interested in had four LPARs, one on each machine. Also four coupling facilities, again one on each machine. There were no external coupling facilities.

Those of you who know a bit about resilience are probably wondering about duplexing coupling facility structures but this post isn’t about that.

I don’t think it makes any difference but these are a mix of z13 and z14 machines.

We had SMF 70-1 and SMF 74-4 from these z/OS LPARs and the four coupling facilities, but little from the others.

Here are the four machines’ ICF processor pools, across the course of a day.

The top two look significantly different to the bottom two, don’t they?

Machines A And B

These two machines have multiple ICF LPARs, each with some kind of DYNDISP turned on. We can see that because they don’t use the whole of their PR/SM shares – as their utilisation from the PR/SM point of view is varying.

Each machine has two shared ICF processors.

We have SMF 74-4 for the blue LPARs. So we can see they are using DYNDISP=THIN. We can’t see this for the other LPARs as we don’t have SMF 74-4 for them. (SMF 70-1 doesn’t have a DYNDISP indicator.)

The blue LPARs are also much busier than the other LPARs in their pool. While one might consider dedicating one of the two ICF processors we wouldn’t ideally define an ICF LPAR with a single logical processor.

Machine C

Machine C looks different, doesn’t it?

Again we have 2 physical processors in the ICF pool.

Here we have four ICF LPARs, each with DYNDISP=NO. We know three facts that establish this:

  • From SMF 70-1 we know that none of the LPARs has any dedicated ICF engines. We know this two ways:
    • As this graph shows, none of them has the share of a single ICF engine.
    • We have Dedicated and Shared processor information explicitly in SMF 70-1.
  • These LPARs use their whole share – judging by the constant CPU use in SMF 70-1.
  • From SMF 74-4 we know the blue LPAR in particular has DYNDISP=NO. (We don’t have SMF 74-4 for the others.)

Machine D

This looks similar to Machine C but it isn’t quite.

Yet again the ICF pool has 2 processors.

  • The Red LPAR has dedicated processors – from both SMF 70-1 and SMF 74-4. DYNDISP doesn’t even come into it for this LPAR.
  • The other LPARs have DYNDISP=NO, judging by their (SMF 70-1) behaviour.

A minor footnote: As I articulated in A Picture Of Dedication (in 2015) I sort the LPARs so the Dedicated ones appear at the bottom of the stack. (Even below *PHYSICAL – which is a very thin blue veneer here but actually red in the case of the other three machines.)

But Wait, There’s More

When I started writing this post I thought it was just going to be about showing you four pretty pictures. But, partially because some blog posts get written over an extended period, something came up meanwhile that I think is worth sharing with you. Besides, electronic words are unlimited – even if your patience isn’t.

Having shown you some graphs that depict most of the ICF LPAR CPU situations I experimented with DYNDISP detection for ICF LPARs we don’t have SMF 74-4 for.

Done right, this could make the story of an ICF pool with shared logical engines much nearer to complete.

The current algorithm – now in our Production code – assumes DYNDISP if the ICF LPAR uses less than 95% of its share over the focus shift (set of hours). Otherwise it’s Dedicated or DYNDISP=NO. I still can’t tell whether it’s DYNDISP=THIN or YES without SMF 74-4.
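
Expressed as a sketch, the detection heuristic looks something like the following. The 95% threshold matches the description above; the data layout (and the idea that the dedicated flag and CPU figures come ready-summed from SMF 70-1) is an assumption for illustration.

# Sketch of the DYNDISP detection heuristic described above. Assumes each
# ICF LPAR's CPU use and PR/SM share (engine-seconds over the focus shift)
# have already been summed from SMF 70-1; the data layout is illustrative.

THRESHOLD = 0.95

def classify_icf(lpar):
    if lpar["dedicated"]:                    # known explicitly from SMF 70-1
        return "Dedicated"
    if lpar["cpu_used"] / lpar["share"] < THRESHOLD:
        return "DYNDISP=YES or THIN"         # can't tell which without SMF 74-4
    return "DYNDISP=NO"

print(classify_icf({"dedicated": False, "cpu_used": 5_000, "share": 14_400}))
# -> DYNDISP=YES or THIN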

While ideally I still want data from all z/OS LPARs and coupling facilities in a customer’s estate, this technique fills in some gaps quite nicely.

Well, it works for this customer, anyway… 🙂

Three Billboards?

You could consider presentations as advertising – either for what you’ve done or for what something can do. Generally my presentations are in the category of “Public Service Announcement”. Which rather combines the two:

  • What I’ve found I can do – that you might want to replicate.
  • What I’ve found a product can do – that you might want to take advantage of.

Sometimes it’s a “Public Health Warning” as in “you’d better be careful…”

Anyhow, enough of trying to justify the title of this post. 🙂

(I will say Three Billboards Outside Ebbing, Missouri is an excellent movie. I first saw it on an aeroplane on the usual tiny screen and it worked well even there.)

So, I have three presentations, two of which are brand new and the other significantly updated this year:

  • What’s The Use? With Scott Ballentine
  • zIIP Capacity And Performance
  • Two Useful Open Source Tools

I’ll post the abstracts below.

The first two are for IBM Technical University Virtual Edition (registration here). This is run by IBM and is premier technical training, including over 600 sessions on IBM IT Infrastructure.

As I’m a big fan of user groups I’m delighted to say the first and third are for GSE UK Virtual Conference 2021. In both cases these are “live” – which, frankly, I find more energising.

Scott and I recorded last week – as the Tech U sessions will be pre-recorded with live Question And Answer sessions at the end. So these are “in the can” – which is a big relief.

The third I’ve just started writing but I have a nice structure to it – so I’m sure it’ll get done, too.

So here are the abstracts…

What’s The Use – With Scott Ballentine

For many customers, collecting the SMF type 89 subtype 1 Usage Data records is an important and necessary part of Software Licencing, as these records are used by SCRT to generate sub-capacity pricing reports.

But Usage Data has many more uses than that – whether in SMF 89 or SMF 30.

Customers can get lots of value out of this data – if they understand what the data means, and how it is produced.

Vendors can delight their customers – if they produce the right usage data, knowing how it can be used.

This presentation describes how Usage Data is produced, how a vendor can add value with it, and how customers can best take advantage of it.

zIIP Capacity And Performance

zIIP Capacity Planning tends to be neglected – in favour of General-Purpose Engines (GCPs). With Db2 allowing you to offload critical CPU to zIIPs, and the advent of zCX and z15 Recovery Boosts, it’s time to take zIIP capacity and performance seriously.

You will learn how to do zIIP capacity planning and performance tuning properly – with instrumentation and guidelines.

Two Useful Open Source Tools

You will learn how to use two open source tools many installations will find extremely useful, covering System and Db2 Performance:

Db2 DDF Analysis Tool – which uses Db2 SMF 101 Accounting Trace to enable you to manage DDF work and its impact on your system.

WLM Service Definition Formatter – which uses your WLM XML file to create some nice HTML, enabling you to see how the various parts of your Service Definition fit together.

Time For Action On Virtual Storage?

I wrote How I Look At Virtual Storage in 2014.

But since then my stance has shifted somewhat. A recent study I was part of made me realise my tone on z/OS virtual storage should probably be a little more strident. In that post I was somewhat even handed about the matter, just describing my method.

This study also involved the CICS Performance team. They pointed out that some of this customer’s CICS regions had substantial use of 24-Bit virtual storage. If those CICS regions were to be able to scale (individually) a limiting factor might be 24-Bit virtual storage.

Obviously working on CICS regions’ use of virtual storage is the correct way forward under these circumstances.

But it got me thinking, and doing a little investigating.

Why Virtual Storage Matters

Almost 35 years ago a sales rep in my branch suggested selling a customer more virtual storage. 🙂 (Yes, he really did.)

Joking apart, if you want more virtual storage you can’t buy it; you have to work for it.

But why would you want more 24-Bit virtual storage anyway?

  • Older applications might well be 24-Bit and converting them to 31- or even 64-Bit might be difficult or not economically viable.
  • Some things, such as access method buffers, might well be in 24-Bit virtual storage. (To be fair, high level languages’ support for buffers above the 16MB line is good.)

I raise the subject of access method buffers because one of the ways of tuning I/O performance is to increase the number of access method buffers. To optimally buffer a single QSAM data set, for example, takes a substantial proportion of an address space’s 24-Bit region – unless the buffers have moved above the 16MB line. So one is sparing of buffers under these circumstances, perhaps sacrificing some performance. (You would choose high I/O data sets to buffer, obviously.)
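
A rough illustration of the arithmetic, with made-up but plausible numbers:

# Rough illustration, with made-up but plausible numbers, of why generous
# QSAM buffering eats 24-Bit virtual storage if the buffers stay below the line.

block_size = 27_998          # half-track blocking on 3390, in bytes
bufno = 30                   # roughly a cylinder's worth of half-track blocks
region_24bit_mb = 9          # say a 9MB below-the-line private region

buffer_pool_mb = block_size * bufno / 1024 ** 2
print(f"{buffer_pool_mb:.1f}MB of buffers for one data set = "
      f"{100 * buffer_pool_mb / region_24bit_mb:.0f}% of a {region_24bit_mb}MB region")
# -> about 0.8MB, i.e. roughly 9% of the region - and that's just one data set.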

My own code – at least the bits I have the excuse of having inherited – is probably predominantly 24-Bit. I might fix that one day, though it doesn’t seem like a good use of time. (It would probably be coupled with removing SIIS (Store Into Instruction Stream) horrors.)

What’s New?

When my CICS colleagues mentioned managing the use of 24-Bit virtual storage within the CICS regions a thought occurred to me: I could examine whether the 24-Bit region size could be increased. Both are useful approaches – reducing the demand and increasing the supply.

The same is probably true of 31-Bit, of course. For 64-Bit I’m more concerned with getting MEMLIMIT set right.

I think I’ve observed something of a trend: It seems to me many customers have the opportunity to increase their 24-Bit region – probably by 1MB but sometimes by 2MB. This would be a handy 10 – 20% increase. I doubt many customers have examined this question in a while – though most have SMF 78-2 enabled all the time. It’s rare to get a set of customer data without it in – and we always pump out the report outlined in How I Look At Virtual Storage.

Examining And Adjusting Virtual Storage

In what follows I’m describing 24-Bit. Similar actions apply for 31-Bit.

  1. Check that your SQA definition is small enough that there is little or no SQA free. SQA can overflow into CSA but not vice versa – so overspecifying SQA is a waste of virtual storage.
  2. Check how much CSA is free.
  3. Adjust SQA and CSA so that there is generally 1MB of CSA free but little if any SQA free, and so that you are raising the private boundary by an integer number of MB – probably 1MB, possibly 2MB. (There’s a worked example after this list.)
  4. Monitor the resulting 24-bit virtual storage picture.
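
Here is the worked example, with purely illustrative numbers; your own SQA and CSA figures will, of course, differ.

# Worked example with purely illustrative numbers (all in MB). The point is
# that the common/private boundary can only move in whole 1MB segments.

sqa_free  = 0.7    # free SQA: suggests SQA is overspecified
csa_free  = 1.8    # free CSA
keep_free = 1.0    # leave roughly 1MB of CSA free as a buffer

reclaimable = sqa_free + csa_free - keep_free   # 1.5MB candidate
whole_segments = int(reclaimable // 1)          # only whole segments count

print(f"Raise the 24-Bit private region by {whole_segments}MB")   # -> 1MB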

You need to do the analysis with data over a reasonable amount of time. I’d say at least a week. You want to know if SQA and CSA usage are volatile. In most customer situations I’ve seen they are relatively static, but you never know.

One of the key points is that the boundary between common and private is a segment boundary, i.e. 1MB. There is therefore no point trying to claw back, say, half a megabyte. (The “generally keep 1MB free” above is not related to this; it’s just a reasonable buffer.)

For 24-Bit I said “adjust SQA and CSA so that there is generally 1MB free”. For 31-Bit I’ve generally said “keep 100MB free”. I think that’s fair but 200MB might be better. Most middleware and applications that use Below The Bar virtual storage have substantial amounts of 31-Bit. If their use is volatile – again when viewed over a reasonable period of time – then that might guide you towards 200MB free or more. The “100MB” is not structural; that’s 100 segments and is just a reasonable rule of thumb. At least for 31-Bit the 1MB segment boundary doesn’t seem nearly so coarse-grained as it does for 24-Bit, so adjustment needn’t be in 100MB increments.

Talking of 31-Bit virtual storage, one of the biggest users of ECSA has been IMS. Until a few years ago the biggest users of 31-Bit private were Db2 and MQ. Db2 is now almost all 64-Bit and MQ has done lots of 64-Bit work.

Conclusion

I would urge customers to examine their virtual storage position. In particular, 24-Bit might well contain good news. 31-Bit might also, but there I’m more inclined to check there’s enough left. It’s not difficult to examine this, even if all you do is run off an RMF Virtual Storage Activity report.

I, for one, will make a point of examining the virtual storage reports we produce. Generally I do, anyway. Else I wouldn’t be seeing this trend.

Finally, a sobering thought: If, as has been said, customers are using 1 bit of addressability a year we are 21 years into a 33-year 64-Bit lifespan. I’m pre-announcing nothing by saying this. And the “1 bit a year” thing might not actually hold. But it does make you think, doesn’t it?

Another Slice Of PI

This post follows on from my 2019 post A Slice Of PI. Again, not Raspberry Pi but Performance Index PI. 🙂

I’ve become sensitised to how common a particular Workload Manager (WLM) problem is since I created some handy graphs.

Some Graphs Recent Customers Have Seen

When I examine a customer’s WLM Service Definition part of what I do is examining SYSTEM Workload, and then successive importances.

(SYSTEM Workload is mainly SYSTEM and SYSSTC Service Classes, of course. And if these are showing a drop off in velocity you can imagine what is happening to less important work. This is when I consider Delay For CPU samples – to figure out how much stress the system is under. But I digress.)

By “successive importances” I mean:

  1. Graph importance 1 Performance Index (PI) by Service Class by hour.
  2. Graph importance 2 Performance Index (PI) by Service Class by hour.

And so on.

Sometimes there is little work at, e.g., Importance 1, so I might stop after Importance 2 or I might not. (Service definitions vary in what is at each importance level – and I graph CPU for each importance level to understand this.)

The above approach gives me some orientation. And it has shown up a phenomenon that is more common than I had supposed: Percentile goals where the service class period ends up with a PI of 0.5.

The Top And Bottom Buckets

Recall the following from A Slice Of PI: For a Percentile Goal when an individual transaction ends its response time relative to the goal is used to increment the transaction count in one of 14 buckets. There are two buckets of especial interest:

  • Bucket 1 – where transactions whose response times are no more than 50% of the goal time are counted.
  • Bucket 14 – where transactions whose response times are at least 400% of the goal time are counted.

These buckets feed into the Performance Index (PI) calculation. Let’s deal with Bucket 14 first:

Imagine a goal of “85% in 15ms”. If fewer than 85% of transactions end with response times short enough to land them in Buckets 1 to 13 the PI is 4. We don’t have much clue as to quite how long the more-than-15% in Bucket 14 took, but we know they all took at least 4 × 15 = 60ms. So we don’t know how much to adjust the goal by. (Of course, we might want to tune the transactions or the environment instead – and we don’t know how much to speed things up by.)

A note on terminology: I’m going to use the term goal percent for the “85%” in this example. I’m going to use the term goal time for the “15ms” part.

The Bottom Bucket

Bucket 14 was the “warm up act” for Bucket 1.

With our “85% in 15ms” goal a transaction ending in Bucket 1 has a response time of no more than 0.5 × 15 = 7.5ms. Again, we don’t know how close to the bucket boundary the transaction response times tend to be. Here are two possibilities (and there are, of course, myriad others):

This first graph shows a transaction distribution where the transactions have response times tending towards way shorter than 50% of the goal time.

This second graph shows a transaction distribution where the transactions have response times tending towards just short of the 50% of goal time marks.

Both could have a PI of 0.5.
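
As a sketch of how the PI falls out of those buckets, here is a simplified calculation. The bucket boundary multipliers are my understanding of the standard 14 buckets – treat them as an assumption – and the example lands every transaction in Bucket 1, giving a PI of 0.5.

# Simplified sketch of the percentile-goal PI calculation. The boundary
# multipliers (fractions of the goal time) are an assumption about the
# standard 14 buckets.

BOUNDARIES = [0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 2.0, 4.0]

def percentile_pi(bucket_counts, goal_percent):
    """bucket_counts: ended-transaction counts for Buckets 1 to 14."""
    total = sum(bucket_counts)
    cumulative = 0
    for boundary, count in zip(BOUNDARIES, bucket_counts):
        cumulative += count
        if 100.0 * cumulative / total >= goal_percent:
            return boundary        # PI bottoms out at 0.5 ...
    return 4.0                     # ... and tops out at 4.0 (goal missed)

# "85% in 15ms" with every transaction ending in Bucket 1:
print(percentile_pi([1000] + [0] * 13, 85))    # -> 0.5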

So what should we adjust the goal time to? I think we should be prepared to be iterative about this:

  • Set the goal time to somewhat less than the current goal time. Maybe as aggressively as 50% of the current goal time – as that is safe.
  • Remeasure and adjust the goal time – whether up or down.
  • Iterate

I think we have to get used to iteration: Sometimes, from the performance data, the specialist doesn’t know the final value but does know the direction of travel. This is one of those cases, as is Velocity.

In most cases we should be done in a very few iterations. So don’t be unimpressed by a specialist having the maturity to recommend iteration.

An alternative might be to tighten the goal percent, say to 95%. This has a couple of issues:

  1. It tells us nothing about distribution.
  2. It is more or less problematic, depending on the homogeneity of the work. At the extreme, if all transactions have the same measured response time adjusting the percent is a blunt instrument.

Problem 2 is, of course, more serious than Problem 1.

Of Really Short Response Times

Really short response times are interesting: Things being really fast is generally a good thing, but not always.

Consider the case of JDBC work where Autocommit is in effect. Here every single trivial SQL statement leads to a Db2 Commit and hence a transaction ending. This can lead to extremely short response times. More to the point, it’s probably very inefficient.

(You can observe this sort of thing with Db2 DDF Analysis Tool and Db2 Accounting Trace (SMF 101).)

But transactions have become faster. It’s not uncommon for goals such as our example (85% in 15 ms) to be lax.

Faster, that is, than the expectations of the WLM goal setters. In the DDF case it’s quite likely such people won’t have been told what to expect – either in terms of performance or how to classify the work. Or, fundamentally, what the work is.

I see a lot of DDF Service Classes with 15ms for the goal time. For many years you couldn’t set it lower than that. Relatively recently, though, the lower limit was changed to 1ms. I’m not suggesting a transaction response time of 1ms is that common, but single digit is increasingly so.

So this 1ms limit stands to fix a lot of goal setting problems – both for percentile and average response time goal types.

The analogy that comes to mind is 64-Bit: We don’t – for now – need all 64 bits of addressability. But we certainly needed many more than 31. We might not need 1ms right now but we needed considerably less than 15ms. (I would guess some technological improvement meant this became feasible; I really should ask someone.)

Why Is A PI Of 0.5 A Problem?

You would think that doing better than goal would be a good thing, right?

Well, not necessarily:

  1. Not what was contracted for
  2. Not protective

Between the two of them there is a Service Delivery hazard: When things slow down an overly lax goal won’t stop this work from slowing down. And this is where Service Level Expectation – what we’ve delivered so far – clashes with Service Level Agreement – what the users are entitled to.

I often like, when we have a little time in a customer workshop, to ask where the goal came from. Perhaps from a contract, whether internal or external. Or maybe it’s the originally measured response time. Or maybe it’s just a “finger in the air”. This all feeds into how appropriate the goal now is, and what to do about it.

Walking This Back A Tiny Bit

It’s not hopeless to figure out what’s going on if everything ends in Bucket 1, with a PI of 0.5. You can – again from Workload Activity Report (SMF 72-3) data – calculate the average response time. If the average is well below the “50% of goal time” mark that tells you something. Equally “not much below” tells you something else.

So, I think you should calculate the average, even though it’s not that useful for goal setting.
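
A minimal sketch of that check, assuming the period’s total response time and ended-transaction count have already been pulled from SMF 72-3:

# How far below the Bucket 1 boundary is the average response time?
# Assumes the period's total response time (seconds) and ended-transaction
# count have already been extracted from SMF 72-3.

def average_vs_bucket1(total_response_secs, ended, goal_time_secs):
    average = total_response_secs / ended
    return average, average / (0.5 * goal_time_secs)

avg, ratio = average_vs_bucket1(total_response_secs=8.0, ended=4000, goal_time_secs=0.015)
print(f"Average {avg * 1000:.1f}ms is {ratio:.0%} of the 7.5ms Bucket 1 boundary")
# -> 2.0ms, about 27% - suggesting the goal time could come down a long way.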

Conclusion

You probably know I’m going to say this next sentence before I say it: Revisit your WLM Service Definitions – both goal values and classification rules – on a regular basis.

And, now that I have PI graphs for each importance level, I’m seeing a lot of “PI = 0.5” situations. Hence my concentrating on the bottom bucket.

One final thing: With Db2 DDF Analysis Tool I am able to do my own bucketing of DDF response times. I can use whatever boundaries for buckets I want, and as many as I want. This helps me – when working with customers – to do better than the “14 buckets”. Or it would if I weren’t lazy and only had 10 buckets. 🙂 I can do the same with CPU time, and often do.

Hopefully this post has given you food for thought about your percentile goals.

One Service Definition To Rule Them All?

This is one of those posts where I have to be careful about what I say – to not throw a customer under the bus with over-specificity. So, bear with me on that score.

Actually it’s two customers, not one. But they have the same challenge: Managing Workload Manager (WLM) service definitions for multiple sysplexes.

These two customers have come to different conclusions, for now. I say “for now” because, in at least one case, the whole design is being reconsidered.

Some Basics

Even for those of us who know them well, the basics are worth restating:

  • A sysplex can have one and only one service definition.
  • A service definition cannot be shared between sysplexes, but it can be copied between them. These are separate installations and activations.
  • In a sysplex different systems might not run work in all the service classes.
  • The ISPF WLM application has been used since the beginning to edit service definitions.
  • You can print the service definition from the ISPF WLM application.
  • Relatively recently the z/OSMF WLM application became available – as a more modern (and more strategic) way of editing service definitions.
  • You can export and import the service definition as XML – in both the ISPF and z/OSMF applications.

What I’ve Been Doing Recently With WLM Service Definitions

For many years I relied entirely on RMF Workload Activity SMF records (SMF 72-3) when tuning WLM implementations. This simply wasn’t (good) enough, so a fair number of years ago I started taking WLM service definitions in XML form from customers when working with them.

Why wasn’t RMF SMF good enough? Let’s review what it buys you:

  • It tells you how service class periods behave, relative to their goals and with varying workload levels.
  • It allows you to examine report class data, which often is a good substitute for SMF 30 address space information.

So I’ve done a lot of good work with SMF 72-3, much of which has surfaced as blog posts and in conference presentations. So have many others.

But it doesn’t tell you what work is in each service class, nor how it got there. Likewise with report classes. Admittedly, SMF 30 will tell you an address space’s service and report classes, but it won’t tell you why. Likewise, SMF 101 for DDF work will give you a service class (field QWACWLME) but it won’t tell you why, and it won’t tell you anything about report classes. And SMF won’t tell you about history, always a fascinating topic.

To understand the structure of the service definition you need the XML (or the print, but I don’t like the print version). So that’s what I’ve been looking at, increasingly keenly.

A Tale Of Two Customers

  • Customer A has multiple sysplexes – on two machines. They sent me multiple XML service definitions – as they have one per sysplex.
  • Customer B has multiple sysplexes – on more than two machines. They sent me a single XML service definition – so they are propagating a single service definition to multiple sysplexes.

As it happens, in both customers, there are application similarities between some of their sysplexes. In Customer B there are different application styles between the LPARs in a single sysplex – and this fact will become important later in this discussion.

So, which is right? The rest of this post is dedicated to discussing the pros and cons of each approach.

Multiple Service Definitions Versus A Single Propagated One

The first thing to say is if you have an approach that works now think carefully before changing it. It’ll be a lot of work. “A lot of work” is code for “opportunity for error” rather than “I’m too lazy”.

If you have one service definition for each sysplex you might have to make the same change multiple times. But this would have to be an “egregious” change – where it’s applicable to all the sysplexes. An example of this would be where you’d misclassified IRLM address spaces and now needed to move them all into SYSSTC (where they belong). There’s nothing controversial or specific about this one, and not much “it depends” to be had.

For changes that are only relevant or appropriate to one sysplex this is fine. For example, changing the duration of DDFHI Period 1 for a Production sysplex.

But if you had a DDFHI in each of several sysplexes this could become confusing – if each ended up with a different specification. Confusing but manageable.

If you had one service definition for a group of sysplexes you’d make a change in one sysplex and somehow propagate that change around. You’d use literally the same XML file (assuming you go the XML route), export it from the first sysplex, and import it into the rest. The mechanism, though, is less important than the implications.

Any change has to be “one size fits all”. Fine for the IRLM example above, but maybe not so good for, e.g., a CICS Region Service Class velocity change. In the “not so good” case you’d have to evaluate the change for all the sysplexes and only make it if it were good (enough) for all of them.

But, you know, “one size fits all” is a problem even within a sysplex: Customer B has different applications – at quite a coarse-grained level – in different LPARs within the same sysplex:

  • The common componentry in the LPARs – e.g. Db2 or MQ – probably warrants the same goals.
  • The application-specific, or transaction management, componentry – e.g. Broker, CICS, IMS, Batch – probably doesn’t.

There are some ways of being “differential” within a service definition. They include:

  • Treating one transaction manager differently from another.
  • Classification rules that are based on system.
  • Classification rules that are based on sysplex.

That latter – classification rules that are based on sysplex – is something I addressed very recently in my sd2html WLM Service Definition XML Formatter open source project. If you run it there’s a “Service Definition Statistics” table near the top of the HTML it produces. If there are classification rules based on sysplex name the specific sysplexes are listed towards the bottom of the table. This will save me time when looking at a customer’s service definition. You might use it, for example, to check whether such rules are still in place – when you thought they’d been removed. The specific use of the “Sysplex Name” qualifier will appear in either the Classification Groups or Classification Rules table, depending.
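
If you wanted a quick scan of your own, something like the following would do it. The element and attribute names here are assumptions for illustration – check them against your own exported XML (or just use sd2html, which does this properly).

# Quick-and-dirty scan of a WLM service definition XML export for anything
# mentioning the "Sysplex Name" qualifier. Element and attribute names are
# assumptions for illustration; check them against your own XML.

import xml.etree.ElementTree as ET

def sysplex_qualifiers(xml_path):
    found = []
    for elem in ET.parse(xml_path).iter():
        qualifier_type = elem.get("QualifierType", "")
        if "sysplex" in qualifier_type.lower() or "sysplex" in elem.tag.lower():
            found.append((elem.tag, dict(elem.attrib)))
    return found

for tag, attrs in sysplex_qualifiers("service_definition.xml"):
    print(tag, attrs)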

(I could use a good XML comparer, by the way. Because today I can’t easily tell the differences between Service Definition XML files – and I don’t think I should be teaching sd2html to do it for me.)

Conclusion

There is no conclusion, or at least no general firm conclusion; you really have to think it through, based on your own circumstances. Most notably two things:

  • Whether “one size fits all” works for you.
  • Whether operational procedures in WLM service definition management are effective.

But, still, it’s an interesting question. And it’s not hypothetical:

  • Plenty of customers have separate Sysprog, Development, Pre-Production, and Production sysplexes. In fact I’d encourage that.
  • Quite a few customers have separate Production sysplexes, especially outsourcers.

And this is why I find “Architecture” such fun. And why WLM is anything but simple – if you do it right.

Mainframe Performance Topics Podcast Episode 29 “Hello Trello”

This was a fun episode to make, not least because it featured as a guest Miroslava Barahona Rossi. She’s one of my mentees from Brasil and she’s very good.

We made it over a period of a few weeks, fitting around unusually busy “day job” schedules for Marna and me.

And this one is one of the longest we’ve done.

Anyhow, we hope you enjoy it. And that “commute length” can be meaningful to more of you very soon.

Episode 29 “Hello Trello” long show notes.

This episode is about how to be a better z/OS installation specialist, z/OS capture ratios, and a discussion on using Trello. We have a special guest joining us for the performance topic, Miroslava Barahona Rossi.

Follow up to Waze topic in Episode 8 in 2016

  • Apple Maps Adds Accident, Hazard, and Speed Check Reporting using the iPhone, CarPlay, and Siri to iOS.

What’s New

  • Check out the LinkedIn article on the IBM server changing for FTPS users for software electronic delivery on April 30, 2021, from using TLS 1.0 and 1.1 to using TLS 1.2, with a dependency on AT-TLS.

  • If you are using HTTPS, you are not affected! This is recommended.

Mainframe – Being a Better Installation Specialist

  • This section was modeled after Martin’s “How to be a better Performance Specialist” presentation. It’s a personal view of some ideas to apply to the discipline.

  • “Better” here means lessons learned, not competing with people. For “Installation Specialist” we might just as well have used “System Programmer” or another term.

  • There’s definitely a reason to be optimistic about being a person that does that Installation Specialist or System Programmer type of work:

  • There will always be a need for someone to know how systems are put together, how they are configured, how they are upgraded, and how they are serviced…how to just get things to run.

    • And other people that will want them to run faster and have all resources they need for whatever is thrown at them. Performance Specialists and Installation Specialists complement each other.
  • Consider the new function adoption needs too. There are so many “new” functions that are just waiting to be used.

    • We could put new functions into two categories:

      1. Make life easier: Health Checker, z/OSMF, SMP/E for service retrieval

      2. Enable other kinds of workload: Python, Jupyter notebooks, Docker containers

    • A good Installation Specialist would shine in identifying and enabling new function.

      • Now more than ever, with Continuous Delivery across the entire z/OS stack.

      • Beyond “change the process of upgrading”, into “driving new function usage”.

      • Although we know that some customers are just trying to stay current. Merely upgrading could bring new function activation out of the box, like SRB on z15.

    • A really good installation specialist would take information they have and use it in different ways.

      • Looking at the SMP/E CSIs with a simple program in C, for example, to find fixes installed between two dates.

      • Associating that with an incident that just happened, by narrowing it down to a specific element.

      • Using z/OSMF to do a cross-GLOBAL zone query capability, for seeing if a PE was present within the entire enterprise quickly.

      • Knowing what the right tool is for the right need.

    • Knowing parmlib really well. Removing unused members and statements. Getting away from hard-coding defaults – which can be hard, but sometimes can be easier (because some components tell you if you are using defaults).

      • Using the DISPLAY command to immediately find necessary information.

      • Knowing the z/OS UNIX health check that can compare active values with the hardened parmlib member in use.

    • Researching End Of Service got immensely easier with z/OSMF.

    • Looking into the history of the systems, with evidence of merging two shops.

      • Could leave around two “sets” of parmlibs, proclibs. Which might be hard to track, depending on what you have such as ISPF statistics or comments. Change management systems can help.

      • Might see LPAR names and CICS region naming conventions

    • Might modern tools such as z/OSMF provide an opportunity to rework the way things are done?

      • Yes, but often more function might be needed to completely replace an existing tool…or not.
    • You can’t be better by only doing things yourself – no one can know everything. You’ve got to work with others who are specialists in their own area.

      • Performance folks often have to work with System Programmers, for instance. Storage, Security, Networking, Applications – the list goes on.

      • Examples are with zBNA, memory and zIIP controls for zCX, and estate planning.

    • Use your co-workers to learn from. And teach them what you know too. And do the same in forums like IBM-MAIN, at conferences, and in user groups.

    • Last but not least, learn how to teach yourself. Know where to find answers (and that doesn’t mean asking people!). Learn how to try out something on a sandbox.

Performance – Capture Ratio

  • Our special guest is Miroslava Barahona Rossi, a technical expert who works with large Brasilian customers.

  • Capture Ratio – the ratio of captured workload CPU to system-level CPU, expressed as a percentage. Put another way, it separates CPU attributable to workloads from uncaptured operating system work. (A minimal sketch of the arithmetic appears at the end of this Performance section.)

    • RMF SMF 70-1 versus SMF 72-3
      • 70-1 is CPU at system and machine level
      • 72-3 is workload / service class / report class level
      • Not in an RMF report
  • Why isn’t the capture ratio 100%?

    • There wouldn’t be a fair way to attribute some kinds of CPU. For example I/O Interrupt handling.
  • Why do we care about capture ratio?

    • Commercial considerations when billing for uncaptured cycles. You might worry something is wrong if the capture ratio is low.

    • Might be an opportunity for tuning if below, say, 80%

  • What is a reasonable value?

    • Usually 80 – 90. Seeing more like 85 – 95 these days. It has improved because more of the I/O-related CPU is captured.

    • People worry about low capture ratios.

    • Also work is less I/O intensive, for example, because we buffer better

  • zIIP generally higher than GCP

  • Do we calculate blended GCP and zIIP? Yes, but also zIIP separately from GCP.

  • Why might a capture ratio be low?

    • Common: Low utilisation, Paging, High I/O rate.

    • Less common: Inefficient ACS routines, Fragmented storage pools, Account code verification, Affinity processing, Long internal queues, SLIP processing, GTF

  • Experiment correlating capture ratio with myriad things

    • One customer set of data with z13, where capture ratio varied significantly.

      • In a spreadsheet, calculated the correlation between capture ratio and various other metrics. Used the =CORREL(range, range) Excel function.

      • Good correlation is > 85%

      • Eliminate potential causes, one by one:

        • Paging and SIIS: poor correlation

        • Low utilisation: strong correlation

      • It has nothing much to do with machine generation. The same customer – from z9 to z13 – always had a low capture ratio.

        • It got a little bit better with newer z/OS releases

        • Workload mix? Batch versus transactional

      • All the other potential causes eliminated

      • Turned out to be LPAR complexity

        • 10 LPARs on 3 engines. Logical: Physical was 7:1, which was rather extreme.

        • Nothing much can be done about it – could merge LPARs. Architectural decision.

    • Lesson: worth doing it with your own data. Experiment with excluding data and various potential correlations.

  • Correlation is not causation. Look for the real mechanism, and eliminate causes one by one. Probably start with paging and low utilisation

  • Other kinds of Capture Ratio

    • Coupling Facility CPU: Always 100%; at low traffic, CPU per request is inflated.

    • For CICS, SMF 30 versus SMF 110 Monitor Trace: Difference is management of the region on behalf of the transactions.

    • Think of a CICS region as running a small operating system. Recording SMF 110 for everything isn’t scalable, so generally this capture ratio is not tracked.

  • Summary

    • Don’t be upset if you get a capture ratio substantially lower than 100%. That’s normal.

    • Understand your normal. Be aware of the relationship of your normal to everybody else’s. But, be careful when making that comparison as it is very workload dependent.

    • Understand your data and causes. See if you can find a way of improving it. Keep track of the capture ratio over the months and years.
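
Here is the minimal sketch of the capture ratio arithmetic promised above, plus a Python equivalent of the =CORREL experiment. The numbers are illustrative only, the SMF field extraction is left out, and statistics.correlation needs Python 3.10 or later.

# Capture ratio: captured workload CPU (SMF 72-3) as a percentage of
# system-level CPU (SMF 70-1). Interval totals assumed already extracted;
# the numbers are illustrative only.

from statistics import correlation   # Python 3.10+

def capture_ratio(captured_cpu_secs, system_cpu_secs):
    return 100.0 * captured_cpu_secs / system_cpu_secs

print(f"{capture_ratio(2700, 3000):.0f}%")   # -> 90%

# The =CORREL(range, range) experiment translates directly:
hourly_capture_ratio = [88, 85, 90, 83, 87]
hourly_utilisation   = [60, 45, 75, 40, 65]
print(f"{correlation(hourly_capture_ratio, hourly_utilisation):.2f}")   # -> 0.96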

Topics – Hello Trello

  • Trello is based on the Kanban idea: boards, lists, and cards. Cards contain the data in paragraphs, checklist, pictures, etc.

  • Can move cards between lists, by dragging.

  • Templates are handy, and are used in creating our podcast.

  • Power-ups add function. A popular one is Butler. Paying for them might be a key consideration.

  • Multiplatform plus web. Provided by Atlassian, which you might know from Confluence wiki software.

    • Atlassian also make Jira, which is an Agile Project Management tool.
  • Why are we talking about Trello?

    • We moved to it for high level podcast planning

      • One list per episode. List’s picture is what becomes the cover art.

      • Each card is a topic, except first one in the list is our checklist.

    • Template used for hatching new future episodes.

      • But we still outline each topic with iThoughts.
    • We move cards around sometimes between episodes.

      • Recently more than ever, as we switched Episode 28 and 29, due to the z/OS V2.5 Preview Announce.

      • Right now planning two episodes at once.

  • Marna uses it to organise daily work, with personal workload management and calendaring. But it is not a meetings calendar.

    • Probably, with Jira, it will be more useful than ever. We’ll see.
  • Martin uses it for team engagements, with four lists: Potential, Active, Dormant, Completed.

    • Engagement moves between lists as it progresses

    • Debate between one board per engagement and one for everything. Went with one board for everything because otherwise micromanagement sets in.

    • GitHub projects, which are somewhat dormant now, because of…

    • Interoperability

      • GitHub issues vs OmniFocus tasks vs Trello cards. There is a Trello Power-Up for GitHub, for issues, branches, commits, and pull requests mapped into cards.

      • However, it is quite fragile, as we are not sure changes in state are reliably reflected.

    • Three-legged stool is Martin’s problem, as he uses three tools to keep work in sync. Fragility in automation would be anybody’s problem.

      • iOS Shortcuts is a well built out model. For example, it can create Trello cards and retrieve lists and cards.

        • Might be a way to keep the 3 above in sync
    • IFTTT is used by Marna for automation, and Martin uses automation that sends a Notification when someone updates one of the team’s cards.

      • Martin uses Pushcut on iOS – as we mentioned in Episode 27 Topics

      • Trello provides an IFTTT Trigger, or Pushcut provides a Notification service, which can kick off a shortcut also.

      • Encountered some issues: each list needed its own IFTTT applet. Can’t duplicate applets in IFTTT so it’s a pain to track multiple Trello lists, even within a single board.

    • Automation might be a better alternative to Power-ups, as you can build them yourself.

  • Reflections:

    • Marna likes Trello. She uses it to be more productive, but would like a couple more functions, which might be added as it becomes more popular.

    • Martin likes Trello too, but with reservations.

      • Dragging cards around seems a little silly. There are more compact and richer ways of representing such things.

      • A bit of a waste of screen real estate, as cards aren’t that compact. Especially as lists are all in one row. It would be nice to be able to fill more of the screen – with two rows or a more flexible layout.

In closing

  • GSE UK Virtual Conference will be 2 – 12 November 2021.

On the blog

So It Goes

Pi As A Protocol Converter

I wrote about Automation with my Raspberry Pi 3B in Raspberry Pi As An Automation Platform. It’s become a permanent fixture in my office and I’ve given it another task. This blog post is about that task.

Lots of things use JSON (JavaScript Object Notation) for communication via HTTP. Unfortunately they don’t all speak the same dialect. Actually:

  1. They do; it’s JSON pure and simple. Though some JSON processors are a little bit picky about things like quotes.
  2. It’s just as well there is the flexibility to express a diverse range of semantics.

This post is about an experiment to convert one form of JSON to another. When I say “experiment” it’s actually something I have in Production – as this post was born from solving a practical problem. I would view it as more a template to borrow from and massively tailor.

The Overall Problem

I have a number of GitHub repositories. With GitHub you can raise an Issue – to ask a question, log a bug, or suggest an enhancement. When that happens to one of my repositories I want to create a task in my task manager – OmniFocus. And I want to do it as automatically as possible.

There isn’t an API to do this directly, so I have to do it via a Shortcuts shortcut (sic) on iOS. To cause the shortcut to fire I use the most excellent PushCut app. PushCut can kick off a shortcut on receipt of a webhook (a custom URL) invocation.

Originally I used an interface between GitHub and IFTTT to cause IFTTT to invoke this webhook. This proved unreliable.

The overall problem, then, is to cause a new GitHub issue to invoke a PushCut webhook with the correct parameters.

The Technical Solution

I emphasised “with the correct parameters” because that’s where this gets interesting:

You can set GitHub up – on a repository-by-repository basis – to invoke a webhook when a new issue is raised. This webhook delivers a comprehensive JSON object.

PushCut webhooks expect JSON – but in a different format to what GitHub provides. And neither of these is tweakable enough to get the job done.

The solution is to create a “protocol converter”, which transforms the JSON from the GitHub format into the PushCut format. This I did with a Raspberry Pi. (I have several already so this was completely free for me to do.)

Implementation consisted of several steps:

  1. Install Apache web server and PHP on the Pi.
  2. Make that web server accessible from the Internet. (I’m not keen on this but I think it’s OK in this case – and it is necessary.)
  3. Write a script.
  4. Install it in the /var/www/html/ directory on the Pi.
  5. Set up the GitHub webhook to invoke the webhook at the address of the script on the Pi.

Only the PHP script is interesting. You can find how to do the rest on the web, so I won’t discuss them here.

PHP Sample Code

The following is just the PHP piece – with the eventual shortcut being a sideshow (so I haven’t included it).

<?php

$secret = "<mysecret>";
$json = file_get_contents('php://input');
$data = json_decode($json);

if($data->action == "opened"){
  $issue = $data->issue->number;
  $repository = $data->repository->name;
  $title = $data->issue->title;
  $url = $data->issue->html_url;

  $pushcutURL = "https://api.pushcut.io/" . $secret . "/notifications/New%20GitHub%20Issue%20Via%20Raspberry%20Pi";

  //The JSON data.
  $JSON = array(
      'title' => 'New Issue for ' . $repository,
      'text' => "$issue $title",
      'input' => "$repository $issue $url $title",
  );


  $context = stream_context_create(array(
    'http' => array(
      'method' => 'POST',
      'header' => "Content-Type: application/json\r\n",
      'content' => json_encode($JSON)
    )
  ));

  $response = file_get_contents($pushcutURL, FALSE, $context);
}

?>

But let me explain the more general pieces of the code.

  • Before you could even use it for connecting GitHub to PushCut you would need to replace <mysecret> with your own personal PushCut secret, of course.
  • $json = file_get_contents('php://input'); stores in a variable the JSON sent with the webhook. Let’s call this the “inbound JSON”.
  • The JSON gets decoded into a PHP data structure with $data = json_decode($json);.
  • The rest of the code only gets executed if $data->action is “opened” – as this code is only handling Open events for issues.
  • The line $pushcutURL = "https://api.pushcut.io/" . $secret . "/notifications/New%20GitHub%20Issue%20Via%20Raspberry%20Pi"; is composing the URL for the PushCut webhook. In particular note the notification name “New GitHub Issue Via Raspberry Pi” is percent encoded.
  • The outbound JSON has to be created using elements of the inbound JSON, plus some things PushCut wants – such as a title to display in a notification. In particular the value “input” is set to contain the repository name, the issue number, the original Issue’s URL, and the issue’s title. All except the last are single-word entities. If you are adapting this idea you need to make up your own convention.
  • The $context = and $response = lines are where the PushCut webhook is actually invoked.

As I said, treat the above as a template, with the general idea being that the PHP code can translate the JSON it’s invoked with into a form another service can use, and then call that service.

Conclusion

It was very straightforward to write a JSON converter in PHP. You could do this for any JSON conversion – which is actually why I thought it worthwhile to write it up.

I would also note you could do exactly the same in other software stacks, in particular Node.js. I will leave that one as an exercise for the interested reader. I don’t know whether that would be faster or easier for most people.

On the question of “faster” my need was “low volume” so I didn’t much care about speed. It was plenty fast enough for my needs – being almost instant – and very reliable.

One other thought: My example is JSON but it needn’t be. There need not even be an inbound or outbound payload. The idea of using a web server on a Pi to do translation is what I wanted to get across – with a little, not terribly difficult, sample code.

Mainframe Performance Topics Podcast Episode 28 “The Preview That We Do Anew”

(Originally posted 2 March, 2021.)

It’s unusual for us to publish a podcast episode with a specific deadline in mind. But we thought the z/OS 2.5 Preview announcement was something we could contribute to. So, here we are.

I also wanted to talk about some Open Source projects I’ve been contributing to. So that’s in there.

And it was nice to have Nick on to talk about zCX.

Lengthwise, it’s a “bumper edition”… 🙂

Episode 28 “The Preview That We Do Anew” long show notes.

  • This episode is about several of the z/OS V2.5 new functions, which were recently announced, for both the Mainframe and Performance topics. Our Topics topic is on Martin’s Open Source tool filterCSV.

  • We have a guest for Performance: Nick Matsakis, z/OS Development, IBM Poughkeepsie.

  • Many of the enhancements you’ll see in the z/OS V2.5 Preview were provided on earlier z/OS releases via Continuous Delivery PTFs. The APARs are provided in the announce.

What’s New

Mainframe – Selected z/OS V2.5 enhancements

  • We’ve divided up the Mainframe V2.5 items into two sections: installation and non-installation.

z/OS V2.5 Installation enhancements.

  • IBM will have z/OS installable with z/OSMF, in a portable software instance format!

  • z/OS V2.4 will not be installable with z/OSMF, and z/OS V2.4 driving system requirements remain the same.

  • z/OS V2.5 will be installable via z/OSMF, so that is a big driving system change.

    • However, there is a small window when z/OS V2.4 and z/OS V2.5 are concurrently orderable in which z/OS V2.5 will have the same driving system requirements as z/OS V2.4. That overlapping window, when z/OS V2.5 is planned to be available via both the old (ISPF CustomPac Dialog) and new (z/OSMF) formats, is September 2021 through January 2022.

    • After that window, be aware! When z/OS V2.5 is the only orderable z/OS release, all IBM ServerPacs will have to be installed with z/OSMF.

    • All means CICS, Db2, IMS, MQ, and z/OS and all the program products.

    • To be prepared today for this change:

      • Get z/OSMF up and running on their driving system.

      • Learn z/OSMF Software Management (which is very intuitive) and try to install a portable software instance from this website.

    • This is a big step forward in the z/OS installation strategy that IBM and all the leading software vendors have been working years on.

      • John Eells came to this very podcast in Episode 9 to talk about it.
    • CICS, Db2, and IMS are already installable with a z/OSMF ServerPac. You can try those out right now.

    • CBPDO will remain an option, instead of ServerPac. But it is much harder to install.

      • ServerPac is much easier, and a z/OSMF ServerPac is easiest of all.

z/OS V2.5 Non-installation enhancements.

  • Notification of availability of TCP/IP extended services

    • For many operational tasks and applications that depend on z/OS TCP/IP communication services the current message is insufficient

    • New ENF event intended to enable applications with dependencies on TCP/IP extended services to initialise faster

  • Predictive Failure Analysis (PFA) has more checks

    • For above the bar private storage exhaustion, JES2 resource exhaustion, and performance degradation of key address spaces.
  • Workload Manager (WLM) batch initiator management takes into account availability of zIIP capacity

    • Works most effectively when customer has separate service classes for mostly-zIIP and mostly-GCP jobs

  • Catalog and IDCAMS enhancements

      • Catalog Address Space (CAS) restart functions are enhanced to allow you to change the Master Catalog without IPL

      • IDCAMS DELETE mask takes TEST and EXCLUDE. TEST to see what would be deleted using the mask. EXCLUDE is further filtering – beyond the mask.

      • IDCAMS REPRO moves I/O buffers above line. This will help avoid 878 “Insufficient Virtual Storage” ABENDs.
        We think think this might allow more buffers, and multitasking in one address space.

    • New RMF Concept for CF data gathering

      • There is a a new option, not the default, to optimize CF hardware data collection to one system. Remember SMF 74.4 has two types of data: system specific, and common to all systems.

      • This is designed to reduce overhead on n-1 systems.

  • RMF has been restructured, but all the functions are still intact. z/OS V2.5 RMF is still a priced feature.

    • A new z/OS V2.5 base element called “Data Gatherer” provides basic data gathering and is available to all, whether you’ve bought RMF or not. It will cut some SMF records.

    • There is a new z/OS V2.5 priced feature called “Advanced Data Gatherer”, which all RMF users are entitled to.

    • Marna mentions this because the restructure has brought about some one-time customization changes you’ll need to make in parmlib, for APF and the linklist.

  • More, quite diverse, RACF health checks: for PassTickets, active subsystem address spaces, and sysplex configuration.

Performance – z/OS V2.5 zCX enhancements.

  • Our special guest is Nick Matsakis, who is a performance specialist in z/OS Development, and has worked on several components in the BCP (GRS, XCF/XES, …). Martin and Nick have known each other for many years, recalling Nick’s assignment in Hursley, UK.

  • zCX is a base element new in z/OS V2.4, and requires a z14. It allows you to run, on z/OS, Docker container applications built for Linux on Z.

  • zCX is important for co-locating Linux on Z containers with z/OS. You can think of zCX instances as appliances; each one is a z/OS address space.

  • Popular use cases can be found here and in the Redbook here. Another helpful source is Ready for the Cloud with IBM zCX.

    • Nick mentions the use cases of adding microservices to existing z/OS applications, served by a zCX container, and the MQ Concentrator for reducing z/OS CPU costs by running it on zCX. Another is Aspera, which is good for streaming-type workloads.
  • zIIP eligibility enhancements

    • Context switching reduction was delivered; you can now typically expect about 95% offload to zIIP.
  • Memory enhancements

    • Originally it was all 4K fixed pages. New enhancements include support for 1 MB and 2 GB large pages (still fixed) for backing guests.

      • This increases the efficiency of memory management; better performance is expected, mainly from TLB miss reduction.

      • In house, Nick saw improvements ranging from 0.25% up to about 6-12%, depending on what you are running.

    • Note the need to set LFAREA, as discussed in Episode 26.

      • As of z/OS V2.3, LFAREA is the maximum number of fixed 1 MB pages allowed on the system. 2 GB hasn’t changed.

      • zCX configuration allows you to say which page sizes you’d like to try. Plan for using 2GB.

    • Guest memory is planned to be configurable up to 1 TB.

      • zCX uses fixed storage, so the practical limit may be lower. The limit used to be much lower, at about 100 GB.

      • Now we support up to 1000 containers in a zCX address space. Capacity is increasing.

  • Another relief is in disk space limits.

    • The number of data and swap disks per appliance is planned to be increased to as many as 245. This is intended to enable a single zCX to address more data at one time.

    • The point is you can run more and larger containers.

  • Instrumentation enhanced

    • Monitor and log zCX resource usage of the root disk, guest memory, swap disk, and data disks in the server’s job log.

    • zCX resource shortage z/OS alerts are proactive alerts sent to the z/OS system log (SYSLOG) or operations log (OPERLOG) to improve monitoring and automated operations. The server periodically monitors used memory, root disk space, user data disk space, and swap space in the zCX instance, and issues messages to the zCX joblog and operator console when usage rises to 50%, 70%, and 85% utilization. When usage returns below 50%, an informational message is issued.

    • But there is still nothing in SMF to look inside a zCX address space.

      • There is Docker-specific instrumentation that can provide that for you.
  • SIMD (or Vector)

    • SIMD is a performance feature, and can be used for analytics.

    • Some containers don’t check if they are running on hardware where SIMD is available.

  • Note that most of what’s in the z/OS 2.5 Preview for zCX is rolled back to z/OS 2.4 with APARs.

  • From this, we can conclude zCX wasn’t a “one and done”.

    • z/OS 2.5 might be a good time to try it. There is a 90-day trial period, as there is a cost for it. But, why wait for 2.5?
  • Nick’s presentation (with Mike Fitzpatrick) can be downloaded here.

Topics – filterCSV and tree manipulation

  • Trees are made up of nodes that have zero to many children. You can have a leaf node (zero children) or a non-leaf node (one or more children).

    • Navigation can be recursive or iterative, which makes it nice for programming.
  • Mindmapping leads to trees. Thinking of z/OS: Sysplex -> System -> Db2 -> Connected CICS leads to trees. Also, in Db2 DDF Analysis Tool we show DDF connections as a tree.

  • Structurally, each node is a data structure with fields such as a readable name. Each node has pointers to its children and maybe its parent. This gives the tree its “topology”, and its levels. (There’s a sketch of such a node structure after this list.)

  • iThoughts is a mind mapping tool, and displays a mind map as a tree. Nodes can have colours and shapes, and many other attributes besides.

    • iThoughts runs on Windows, iOS, iPadOS and macOS.

    • It exports and imports CSV files, carrying the tree topology and also node attributes such as shape, colour, text, and notes.

    • It has very little automation of its own. But, crucially, you can mangle the CSV file outside of iThoughts, which is what filterCSV does.

  • filterCSV is an open source Python program that manipulates iThoughts CSV files.

    • It can address the automation problem, as it does the mangling automatically.

    • An example: automatically colouring the blobs based on patterns (regular expressions).

      • Colouring CICS regions according to naming conventions. (See the sketch after this list.)
  • filterCSV started simple, and Martin has kept adding function, most recently find and replace. As it’s an open source project, contributions are welcomed.
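
Here is a minimal Python sketch of the two ideas above: a tree of nodes with readable names, and colouring the nodes whose text matches a regular expression. This is not filterCSV’s actual code, nor iThoughts’ real CSV column layout or colour encoding; the class and function names are purely illustrative. filterCSV itself works on the exported CSV file rather than on hand-built objects like these.

import re
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Node:
    text: str                     # readable name of the node
    colour: Optional[str] = None  # e.g. an RGB string such as "FF0000"
    children: List["Node"] = field(default_factory=list)

    def add(self, child: "Node") -> "Node":
        self.children.append(child)
        return child

def colour_matching(node: Node, pattern: str, colour: str) -> None:
    """Recursively colour every node whose text matches the pattern."""
    if re.search(pattern, node.text):
        node.colour = colour
    for child in node.children:
        colour_matching(child, pattern, colour)

# Build a tiny mind map: Sysplex -> System -> CICS regions
root = Node("PLEX1")
sysa = root.add(Node("SYSA"))
sysa.add(Node("CICSPA01"))  # a Production region, by naming convention
sysa.add(Node("CICSTA01"))  # a Test region

# Colour Production regions red and Test regions green
colour_matching(root, r"^CICSP", "FF0000")
colour_matching(root, r"^CICST", "00FF00")

for node in sysa.children:
    print(node.text, node.colour)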

On the blog

So It Goes

SMF 70-1 – Where Some More Of The Wild Things Are

(First posted February 21, 2021)

As I recall, the last time I wrote about SMF 70-1 records in detail was Engineering – Part Two – Non-Integer Weights Are A Thing. Even if it weren’t, no matter – as I’d like you to take a look at it. The reason is to reacquaint you with ERBSCAN and ERBSHOW – two invaluable tools when understanding the detailed structure and contents of an SMF record. (Really, an RMF SMF record.) And it does introduce you to the concept of a Logical Processor Data Section.

This post is another dive into detailed record structure. (The first attempt at the last sentence had the word “derailed”; That might tell me something.) 🙂

In most cases a system cuts a single SMF 70 Subtype 1 record per interval. But this post is not about those cases.

The Structure Of A 70-1 Record

SMF 70-1 is one of the more complex record subtypes – and one of the most valuable.

Here is a synopsis of the layout:

What is in blue(ish) relates to the system cutting the record. The other colours are for other LPARs.

At its simplest, a single 70-1 record represents all the LPARs on the machine. But it’s not always that simple.

Let me point out some key features.

  • There is one CPU Data Section per processor for the system that cut the record. In this example there are three – so this is a 3-way.
  • zIIPs and GCPs are treated the same, but they are individually identifiable as zIIP or GCP.
  • There is one Partition Data Section per logical partition on the machine, plus one called “*PHYSCAL”.
  • There is one Logical Processor Data Section per logical processor, plus one per physical processor.

The colour coding is useful here. Let’s divide it into two cases:

  • The processors for the cutting LPAR.
  • The processors for the other LPARs.

For what we’ll call “this LPAR”, there are CPU Data Sections for each processor, plus a Partition Data Section, Logical Core Data Sections, and Logical Processor Data Sections.

For each of what we’ll call “other LPARs” there are just the Partition Data Section and its Logical Processor Data Sections.

You’ll notice that the blue Partition Data Section and its Logical Processor Data Sections are the first in their respective categories. I’ve always seen it to be the case that this LPAR’s sections come first. I assume PR/SM returns them in that sequence – though I don’t know if this is an architectural requirement.

The relationship between Partition Data Sections and the corresponding Logical Processor Data Sections is straightforward: Each Partition Data Section points to the first Logical Processor Data Section for that LPAR and has a count of the number of such sections. The pointer here is an index into the set of Logical Processor Data Sections, where the first has an index of 0. (ERBSHOW calls it “#1”.)

(A deactivated LPAR has an (irrelevant) index and a count of 0 – and that’s how my code detects them.)
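
To make that concrete, here is a minimal Python sketch of the index-plus-count relationship, assuming the sections have already been decoded into Python objects. The field names are illustrative, not the real SMF 70-1 field names.

from dataclasses import dataclass
from typing import List

@dataclass
class PartitionDataSection:
    name: str      # LPAR name, e.g. "PROD1" or "*PHYSCAL"
    lp_index: int  # 0-based index of its first Logical Processor Data Section
    lp_count: int  # number of such sections it owns; 0 means a deactivated LPAR

@dataclass
class LogicalProcessorDataSection:
    online_time: float  # stand-in for the real fields

def lp_sections_for(partition: PartitionDataSection,
                    lp_sections: List[LogicalProcessorDataSection]
                    ) -> List[LogicalProcessorDataSection]:
    """Return the Logical Processor Data Sections belonging to one LPAR."""
    if partition.lp_count == 0:
        return []  # deactivated LPAR: the index is irrelevant
    return lp_sections[partition.lp_index:partition.lp_index + partition.lp_count]

The deactivated-LPAR case falls out naturally: a count of 0 yields no sections, whatever the index says.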

So far so good, and quite complex.

How Do You Get Multiple 70-1 Records In An Interval?

Obviously each system cuts at least one record per interval – if 70-1 is enabled. So this is not about that.

In recent years the number of physical processors in a machine and logical processors per LPAR have both increased. I regard these as technological trends, driven mainly by capacity. At the same time there is an architectural trend towards more LPARs per machine.

Here are the sizes of the relevant sections – as of z/OS 2.4:

  • CPU Data Section: 92 bytes.
  • Partition Data Section: 80 bytes.
  • Logical Processor Data Section: 88 bytes.
  • Logical Core Data Section: 16 bytes.

These might not seem like large numbers but you can probably see where this is heading.

An SMF record can be up to about 32KB in size. You can only fit a few hundred Logical Processor Data Sections into 32KB, and that number might be significantly reduced if this LPAR has a lot of processors.

All of this was easy with machines with few logical processors (and still is).

But let’s take the case of a 100-way LPAR (whatever we think of that). Its own sections are (92 + 88 + 16) x 100 bytes, or 19.6KB, plus some other sections. So at least 20KB. And that’s before we consider sections for other LPARs.

Now let’s ignore this LPAR and consider the case of 50 1-way LPARs. There the PR/SM-related sections add up to (80 + 88) x 50 bytes = 8.4KB. Of course it’s extremely unlikely many would be 1-way LPARs, so the numbers are realistically much higher than that.
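
If you want to play with the arithmetic, here is a throwaway Python version of the two examples above, using the z/OS 2.4 section sizes. It ignores the record header and the other section types, so it understates the real space needed.

# Section sizes as of z/OS 2.4, in bytes
CPU_DATA = 92
PARTITION_DATA = 80
LOGICAL_PROCESSOR_DATA = 88
LOGICAL_CORE_DATA = 16
SMF_RECORD_LIMIT = 32 * 1024  # "about 32KB"

def this_lpar_bytes(n_processors: int) -> int:
    """Sections driven by the cutting LPAR's own processors."""
    return (CPU_DATA + LOGICAL_PROCESSOR_DATA + LOGICAL_CORE_DATA) * n_processors

def other_lpar_bytes(n_lpars: int, logicals_per_lpar: int) -> int:
    """Partition Data Section plus Logical Processor Data Sections per other LPAR."""
    return (PARTITION_DATA + LOGICAL_PROCESSOR_DATA * logicals_per_lpar) * n_lpars

print(this_lpar_bytes(100))     # 19600 - the 100-way LPAR case
print(other_lpar_bytes(50, 1))  # 8400  - the 50 1-way LPARs case
print(SMF_RECORD_LIMIT)         # 32768 - the ceiling everything has to fit under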

By the way, for a logical processor to count in any of this it just has to be defined. It might well have zero Online Time. It might well be a Parked Vertical Low. It doesn’t matter. The Logical Processor Data Sections are still there.

So, to exceed the capacity of a 32KB 70-1 SMF record we just have to have a lot of logical processors across all the LPARs in the machine, whether in this system or other LPARs. And an exacerbating factor is if these logical processors are across lots of LPARs.

What Does RMF Do If The Data Won’t Fit One Record?

I’ve seen a lot of SMF 70-1 records in my time, and spent a lot of time with ERBSCAN and ERBSHOW examining them at the byte (and sometimes bit) level.

I do know RMF takes great care to adjust how it lays out the records.

Firstly, to state the obvious, RMF doesn’t throw away data; All the sections exist in some record in the sequence.

Secondly, RMF keeps each LPAR’s sections together. So the Partition Data Section and its related Logical Processor Data Sections are all in the same record. This is obviously the right thing to do, otherwise the index and count for the Logical Processor Data Sections could break.

Thirdly, and this is something I hadn’t figured out before, only one record in the sequence contains the CPU Data Sections. (I think also the Logical Core Data Sections.)

How Should I Handle The Multi-Record Case?

Let me assume you’re actually going to have to decide how to deal with this.

There are two basic strategies:

  1. Assemble the records in the sequence into one record in memory.
  2. Handle each record separately.

Our code, rightly in my opinion, uses Strategy 2. Strategy 1 has some issues:

  • Collecting information from multiple records, and timing the processing of the composite data.
  • Fixing up things like the index of the Logical Processor Data Sections.

Probably some tools do this, but it’s fiddly.

So we process each LPAR separately, thanks to all the information being in one record. And so we can process each record separately.
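
In outline, Strategy 2 looks something like the following sketch. It assumes a parser that yields one object per physical 70-1 record with its sections already decoded; the Record70 class and its field names are illustrative, not a real API.

from dataclasses import dataclass, field
from typing import List

@dataclass
class Partition:
    name: str
    lp_index: int  # 0-based index of its first Logical Processor Data Section
    lp_count: int  # 0 means a deactivated LPAR

@dataclass
class Record70:
    cpu_sections: List[dict] = field(default_factory=list)  # present in only one record of the sequence
    partitions: List[Partition] = field(default_factory=list)
    lp_sections: List[dict] = field(default_factory=list)

def process_interval(records: List[Record70]) -> None:
    for record in records:
        # Don't discard records with no CPU Data Sections - they still
        # carry Partition and Logical Processor Data Sections for other LPARs.
        if record.cpu_sections:
            pass  # handle this system's own CPU data here
        for partition in record.partitions:
            lps = record.lp_sections[partition.lp_index:
                                     partition.lp_index + partition.lp_count]
            # handle one LPAR's worth of data here, entirely from this record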

Reality Check

If you have only one 70-1 record per interval per cutting system none of the above is necessary to know. But I think it’s interesting.

If you rely on some tooling to process the records – and most sensible people do – you probably don’t care about their structure. Certainly, the RMF Postprocessor gets this right for you in the CPU Activity Report (and Partition Data Report sub-report).

So, I’ve probably lost most of my audience at this point. 🙂 If not, you’re on my wavelength – which isn’t crowded. 🙂 (This is the second “on my wavelength” joke in my arsenal, the other being open to misinterpretation.)

I like to get down into the physical records for a number of reasons, not least of which are:

  • When things break I need to fix them.
  • It cements my understanding of how what they describe works.

Oh, and it’s fun, too.

Final Thoughts

This post was inspired by a situation that required yet more adjusting of our code. Sometimes life’s that way. In particular, a number of LPARs were missing – because our Assembler code threw away any record with no CPU Data Sections. (This is inherited code but it’s quite possibly a problem I introduced some time in the past 20 years.)

I should point out that – for simplicity – I’ve ignored IFLs, (now very rare) zAAPs, and ICFs. They are treated exactly the same as GCPs and zIIPs. Of course the record-cutting LPAR won’t have IFLs or ICFs.

I have a quite old presentation “Much Ado About CPU”. Maybe I should write one with “Part Two” tacked on. Or maybe “Renewed” – if it’s not such a radical departure. But then I’ve done quite a bit of presentation writing on the general topic of CPU over recent years.