I just produced a new chart, which I think is worth sharing with you.
I produced it for one specific use case, but I think it has slightly wider applicability.
The Trouble With Outsourcers
Quite often we get performance data from outsourcers, whether IBM or some other organisation.
Generally they’re running LPARs from more than one of their customers on the one machine.
We have a nice chart for a machine. It shows all the LPARs’ CPU stacked up – with each LPAR a different series.
This is fine in a one-company context.
But sometimes we are working with the outsourcer and one of their customers.
We wouldn’t want to show them the outsourcer’s other customers’ LPARs.
But we would want to show them how busy the machine is.
It’s reasonable to show them how busy the machine is because, of course, it affects the performance they’re seeing.
And we might well get into LPAR design issues.
(A tricky one is the weights because adjusting them is a “robbing Peter to pay Paul” situation – and with a multi customer machine that’s obviously political.)
So here is a new chart, that neatly solves the problem.
It’s a real one, though there has had to be a little obfuscation of the names.
In this case CPU2 is a Production LPAR and CPU3 is a Development LPAR.
The grey is all the other LPARs’ use of the GCP pool.
It’s clearly substantial.
The pool itself isn’t hugely busy – but then this was not said to be a problem day.
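The data shaping behind such a chart is straightforward: keep the LPARs you may show this customer, and collapse everything else into one series. A sketch (names hypothetical, not my actual code):

```python
def shape_for_sharing(lpar_cpu, visible):
    # lpar_cpu: {lpar_name: CPU busy}; visible: the LPARs we may show this customer.
    # Everything else is collapsed into one 'Other LPARs' series (the grey band),
    # so the pool's overall busy-ness is apparent without naming other customers.
    shown = {name: busy for name, busy in lpar_cpu.items() if name in visible}
    shown["Other LPARs"] = sum(busy for name, busy in lpar_cpu.items()
                               if name not in visible)
    return shown
```

Stacking the result gives the customer's own LPARs in colour and one anonymous grey band on top.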
But There’s More
Even in the one-company case this chart is useful.
Suppose a customer sends us data from what they consider their biggest LPARs.
It would be good to show one of two things:
Either the LPARs they sent us data for are indeed the bulk of the CPU.
Or that we’re missing a big trick because the LPARs they sent data for don’t use the bulk of the CPU.
One Final Plea
I’ve said this many times, but probably not written it in a blog post.
Always report processor pools separately.
Everything in this post has been for a single machine’s GCP pool.
To mix GCPs with, say, zIIPs makes no sense at all.
But there’s something I thought I’d written about before, but I hadn’t: With z14, IOPs are enabled for SMT as standard. Actually one of them isn’t, but the rest are. So, in SMF 78–3 you get an odd number of IOPs – and therefore an odd number of IOP Initiative Queue and Utilization Data Sections. One is not SMT-enabled and the rest are.
So, if you have 10 IOP cores you have 19 IOP sections.
It would be interesting to see how they behave. So I took data from a two-drawer z14. (It’s an M02 hardware model, with a software designation 507, with 7 GCPs, 4 zIIPs, and 5 ICFs. It has lots of LPARs.)
So, I used the 78–3 data to plot two metrics:
Processed I/O Interrupts per second
IOP Busy %
Here is the graph, with IOP Busy on the right-hand axis and I/O Interrupts on the left.
The numbers are interesting but there is no clear pattern:
The I/O Interrupt rate varies wildly – and I suspect it has something to do with the devices and channels the IOP is handling.
The IOP Busy % doesn’t necessarily correspond to the I/O Interrupt rate.
Probably the more important and useful metric is the IOP Busy number.
When I say “no clear pattern” I mean it would be difficult to say something like “IOP 4 is busier because of its position in the machine”.
I do think it’s worth keeping an eye on IOP Busy %. This particular set of data shows very low IOP utilisations – which is a good thing.
For a 2-drawer z14, 10 IOPs is the standard number but you can buy more. For z13 it was 12 and for z15 it’s 8. There’s a clear trend here. I do think that having SMT as standard on IOPs will have contributed to the possibility of reducing the number of standard IOPs. Obviously IOPs getting a little faster with each generation helps, but you have to balance that against other processor types also getting faster. Another factor might be the historical trend towards more memory in a machine and, relatively speaking, fewer I/Os.
My code knows that it’s standard for a 2-drawer z14 to have 10 IOPs. It has to calculate – especially from z14 onwards – the number of IOPs as this isn’t recorded. SMT is part of that calculation. So I report standard IOPs and additional IOPs – though I haven’t seen a case of the latter yet.
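That arithmetic can be sketched like this. The sections formula applies from z14 onwards; the standard-count table just restates the numbers above (real counts depend on the hardware model):

```python
# With SMT on all but one IOP, n IOP cores produce 2*n - 1 IOP sections in SMF 78-3.
def iop_cores_from_sections(sections):
    assert sections % 2 == 1, "expect an odd number of IOP sections"
    return (sections + 1) // 2

# Standard IOP counts quoted in the text; illustrative, not a complete table.
STANDARD_IOPS = {"z13": 12, "z14 2-drawer": 10, "z15": 8}

def standard_and_additional(machine, sections):
    # Split the calculated IOP count into standard and additional (bought) IOPs.
    cores = iop_cores_from_sections(sections)
    standard = STANDARD_IOPS.get(machine, cores)
    return min(cores, standard), max(0, cores - standard)
```

So 19 sections on a 2-drawer z14 means 10 IOPs, all of them standard.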
And this is in the “Engineering” series of blog posts as we’re dealing with individual processors, even if they are IOPs.
To recap a little, the premise was very simple: I wanted to create a tool that could automate colouring nodes in a mind map, based on simple filtering rules. The format I chose was iThoughts’ CSV file format. (It could both import and export in this format.) Hence the name “filterCSV”.
I chose that format for three reasons:
I use iThoughts a lot – and colouring nodes that match patterns is a common thing for me to do.
The format is a rather nice text format, with lots of semantics available.
Python has good tools for ingesting and emitting CSV files. Likewise for processing arrays – which is, obviously, what CSV can be converted to.
So I built filterCSV and last time I wrote about it I had extended the CSV -> filterCSV -> CSV cycle to
Ingest flat text, Markdown and XML
Emit HTML, OPML, XML, Freemind
So, it had become a slightly more general tree manipulator and converter.
What Happened Next?
I’ve done a lot of work on filterCSV. I’ll attempt to break it down into categories.
You can now import OPML.
You can now export as tab- or space-indented text.
You can now export in GraphViz Directed Graph format, which means you can get a tree as a picture, outside of a mind-mapping application.
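As an illustration of that export – a sketch of the idea, not filterCSV’s actual code – turning a tree into GraphViz DOT source takes only a few lines:

```python
def tree_to_dot(edges, name="mindmap"):
    # edges: {parent_text: [child_text, ...]} - a tree as an adjacency list.
    # Emits GraphViz digraph source; feed it to 'dot -Tpng' to get a picture.
    lines = [f'digraph "{name}" {{']
    for parent, children in edges.items():
        for child in children:
            lines.append(f'  "{parent}" -> "{child}";')
    lines.append("}")
    return "\n".join(lines)
```

The point is that once the tree is ingested, emitting it in yet another format is cheap.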
Tree Manipulation Functions
You can sort a node’s children ascending, and you can reverse their order. The latter means you can sort them descending. Imagine a tree with a Db2 subsystem and the CICS regions that attach to it as its children. You’d want the CICS regions sorted, I think. (Possibly by name, possibly by Service Class or Report Class.)
Sometimes it makes sense for the children of a node to be merged into the node. Now they can be and they are each preceded by an asterisk – to form a Markdown bulleted list item. (iThoughts can handle some Markdown in its nodes.) I think we might use this in our podcast show notes.
You can now select nodes by their level in the tree. You can also use none as a selector – to deselect all nodes in the tree. (Before you had all as a selector – to allow you to set attributes of all nodes.) You might use none with nc (next colour) to skip a colour in the iThoughts palette.
Here’s an example:
Where the first command says ‘for nodes whose text is “A1” colour with the first colour in the standard iThoughts colour palette’. The second says ‘do not use the second colour in the palette’. The third command says ‘for nodes with “A1A” in their text use the third colour in the palette’.
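The mechanism behind those three commands can be sketched like this. It is not filterCSV’s actual implementation, and the palette entries are stand-ins:

```python
PALETTE = ["c1", "c2", "c3"]   # stand-ins for the iThoughts colour palette

def apply_rules(nodes, rules):
    # rules: a list of selectors; None models 'none nc' - select nothing,
    # just consume the next palette colour.
    colours, idx = {}, 0
    for selector in rules:
        if selector is not None:
            for text in nodes:
                if selector(text):
                    colours[text] = PALETTE[idx]
        idx += 1
    return colours

nodes = ["A1", "A1A", "B2"]
rules = [lambda t: t == "A1",    # text is exactly 'A1' -> first colour
         None,                   # 'none nc' -> skip the second colour
         lambda t: "A1A" in t]   # 'A1A' in text -> third colour
result = apply_rules(nodes, rules)   # {'A1': 'c1', 'A1A': 'c3'}
```

The second palette colour is skipped entirely, exactly as the `none` / `nc` pair intends.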
New Node Attributes
As well as colour and shape, iThoughts has three further node attributes that filterCSV now supports:
Icon – where you can prefix a node with one of about 100 icons. For example, ticks and single-digit number icons.
Progress – which is the percent complete that a task is. Some people use iThoughts for task management.
Priority – which can range from 1 to 5.
As with colour and shape, you can set these attributes for selected nodes, with the selection following a rule. And, again, you can combine them. For example a tick node and 100% completion. You can also reset them, for example with noprogress.
Invoking filterCSV with no commands produces some help. This help points to the Github repository and the readme.
You can now (through Stream 3) read commands from a file. If you do, you can introduce comments with //. These continue until the end of the line. You can also use blank lines.
I learnt to use Stream 3 for this. To read commands from Stream 3 you might invoke filterCSV with something like
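Here is a sketch of what that might look like. The file names are illustrative, and the exact filterCSV syntax is in the README; `cat` stands in at the end just to demonstrate the file-descriptor-3 mechanism:

```shell
# A commands file: // comments run to the end of the line; blank lines are fine.
printf '// colour rule\n\nA1 1\n' > commands.txt

# The real invocation might look like this (hypothetical file names):
#   filterCSV 3< commands.txt < in.csv > out.csv

# The fd-3 mechanism itself, with cat standing in for filterCSV:
cat 3< commands.txt <&3
```

Keeping the commands in a file – with comments – makes longer rule sets much easier to maintain than a one-line invocation.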
So, you can see filterCSV (now at 1.10) has come on in leaps and bounds over the past few months. Most of the improvements were because I personally needed them, but one of them – indented output – was in response to a question from someone in a newsgroup.
And I’ve plenty more ideas of things I want to do with filterCSV. To reiterate, it’s an open source project so you could contribute. filterCSV is available from here.
And it’s interesting to me how the original concrete idea – colouring iThoughts nodes – has turned into the rather more abstract – ingesting trees and emitting them in various formats with lots of manipulations. I like this and probably should deploy the maxim “abstraction takes time and experience”.
In Episode 25 I said it had been a long time since we had recorded anything. That was true for Episode 25, but it certainly wasn’t true for Episode 26. What is true is that it’s taken us a long time from start to finish on this episode, and ever so much has happened along the way.
But we ploughed on and our reward is an Episode 26 whose contents I really like.
On to Episode 27!
Here are the unexpurgated show notes. (The ones in the podcast itself have a length limitation; I’m not sure Marna and I do, though.) 🙂
Episode 26 “Sounding Board”
Here are the show notes for Episode 26 “Sounding Board”. The show is called this because it relates to our Topics topic, and because we recorded the episode partly in the Poughkeepsie recording studio – where Martin sounded zen – and partly at home.
Where we have been
Marna has been in Fort Worth for SHARE back in February
Martin has been to Las Vegas for “Fast Start”, for technical sales training, and he got out into the desert to Valley Of Fire State Park
Then, in April he “visited” Nordics customers to talk about
zIIP Capacity and Performance
So You Don’t Think You’re An Architect?
But he didn’t get to go there for real. Because, of course, the world was upended by both Covid and Black Lives Matter.
Chapter markers, discussed in Episode 16. Marna finally found an Android app that shows them – Podcast Addict. Martin investigated that app, and noted it is available on iOS too.
When you run a workflow step that invokes a job you can automatically save the job output in a location of your choosing (a z/OS UNIX file directory).
The output is in the same format as you’d see in SDSF. This means users can have an automatic permanent record of the work that was done in a workflow.
PTF Numbers are UI68359 for 2.3 and UI68360 for 2.4
APAR OA56774 (since 2.2) Provides new function to prevent a runaway sysplex application from monopolizing a disproportionate share of CF resources
This APAR has a dependency on CFLEVEL 24.
This case is pretty rare, but is important when you have it.
Not based on CF CPU consumption. Is based on deteriorating service times to other structures – which you could measure with SMF 74–4 Coupling Facility Activity data.
Mainframe – z15 FIXCATs
Important to cover as there are many questions about them.
Absolute minimum needed to run on a z15
Unfortunately some of these PTFs in that list have been involved in messy PE chains
If that happens, involve IBM Service (Bypass PE or ++APAR)
Usually intent is to keep these PTFs to a minimum – and keep the number of PTFs relatively constant.
CORRECTION: System Recovery Boost for z15 GA1 is in Required, not Exploitation category, as the recording states!
Needed for optional functions, and you can decide when you want to use them.
This PTF list could grow – if we add new functions
This one is more confusing. Usually these PTFs fix a defect that has been found but hasn’t risen to the Required category. We might’ve detected it in testing, or a customer might have.
Over time this category probably will grow, as field experience increases
Might want to run an SMP/E REPORT MISSINGFIX to see what’s in this FIXCAT. Might install some, all, or none of the fixes. Might want to be more selective. Based on how much change you want to encounter, versus what problems are fixed
By the way there are other FIXCATs you might want to be interested in for z15, e.g. IBM.Function.SYSPLEXDataSharing
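For example, a REPORT MISSINGFIX run covering a couple of these categories might look something like this. The zone name is illustrative, and you should check the exact FIXCAT names for your machine:

```
SET BOUNDARY(GLOBAL).
REPORT MISSINGFIX ZONES(TGTZONE)
                  FIXCAT(IBM.Device.Server.z15-8561.RequiredService,
                         IBM.Function.SYSPLEXDataSharing).
```

The report lists PTFs in those categories that aren’t yet installed, which you can then triage.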
Performance – DFSORT And Large Memory
A very special guest joins us, Dave Betten, former DFSORT performance lead.
Follows on from Elpida’s item in Episode 10 “234U” in 2017, and continues the “Managing Large Memory” theme.
Number of things to track these days:
Often track Average Free
Also need to track Minimum Free
Fixed frames – Especially Db2, and now with z/OS 2.4 zCX
Large frames – Again Db2 but also Java Heap
In z/OS 2.2
OPT controls simplified
Thresholds set to Auto
Default values changed
64GB versus %
In z/OS 2.3
Not reserved anymore but is a maximum
BTW the LFAREA value is in SMF 71
Dave reminded us of what’s in SMF 71
Dave talked about DFSORT memory controls
DFSORT has historically been an aggressive user of memory
Installation defaults can be used to control that
But the EXPOLD parameter needs special care – because of what constitutes “old pages”, which aren’t actually unused.
Moved to iPad so he can edit anywhere, except where there is noise. Apple Pencil helps with precision.
Then, throw away remote side – in stereo terms.
Then, perform noise reduction, still not perfect.
Marna’s publishing: Uploading the audio, publishing show notes, still the same as before.
– Insert Usual Disclaimer Here – which is only our thoughts.
RFE 139477 “Please include the CPU Time Limit for a Job/Step in SMF Type 30”
The CPU Time Limit in effect for a JobStep is not currently written to SMF Type30 at the end of the step.
While a job is running this information is available in the Address Space Control Block (ASCBJSTL) and can be displayed or even modified by tools such as OMEGAMON.
However the information is not retained after the JobStep completes. This information would be very useful after the fact to see the CPU time limit in effect for a JobStep.
This enhancement request is to include the information in ASCBJSTL in the SMF Type30 Subtype 4 record written at the end of the JobStep.
An additional consideration would be how to best deal with the Job CPU time Limit (as specified on the JOB statement) and whether this can also be catered for in the RFE
Business justification: Our site got caught out by a Test job being submitted overnight with TIME=1440 and consuming over 6 hours CPU before it was cancelled. We would like to be able to prevent similar issues in future by having the CPU Time Limit data available in SMF.
After the fact
The RFE was calling for “after the fact”, i.e. when the step has ended. Might also like the source of the limit.
End of step looks useful. Could run query comparing to actual CPU time, then track to see if ABEND is on the horizon
“As it happens”
Would like on the SMF Interval as well as Step End records, maybe with tools to dynamically change the parameters.
May not need the SMF information if vendor and IBM tools already do it today, making it perhaps not a high enough priority for SMF
And the source of the parameters might not be readily available in control blocks so this might not even be feasible.
On the blog
Here are Martin’s many blog posts since the last episode. (We summarised in the podcast.)
I was wondering why my HiperDispatch calculations weren’t working. As usual, I started with the assumption my code was broken. My code consists of two main parts:
Code to build a database from the raw SMF.
Code to report against that database.
(When I say “my code” I usually say “I stand on the shoulders of giants” but after all these years I should probably take responsibility for it.) 🙂
Given that split the process of debugging is the following:
Check the reporting code is doing the right thing with what it finds in the database.
Check the database accurately captures what was in the SMF records.
Only when those two checks have passed should I suspect the data.
Building the database itself consists of two sub stages:
Building log tables from the raw records.
Summarising those log tables into summary tables. For example, averaging over an hour.
If there is an error in database build it is often incorrect summarisation.
In this case the database accurately reports what’s in the SMF data. So it’s the reality that’s wrong. 🙂
A Very Brief Summary Of HiperDispatch
Actually this is a small subset of what HiperDispatch is doing, sufficient for the point of this post.
With HiperDispatch the PR/SM weights for an LPAR are distributed unevenly (and I’m going to simplify to a single pool):
If the LPAR’s overall weight allows it, some number of logical processors receive “full engine” weights. These are called Vertical Highs (or VH’s for short). For small LPARs there could well be none of these.
The remainder of the LPAR’s weight is distributed over one or two Vertical Mediums (or VM’s for short).
Any remaining online logical processors receive no weight and are called Vertical Lows (or VL’s for short).
Enigma And Variations
It’s easy to calculate what a full engine’s weight for a pool is: Divide the sum of the LPARs’ weights for the pool by the number of shared physical processors. You would expect a VH logical processor to have precisely this weight.
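In sketch form – with the one-or-two-VM choice simplified, since the real PR/SM rules have more edge cases (such as minimum VM weights):

```python
def vertical_split(lpar_weight, pool_weight, physical_cpus, online_lps):
    # Simplified sketch of HiperDispatch vertical polarization for one pool.
    full_engine = pool_weight / physical_cpus      # a full engine's weight
    vh = int(lpar_weight // full_engine)           # Vertical Highs
    remainder = lpar_weight - vh * full_engine
    # Remainder spread over one or two Vertical Mediums (choice simplified here)
    vm = 0 if remainder == 0 else (1 if remainder <= full_engine / 2 else 2)
    vl = online_lps - vh - vm                      # Vertical Lows get no weight
    return vh, vm, vl
```

So a 410-weight LPAR in a 1000-weight, 10-way pool with 12 online logical processors would come out as 4 VHs, a VM carrying the small remainder, and 7 VLs.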
But what could cause the result of this calculation to vary? Here the maths is simple but the real-world behaviours are interesting:
The number of physical processors could vary. For example, On-Off Capacity On Demand could add processors and later take them away.
The total of the weights for the LPARs in the pool could vary.
The latter is what happened in this case: the customer deactivated two LPARs on a machine – to free up capacity for other LPARs to handle a workload surge. Later on they reactivated the LPARs, IPLing them. Deactivation takes an LPAR’s weights out of the equation; I’m pretty sure IPLing itself doesn’t affect the weights.
These were two very small LPARs with 2–3% of the overall pool’s weights each. But they caused the above calculation to yield varying results:
The “full engine” weight varied – decreasing when the LPARs were down and increasing when they were up.
There was some movement of logical processors between VH and VM categories.
The effects were small. Sometimes a larger effect is easier to debug than a smaller one. For one, it’s less likely to be a subtle rounding or accuracy error.
The conversion of VH’s to VM’s (and back) has a “real world” effect: A VH logical processor is always dispatched on the same physical processor. The same is not true for a VM: while there is a strong preference for redispatch on the same physical processor, it’s not guaranteed. And this matters because cache effectiveness is reduced when a logical processor moves to a different physical processor.
So, one recommendation ought to be: If you are going to deactivate an LPAR recalculate the weights for the remaining ones. Likewise, when activating, recalculate the weights. In reality this is more a “playbook” thing where activation and deactivation is automated, with weight adjustments built in to the automation. Having said that, this is a “counsel of perfection” as not all scenarios can be predicted in advance.
What I Learnt And What I Need To Do
As for my code, it contains a mixture of static reports and dynamic ones. The latter are essentially graphs or the makings of – such as CSV files.
Assuming I’ve done my job right – and I do take great care over this – the dynamic reports can handle changes through time. So no problem there.
What’s more difficult is the static reporting. For example, one of my key reports is a shift-level view of the LPAR layout of a machine. In the example I’ve given, it had a hard time getting things right: the weights for individual LPARs’ VH processors went wrong. (The weight of a full processor worked in this case – but only because the total pool weight and number of physical engines didn’t change. That isn’t always the case.)
To improve the static reporting I could report ranges of values – but that gets hard to consume and, besides, just tells you things vary but not when and how. The answer lies somewhere in the region of knowing when the static report is wrong and then turning to a dynamic view.
In particular, I need to augment my pool-level time-of-day graphs with a stack of the LPARs’ weights. This would help in at least two ways:
It would show when weights were adjusted – perhaps shifting from one LPAR to another.
It would show when LPARs were activated and de-activated.
A design consideration is whether the weights should stack up to 100%. I’ve come to the conclusion they shouldn’t – so I can see when the overall pool’s weight changes. That reveals more structure – and I’m all for not throwing away structure.
Here’s what such a graph might look like:
In this spreadsheet-driven mockup I’ve ensured the “now you see them now you don’t” LPARs are at the top of the stack.
I don’t know when I will get to this in Production code. As now is a particularly busy time with customer studies I probably should add it to my to-do list. But I’ll probably do it now anyway… 🙂
Head Scratching Time
In this set of data there was another phenomenon that confused me.
One LPAR had twelve GCPs online. In some intervals something slightly odd was happening. Here’s an example, from a single interval:
Logical Processors 7 to 11 had polar weights of 0.
If you tot up the polar weights you get 410 – which checks out as it’s the LPAR’s weight in the GCP pool (obtained from other fields in the SMF 70 record).
Obviously Logical Processors 0, 1, 2, 3, and 4 are Vertical High (VH) processors – and bits 0,1 of SMF70POF are indeed “11”.
But that leaves two logical processors – 5 and 6 – with non-zero, non-VH weights. And they don’t have the same weight. This is not supposed to be the case.
Examining their SMF70POF fields I see:
Logical Processor 5 has bits 0,1 set to “10” – which means Vertical Medium (VM).
Logical Processor 6 has bits 0,1 set to “01” – which means Vertical Low (VL).
But if Logical Processor 6 is a VL it should have no vertical weight at all.
Well, there is another bit in SMF70POF – Bit 2. The description for that is “Polarization indication changed during interval”. (I would’ve stuck a “the” in there but never mind.)
This bit was set on for LP 6. So the LP became a Vertical Low at some point in the interval, having been something else (indeterminable) at some other point(s). I would surmise VL was its state at the end of the interval.
So, how does this explain it having a small but non-zero weight? It turns out SMF70POW is an accumulation of sampled polar weight values, which is why (as I explained in Part Two) you divide by the number of samples (SMF70DSA) to get the average polar weight. So, some of the interval it was a VM, accumulating. And some of the interval it was a VL, not accumulating.
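A tiny illustration of that accumulation, with invented numbers rather than the real interval’s:

```python
# An LP that is a VM (polar weight 30) for 60% of an interval's samples and a
# VL (no weight) for the rest. SMF70POW accumulates the sampled weights;
# SMF70DSA is the sample count.
samples = [30] * 60 + [0] * 40
smf70pow = sum(samples)                       # accumulated sampled polar weight
smf70dsa = len(samples)                       # number of samples
average_polar_weight = smf70pow / smf70dsa    # small but non-zero
```

The average comes out between the VM weight and zero – exactly the “small but non-zero weight” seen for LP 6.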
Mystery solved. And Bit 2 of SMF70POF is something I’ll pay more attention to in the future. (Bits 0 and 1 already feature heavily in our analysis.)
This shifting between a VM and a VL could well be caused by the total pool weight changing – as described near the beginning of this post.
The moral of the tale is that if something looks strange in your reporting you might – if you dig deep enough – see some finer structure (than if you just ignore it or rely on someone else to sort it out).
The other, more technical, point is that if almost anything changes in PR/SM terms it can affect how HiperDispatch behaves – and that could cause RMF SMF 70–1 data to behave oddly.
The words “rely on someone else to sort it out” don’t really work for me: The code’s mine now, I am my own escalation route, and the giants whose shoulders I stand on are long since retired. And, above all, this is still fun.
A few years ago I built a presentation on zIIP Capacity Planning. It highlighted the need for better capacity planning for zIIPs and outlined precisely why zIIPs couldn’t be run as busy as general purpose processors (GCPs).
Since then a lot has changed. And I, in common with most people, have a lot more experience of how zIIPs perform in customer installations. So, earlier this year I updated the presentation and broadened the title to include Performance.
I was due to “beta” the presentation at a user group meeting in London in March. Further, I was due to present it to a group of customers in Stockholm in May. The former, understandably, was cancelled. The latter happened as a Webex.
The essential thesis of the presentation is that zIIP Capacity and Performance needs a lot more care than most customers give it, particularly for CPU-stringent consumers such as the Db2 engine (MSTR and DBM1). (Actually I’ve talked about Db2 and its relationship with zIIP in Is Db2 Greedy?.)
What’s really new about this presentation is a shift in emphasis towards Performance, though there is plenty on Capacity. And one key aspect is LPAR Design. For example, to aid the “Needs Help” mechanism where a General Purpose Processor (GCP) aids a zIIP, some LPARs might need to forego access to zIIP. This might be controversial – as you want as much zIIP exploitation as possible. But for some LPARs giving them access to zIIP makes little or no sense. Meanwhile other LPARs might need better access to zIIP.
The presentation is also updated in a few key areas:
More comprehensive and up to date treatment of Db2 – and if you are a Db2 customer you really should pay attention to this one. (I’m grateful to John Campbell and Adrian Burke for their help with this topic.)
zCX Container Extensions in z/OS 2.4. This can be a major consumer of zIIP. Obviously this needs to be planned for and managed.
z15 System Recovery Boost (SRB). I’m looking forward to seeing how much this speeds up IPLs – and I think I’m going to have to refurbish my IPL/Restart detection code to do it justice. I also think you will want to consider how an event affects the other LPARs sharing the zIIP pool.
As with So You Don’t Think You’re An Architect?, I’m planning on evolving the presentation over time – and the above list shows how I’ve already done it. I’m also interested in giving it to any audience that wants it. Let me know if it would be of interest and I’ll see what I can do.
Every year I try to write one new presentation. Long ago, it feels like, I started on my “new for 2020” presentation. It’s the culmination-so-far 🙂 of my “architecture thing”.
“What Architecture thing?” some of you might be asking.
It’s quite a simple idea, really: It’s the notion that SMF records can be used for far more than just Performance, even the ones (such as RMF) that were notionally designed for Performance. A few years ago I wrote a presentation called “How To Be A Better Performance Specialist” where I pushed the germ of this notion in two directions:
Repurposing SMF for non-Performance uses.
Thinking more widely about how to visually depict things.
The first of these is what I expanded into this “Architecture” idea. (The second actually helps quite a bit.) But I needed some clear examples to back up this “who says?” notion.
My day job – advising customers on Performance matters – yields a lot of examples. While the plural of “anecdote” isn’t “data”, the accumulation of examples might be experience. And boy do I have a lot of that now. So I set to writing.
The presentation is called “So You Don’t Think You’re An Architect?” A good friend of mine – who I finally got to meet when I did a customer engagement with him – thought the title a little negative. But it’s supposed to be a provocative statement. Even if the conclusion is “… and you might be right”. So I’ve persisted with it (and haven’t lost my friend over it). 🙂
I start at the top – machines and LPARs – and work my way down to the limits of what SMF 30 can do. I stop there, not really getting much into the middleware instrumentation for two reasons:
I’ve done it to death in “Even More Fun With DDF”.
This presentation is already quite long and intensive.
On the second point, I could go for 2 hours, easily, but I doubt any forum would let me do a double session on this topic. Maybe this is the book I have in me – as supposedly everybody does. (Funnily enough I thought that was “SG24–2557 Parallel Sysplex Batch Performance”. Oh well, maybe I have two.) 🙂
One hour has to be enough to get the point across and to show some actual (reproducible) examples. “Reproducible” is important as it is not (just) about putting on a show; I want people to be able to do this stuff and to get real value out of it.
So, I’ve been on a long journey with this Architecture thing. And some of you have been on bits of the journey with me, for which I’m grateful. I think the notion we can glean architectural insight from SMF has merit. The journey continues as recently I’ve explored:
Operational Decision Manager (ODM) – whose program name is HBRKMAIN.
I’ll continue to explore – hence my “culmination-so-far” quip. I really don’t think this idea is anything like exhausted. And – in the spirit of “I’ll keep revising it” I’ve decided to put the presentation in GitHub. (But not the raw materials – yet.) You can find it here.
You might argue that I risk losing speaking engagements if I share my presentation. I have to say this hasn’t happened to me in the past, so I doubt it makes much difference now. And this presentation has already had one outing. I expect there will be more. And anyway the point is to get the material out. Having said that, I’m open to webcasting this presentation, in lieu of being able to travel.
(I’m grateful to Dougie Lawson for correcting a few errors in the original version of this.)
I don’t often write about IMS and there’s a good reason for it: Only a small proportion of the customers I deal with use it. I regard IMS as being one of those products where the customers that have it are fanatical – in a good way. 🙂
So when I do get data from such a customer I consider it a golden opportunity to enhance my tooling. And so it has been recently. I have a customer that is a merger of three mainframe estates – and I have data from two of the three heritages. Both of these have IMS.
This merger happened long ago but, as so often happens, the distinct heritages are evident. In particular, the way they set up the IMS systems and regions differs.
You can, to a first approximation, separate IMS-related address spaces into two categories:
IMS System Address Spaces
IMS Application Regions
In what follows I’ll talk about both, referencing what you can do with SMF 30, specifically. Why SMF 30? Because processing SMF 30 is a scalable method for classifying address spaces, as I’ve written about many times before.
IMS System Address Spaces
IMS system address spaces run with program name “DFSMVRC0” and there are several different address spaces. For example, over 30 years ago the “DL/I SAS” address space became an option – to provide virtual storage constraint relief. It’s been mandatory for a long time. Also there is a DBRC address space. All have the same program name.
The system address spaces have Usage Data Sections which say “IMS”. The Product Version gives the IMS version. In this customer’s case one part of the estate says “V15” and the other part “V14”.
The IMS Control Region is the only system address space that can attach to Db2 or MQ. So, if the program name is “DFSMVRC0” and there are Usage Data Sections for either Db2 or MQ we know this is the Control Region. But this isn’t always going to be the case – as some IMS environments connect to neither Db2 nor MQ. So here the Product Qualifier field can be helpful:
Both DBRC and Control Region address spaces have a Product Qualifier of “TM”. But you can’t necessarily tell them apart from things like I/O rates. However, you might expect a DBRC address space to have a name with something like “DBR” in. (I’m not wowed by that level of fuzziness.)
A DL/I SAS has Product Qualifier “DBCTL”.
I’m going to treat IRLM as an IMS System Address Space, when really it isn’t. This is the lock manager – and it’s the same code whether you’re running IMS or Db2. The program name is DXRRLM00 and there is little in SMF to distinguish an IRLM for IMS from one for a Db2 subsystem. (In fact which Db2 an IRLM address space is associated with isn’t in SMF either.) The best my code can do is parse job names, Service Class names, Report Class names etc. for “IMS” or, still worse, “I” but no “D”.
IMS Application Regions
IMS application address spaces – whether MPRs or BMPs – run with program name “DFSRRC00”. They also have Usage Data Sections that say “IMS” but the Product Qualifier field says nothing about the subsystem they’re using. Similarly, when CICS attaches to IMS its Product Qualifier isn’t helpful.
To my mind the distinction between an MPR (Message Processing Region) and a BMP (Batch Message Processor) is subtle. For example I’ve seen BMPs that sit there all day, fed work by MQ. You would probably glean something from Service Classes and Report Classes. Relying on the address space name is particularly fraught.
Two Diverse IMS Estates
This latest customer has two contrasting styles of IMS estate, mainly visible in their testing environments:
One has lots of very small IMS environments.
The other has fewer, larger testing environments.
Also, as I noted above, one estate is IMS V14 and the other is V15. This does not appear to be a case of V15 in Test/Development and V14 in Production.
So I guess their testing and deployment practices differ – else this would’ve been homogenised.
I’m going to enjoy talking to the customer about how these two different configurations came to be.
IMS taxonomy can be done – but it’s much messier than Db2 and MQ. It relies a lot on naming conventions and spotting numerical dynamics in the data.
Note: For brevity, I haven’t talked about IMS Data Sharing. That would require me to talk at length about XCF (SMF 74–2) and Coupling Facility (SMF 74–4). Something else I haven’t discussed is “Batch DL/I” – where a batch job is its own IMS environment. This is rather less common and I haven’t seen one of these in ages.
I would also say, not touched on here, that SMF 42–6 would yield more clues – as it documents data sets.
And, of course, serious IMS work requires its own product-specific instrumentation. Plus, as Dougie pointed out to me, the Procedure JCL.
There have been a number of WLM Service Definition formatters over the years. So why do we need another one?
Well, maybe we don’t but this one is an open source one, covered by the MIT licence. That means you can change it:
You could contribute to the project.
You could modify it for your own local needs.
While IBM has other WLM Service Definition Formatters, it was easy to get permission to open source this one.
It’s the one I started on years ago and have evolved over the many engagements where I’ve advised customers on WLM.
If it has an unusual feature it’s that I’ve put cross links in wherever I can – which has made it easier for me to use. For example, everywhere a Service Class name appears there’s a link to its definition. Similarly, a Classification Rule definition points to the relevant Report Class definition.
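The mechanics of that cross linking are just HTML anchors. A minimal sketch – in Python rather than sd2html’s PHP, and not its actual code:

```python
# Each definition gets an id anchor; each mention elsewhere links to it.
def definition_anchor(name, kind="sc"):
    # e.g. emitted where the Service Class is defined
    return f'<a id="{kind}-{name}">{name}</a>'

def mention_link(name, kind="sc"):
    # e.g. emitted where a Classification Rule names that Service Class
    return f'<a href="#{kind}-{name}">{name}</a>'
```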
sd2html is a single PHP script, originally run on a Linux laptop and then on a MacBook Pro. Both platforms come with web servers and PHP built in. In the Mac’s case it’s Apache.
So, to use it you need to provide yourself with a tame (perhaps localhost) web server. It needs to run PHP 7.
Place sd2html.php somewhere that it can be run by the web server.
Extracting A WLM Service Definition
In my experience, most customers are still using the ISPF WLM Application. There is a pull-down menu to print the Service Definition. Choose the XML option and it will write to an FB 80 sequential file. This you need to place on the web server, as previously mentioned.
Customers send me their WLM Service Definitions in this format, downloaded with EBCDIC to ASCII translation. It’s easy to email this way.
When I receive the file it looks broken. I keep reassuring customers it isn’t because I can one-line it, throwing away the new line characters. This used to be a fiddle in my editor of choice – then Sublime Text, now BBEdit. That works well.
But I’ve eliminated the edit step: sd2html now does the edit for me, before passing the repaired text on to the XML parser. (Originally the XML parser read the file on disk directly. Now the code reads the file in, removes the new lines, and then feeds the result to the XML parser.)
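A Python analogue of that repair step might look like this – sd2html itself does the equivalent in PHP:

```python
import xml.etree.ElementTree as ET

def parse_service_definition(path):
    """Read the downloaded Service Definition, strip the new lines the
    FB 80 download introduced (even mid-tag), and parse the result as XML."""
    with open(path, encoding="utf-8") as f:
        one_line = f.read().replace("\r", "").replace("\n", "")
    return ET.fromstring(one_line)
```

Note that the new lines can fall anywhere – including in the middle of a tag – which is exactly why the file looks broken until it’s joined back into one line.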
So you’ve got the Service Definition accessible by your PHP web server. Now what?
From a browser invoke sd2html on your web server with something like