This post is about processor drawers and how the topic might influence your LPAR design.
Introduction
Once upon a time drawers and books were very simple. If you wanted a certain number of processors – whether GCP, zIIP, zAAP, IFL, or ICF – that determined the number of drawers you had. (I’m still hearing people refer to them as books, even though that went out when we went from vertical orientation to horizontal.)
Now it’s got more complex. I think this might really have taken off with z15 – but it’s certainly a feature of z16 as well. And I expect it to remain a feature – though I’m not saying anything specific about any future processor ranges.
I’m seeing quite a few Max200 machines – even though the characterised processor count is nowhere near that number. I’m also seeing Max82 machines. Again, this isn’t usually driven by processor counts. (And just because I don’t list the other models doesn’t mean they’re not out there; my sample is limited.)
So this post discusses why. And it’s drawn from the increasingly frequent discussions I have with customers about their, er, drawers.
z16 Drawers
This post might seem quaint when future processors come out. But in 2023 it’s “up to the moment”. And, to keep it simple(r), I’m not going to talk about A02 machines.
We designate z16 A01 machines as Max39, Max82, Max125, Max168, and Max200. These relate to the maximum number of characterisable physical processors. The top 2 models are 4 drawer machines. Max39 has 1 drawer, Max82 has 2, Max125 has 3.
I should perhaps explain that the term “characterisable physical processors” refers to the GCPs, zIIPs, IFLs, and ICFs a customer has bought – on the machine. (Technically it refers to the IFP and SAPs also, but this post isn’t really about those.)
A drawer contains, among other things, processors, memory, and sockets for ICA-SR coupling facility links. Apart from the processors RMF doesn’t surface any of this. And indeed I infer drawer count from the maximum number of characterisable processors (field in SMF 70-1).
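That inference is just a table lookup. Here is a minimal sketch, assuming you have already extracted the maximum characterisable processor count from SMF 70-1 as an integer. The names `Z16_A01_DRAWERS` and `drawers_for_max` are mine, purely for illustration, and the table only covers the z16 A01 models listed above.

```python
# z16 A01 models: maximum characterisable processors -> drawer count,
# as listed in this post. Other machine generations need their own table.
Z16_A01_DRAWERS = {39: 1, 82: 2, 125: 3, 168: 4, 200: 4}


def drawers_for_max(max_characterisable: int) -> int:
    """Return the drawer count implied by a z16 A01 'MaxNN' value."""
    try:
        return Z16_A01_DRAWERS[max_characterisable]
    except KeyError:
        # Hypothetical handling: refuse to guess for unknown values.
        raise ValueError(
            f"{max_characterisable} is not a z16 A01 model maximum"
        ) from None


if __name__ == "__main__":
    for model_max in (39, 82, 125, 168, 200):
        print(f"Max{model_max}: {drawers_for_max(model_max)} drawer(s)")
```

Raising an error for an unrecognised value seems safer than guessing, as the value might have come from a different machine generation.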
More Drawers For Greater Resilience
One of the main reasons for having more drawers is to increase resilience. Let me explain a couple of reasons why this might be so. And bear in mind the events I describe are very rare but well worth planning for.
Losing A Drawer
If you condition the machine correctly the need to replace a drawer need not stop a machine. If you have spare capacity in other drawers the purchased processors can move and the LPARs’ logicals along with them. I also understand that memory – if there is sufficient physically in the surviving drawers – can be “moved” to replace that from the offgoing drawer.
One relatively happy circumstance in which a drawer might need disabling could be for adding physical memory. (I don’t know if adding ICA-SR connections requires this.)
Obviously a single-drawer machine can’t participate in concurrent drawer removal. Further, the remaining drawers – in extremis – might get crowded.
But it’s certainly something to plan ahead for – if you can.
Recovering To Another Machine
In a two (or more) machine configuration you might hope to survive a machine-level outage by recovering to the surviving machine(s).
If there weren’t spare capacity, possibly using a generally unpopulated drawer, recovery to survivors might not be feasible.
Plenty of customers have machines with many uncharacterised cores, often in their own drawers. And such drawers would be expected to have memory.
More Drawers For Separation
PR/SM’s algorithms for LPAR, logical processor, and memory placement try to separate ICF and IFL LPARs from z/OS LPARs:
- It tries to place the GCPs and zIIPs in the bottom drawers, working upwards. z/OS memory also.
- It tries to place the IFLs and ICFs in the top drawers, working downwards. Their memory also.
It works better if the z/OS LPARs are kept separate from the others, especially not sharing Dual Chip Modules (DCMs).
With a single drawer that can’t be done.
If there are either so many z/OS logical processors or IFLs and ICFs that they can’t be separated PR/SM can’t achieve the ideal. This is not just a single-drawer problem.
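If you have logical processor locations to hand, checking how well the separation worked out is straightforward. This is a minimal sketch, assuming each logical processor has already been decoded into a tuple of LPAR name, processor type, drawer number, and DCM number. That tuple layout, the type strings, and the function name `shared_locations` are all hypothetical choices of mine, not anything RMF or PR/SM defines.

```python
from collections import defaultdict

# Hypothetical type labels for the decoded data.
ZOS_TYPES = {"GCP", "ZIIP"}
OTHER_TYPES = {"IFL", "ICF"}


def shared_locations(processors):
    """Find (drawer, dcm) pairs hosting both z/OS and IFL/ICF logicals.

    `processors` is an iterable of (lpar, proc_type, drawer, dcm) tuples.
    """
    types_at = defaultdict(set)
    for _lpar, proc_type, drawer, dcm in processors:
        types_at[(drawer, dcm)].add(proc_type)
    return sorted(
        location
        for location, types in types_at.items()
        if types & ZOS_TYPES and types & OTHER_TYPES
    )


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    sample = [
        ("PROD1", "GCP", 1, 0),
        ("PROD1", "ZIIP", 1, 0),
        ("LNX1", "IFL", 1, 0),
        ("CF1", "ICF", 2, 3),
    ]
    print(shared_locations(sample))
```

An empty result would suggest the z/OS processors and the IFLs and ICFs are not sharing a DCM. To check at drawer level instead, key on the drawer number alone.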
More Drawers For Scalability
PR/SM tries to keep the logical processors and memory for an LPAR in the same drawer. If an LPAR grows too big it might not be possible to keep it in a single drawer. If that happens there will be cross-drawer memory and cross-drawer (virtual) level 4 cache accesses. These cost many more cycles than in-drawer accesses. So drawer crossing is best avoided.
Where LPARs get sufficiently large it might very well be better to split the LPAR. Whether each LPAR ends up in its own drawer will depend on their sizes. If two LPARs would between them be too large for a single drawer you’d hope they ended up in separate drawers.
Yes, there could well be more CPU cost – perhaps because of Db2 datasharing scaling out. But there’s a resilience benefit – in that more LPARs sharing a given workload tend to have better resilience characteristics.
I have observed cases where 2 LPARs in a sysplex fit in Drawer 1. PR/SM is observed – using the instrumentation I’m about to describe – to indeed place both of them in Drawer 1. In one case, with a Max82 (2 drawer) machine, nothing ended up in Drawer 2. This is by design.
What Is In Each Drawer?
With z16 SMF 70-1 learnt a new trick (and I along with it). Prior to z16 you could get the home addresses of logical processors from SMF 99-14. But:
- It only gave you the information for the LPAR whose RMF cut the records.
- It only told you about z/OS LPARs.
- It gave you no information about physical processor locations.
With z16 and the appropriate RMF support you now get the home addresses for all logical processors for all LPARs, no matter what the LPAR is. (Including IFLs and ICFs.) What you don’t get – which SMF 99-14 has – is affinity nodes. Perhaps you can guess that from processors that behave like each other, but it’s only a small pity anyway.
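One thing this makes easy is spotting LPARs whose logical processors are spread across more than one drawer. Here is a minimal sketch, assuming the home addresses have already been reduced to pairs of LPAR name and drawer number. The input layout and the function names are my own illustration, not the SMF 70-1 field layout.

```python
from collections import defaultdict


def drawers_by_lpar(home_addresses):
    """Map each LPAR to the sorted list of drawers its logicals call home.

    `home_addresses` is an iterable of (lpar, drawer) pairs, one per
    logical processor.
    """
    drawers = defaultdict(set)
    for lpar, drawer in home_addresses:
        drawers[lpar].add(drawer)
    return {lpar: sorted(found) for lpar, found in drawers.items()}


def spanning_lpars(home_addresses):
    """Return the names of LPARs with home addresses in more than one drawer."""
    return sorted(
        lpar
        for lpar, found in drawers_by_lpar(home_addresses).items()
        if len(found) > 1
    )


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    sample = [
        ("PROD1", 1), ("PROD1", 1),
        ("PROD2", 1), ("PROD2", 2),
        ("CF1", 2),
    ]
    print(drawers_by_lpar(sample))
    print(spanning_lpars(sample))
```

Any LPAR this flags is a candidate for a closer look at cross-drawer effects.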
I also think the home addresses for the PHYSICAL LPAR have significance: In the data I’ve processed these look very much like the physical locations of the characterised processors. But I’ve not seen this written anywhere – so maybe it can’t be relied on. Certainly the home addresses of the LPARs’ logical processors never stray outside of PHYSICAL’s home addresses. And their number corresponds – as it always has – to what is purchased (also given in SMF 70-1 but in a different section).
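The “never stray outside” observation is easy to keep testing as new data arrives. Below is a minimal sketch, assuming PHYSICAL’s home addresses and each LPAR’s home addresses are already available as collections of hashable values (here, made-up drawer and DCM tuples). The function name `strays` and the data shapes are hypothetical.

```python
def strays(physical_homes, lpar_homes):
    """Report LPAR home addresses that are not among PHYSICAL's.

    `physical_homes` is an iterable of home addresses for PHYSICAL.
    `lpar_homes` maps LPAR name -> iterable of home addresses.
    Returns {lpar: sorted list of addresses outside PHYSICAL's}.
    """
    physical = set(physical_homes)
    report = {}
    for lpar, homes in lpar_homes.items():
        outside = set(homes) - physical
        if outside:
            report[lpar] = sorted(outside)
    return report


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    physical = [(1, 0), (1, 1), (2, 3)]
    lpars = {
        "PROD1": [(1, 0), (1, 1)],
        "CF1": [(2, 3), (2, 2)],
    }
    print(strays(physical, lpars))
```

An empty dictionary is consistent with what I’ve seen so far. A non-empty one would be a reason to doubt my interpretation of PHYSICAL’s home addresses.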
Neither SMF 70-1 nor 99-14 will tell you where an LPAR’s memory is. So I’d be especially careful of LPARs whose memory footprint might approach the average drawer’s memory.
One point to remember is that a logical processor home address is not necessarily where it will be dispatched. For Vertical Highs (VHs) it is. For Vertical Mediums (VMs) rather less so, as they have to share physical processors. For Vertical Lows (VLs) still less so. But, as I said, even a VL can’t be dispatched outside of the physical cores of a given type.
Conclusion
I wanted to sensitise some of you to the question of “how many drawers should a machine have?” And “why?”
I also wanted to introduce (nearly all of) you to the new instrumentation in 70-1. This really changes the processor analysis game.
I haven’t necessarily covered all the aspects of this topic, of course. For that a good place to start is this Redbook. I found page 112 onwards a good read.
I also realise that drawers aren’t cost free. I was also confronted with the fact that the number of drawers a machine can have is limited by the number of frames. Further, the bigger machines are factory build only.
Still, I hope this has been food for thought. And I expect to have even more discussions about drawers with customers going forward.
One other point: The first drawer is said to have fewer characterisable cores than subsequent ones. It seems cautious to assume any drawer’s size to be the smallest one in the machine, not just the first’s. So, for z16 that would be 39. In any case you don’t want to get too close to a full drawer.
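The arithmetic behind “too close to a full drawer” can be sketched as follows. The 39 comes from the text above. The 80% threshold is purely an illustrative assumption of mine, not a published guideline, and the function names are hypothetical.

```python
# Smallest z16 drawer size in characterisable cores, per the Max39 model.
SMALLEST_Z16_DRAWER = 39


def drawer_fill(processor_count, drawer_size=SMALLEST_Z16_DRAWER):
    """Return the fraction of a (cautiously sized) drawer that is in use."""
    return processor_count / drawer_size


def too_close(processor_count, threshold=0.8,
              drawer_size=SMALLEST_Z16_DRAWER):
    """Flag counts at or above `threshold` of a drawer.

    The default threshold is an arbitrary illustrative value.
    """
    return drawer_fill(processor_count, drawer_size) >= threshold


if __name__ == "__main__":
    for count in (20, 36):
        print(count, f"{drawer_fill(count):.0%}", too_close(count))
```

Pick a threshold that suits your own appetite for risk and growth.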
One final thought: You can’t predict which logical (and physical) processors will end up in which drawers. You can only design sensibly and verify with 70-1 and check the effects with SMF 113. In fact the theme running through this post is indeed “design sensibly”.
Making Of
This post was started on a plane to Istanbul – to run a customer workshop. Without giving anything away at all, I can say that drawers were a topic of conversation. One of many. And this is far from the only customer where the topic has come up recently. And the post was concluded on the way back.
If you’ll pardon the pun, this post draws on my experiences using code to analyse the new SMF 70-1 fields. What I haven’t yet done is updated my diagramming code to use this data. Perhaps it’s time I did. Actually, I returned to this post a couple of weeks later – before publishing. I have some thoughts on diagramming – which could only be done with z16 SMF 70-1. As I prototype and then refine I probably will write another post.
One working title for this post was “Er, In Drawers”. I suspect the pun is a Britishism and probably should be retired.