It’s been quite a while since I last wrote about Coupling Facility performance. Indeed it’s a long time since I presented on it – so I might have to update my Parallel Sysplex Performance presentation soon.
(For reference, that last post on CF Performance was Maskerade in early 2018.)
In the past I’ve talked about how a single system’s service time to a single structure behaves with increasing load. This graphing has been pretty useful. Here’s an example.
This is from a system we’ll call SYS1. It is ICA-SR connected. This means a real cable, over less than 150m distance. The traffic is to a single structure in Coupling Facility CF – DFHXQLS_POOLM02, which is a list structure. It is actually a CICS Temporary Storage sharing pool – “POOLM02”.
From this graph we can see that the service time for a request stays pretty constant at around 7.5μs as the request rate increases. We can also see that Coupling Facility CPU time accounts for almost all of that service time.
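To make the "almost all of it" comparison concrete, here is a minimal sketch of the arithmetic. The function name and the CF CPU figure of 7.0μs are made up for illustration; only the roughly 7.5μs service time comes from the graph.

```python
def cf_cpu_share(service_time_us: float, cf_cpu_us_per_request: float) -> float:
    """Return the fraction of a request's service time that is CF CPU time."""
    if service_time_us <= 0:
        raise ValueError("service time must be positive")
    return cf_cpu_us_per_request / service_time_us


# Illustrative values only: 7.5 is the approximate service time from the graph,
# 7.0 is a hypothetical CF CPU time per request.
share = cf_cpu_share(7.5, 7.0)
non_cpu_us = 7.5 - 7.0  # time left for the link and everything else

print(f"CF CPU share: {share:.0%}, remainder: {non_cpu_us:.1f} microseconds")
```

A share close to 100% would suggest the link contributes little, so the Coupling Facility's own processing is where to look first.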
I have another stock graph, actually a pair of them, which show a shift average view of all the systems’ performance with a single structure. This is pretty nice, too.
Here’s the Rate Graph across the entire sysplex.
Here we see SYS1 and its counterparts in the Sysplex – SYS2, SYS3, and SYS4.
(Note to self: They really are numbered that way.)
We can see that in general the traffic is mostly from SYS1 and SYS2, and almost none from SYS3. I would call that architecturally significant.
We can also see that there is no asynchronous traffic to this structure from any LPAR.
And here’s the Service Time graph.
You can see that the two IC-Peer-connected LPARs have better service times than the two ICA-SR-connected LPARs. This is reasonable given that IC Peer links are simulated by PR/SM and so unaffected by the speed of light or distance. Again, the statement has to be qualified by “in general”.
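One way to arrive at a shift-average service time like the ones on these graphs is to weight each interval's average by its request count, so busy intervals count for more. This is a sketch under that assumption, not necessarily how the graphs above were produced; the function name and the sample numbers are hypothetical.

```python
def shift_average_service_time(intervals):
    """Request-weighted mean service time over a shift.

    intervals: list of (request_count, mean_service_time_us), one per interval.
    Returns None when there was no traffic at all.
    """
    total_requests = sum(count for count, _ in intervals)
    if total_requests == 0:
        return None
    weighted_sum = sum(count * service_time for count, service_time in intervals)
    return weighted_sum / total_requests


# Made-up intervals: a quiet one with slower requests, a busy one with faster ones.
example = [(1000, 8.0), (3000, 4.0)]
print(shift_average_service_time(example))
```

A plain mean of the two interval averages would give 6.0, which overstates the quiet interval's influence.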
But the graphs you’ve seen so far leave a lot of questions unanswered.
So, for a long time I’ve wanted to do something that combined the two approaches: Performance With Increasing Load, and Differences Between Systems.
I wanted to get beyond the single-system view of scalability. I usually put a number of systems’ scalability graphs on a single slide, but:
- The graphs end up smaller than I would like.
- This doesn’t scale beyond four systems.
The static multi-system graphing is fine but it really doesn’t tell the full story.
Well, now I have it in my kitbag. I’m sharing a new approach with you – because I think you’ll find it interesting and useful.
How about plotting all the systems’ service times versus rates on one graph? It sounds obvious – now I mention it.
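The data preparation behind such a graph can be sketched as follows: group the interval samples by system and sort each system's points by request rate, giving one series per system. The record layout, the function name, and every number below are hypothetical, and dropping zero-traffic intervals is an assumption.

```python
from collections import defaultdict


def build_series(samples):
    """Group (system, rate, service_time_us) samples into per-system series.

    Each series is a list of (rate, service_time_us) points sorted by rate,
    ready to be drawn as one line or scatter series on a shared graph.
    """
    series = defaultdict(list)
    for system, rate, service_time_us in samples:
        if rate > 0:  # assumption: skip intervals with no traffic
            series[system].append((rate, service_time_us))
    return {system: sorted(points) for system, points in sorted(series.items())}


# Made-up samples, one tuple per system per interval.
samples = [
    ("SYS1", 2000.0, 7.6),
    ("SYS2", 1500.0, 4.1),
    ("SYS1", 500.0, 7.4),
    ("SYS2", 3000.0, 4.3),
    ("SYS3", 0.0, 0.0),
]
series = build_series(samples)
for system, points in series.items():
    print(system, points)
```

Each entry in `series` can then be handed to whatever plotting tool is in use, with rate on the x axis and service time on the y axis.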
Well, let’s see how it works out. Here’s a nice example:
Again we have the same four systems and the same CF structure. Here’s what I conclude when I look at this:
- SYS2 and SYS4 have consistently better service times – across the entire operating range – than SYS1 and SYS3. This shows the same IC Peer vs ICA-SR dynamic as we saw before.
- SYS3 service times are worse than those of the other 3 – and again we see its rate top out considerably lower than those of the other 3.
- SYS2 service times are always worse than SYS4’s. They happen to share the same machine and SYS2 is a much bigger LPAR than SYS4, actually spanning more than 1 drawer. That might have something to do with it.
Coupling Facility service times and traffic remain key aspects of tuning Parallel Sysplex implementations. The approach of “understand what happens with load” also remains valid.
The new piece – combining the service times for all LPARs sharing a structure on one graph – looks like the best way of summarising such behaviours so far.
Of course this graph will evolve. I can already think of two things to do to it:
- Add the link types into the series legend.
- Avoid showing systems that don’t have any traffic to the structure (and maybe indicate that in the title).
But, for now, I want to get more experience with using this graph. For example, an even more recent customer has all systems connected to each coupling facility by ICA-SR links. The graphs for that one show similar curves for each system – which is unsurprising. But maybe in that case I would see a difference if the links were of different lengths.
And, as always, if I learn something interesting I’ll let you know.