(Originally posted 2013-07-30.)
Recently I wrote up some initial results of using OA37826 data in “Coupling Facility Topology Information – A Continuing Journey”.
That post in turn followed on from “System zEC12 CFLEVEL 18 RMF Instrumentation Improvements”.
Since then an interesting thing happened – and sooner than I thought it would: I got some data with a broken piece of Coupling Facility (CF) link infrastructure. Lest you think I’m insensitive about bad things happening to customer installations, I’m going to say very little about the actual incident.
A colleague sent me a few hours of data from a customer I’d visited before. This customer has a mixture of generations of processors, including zEC12. I noticed the “Path Is Degraded” flag was set for a pair of CHPIDs between the zEC12 and one of the CFs, but that other CHPIDs between the same pair were OK. (I also checked the companion flag that indicates whether the “Path Is Degraded” flag is itself valid.)
The first version of the code that detected this only listed the CHPIDs with the flag set. So I passed the CHPID list to the account team.
But I wasn’t satisfied with that: It occurred to me that actual configuration data would be better.
So I extracted the PCHID, Adapter ID and Port ID from the same section of the record (the Path Data Section). Although the PCHIDs were, unsurprisingly, different, the Adapter ID and Port ID were common to the two CHPIDs that were said to be degraded.
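My parsing code isn’t shown in this post, but the cross-check is easy to sketch. Here’s a minimal Python illustration – the record layout, field names and all the values are made up for illustration, not the real SMF 74-4 labels:

```python
# Hypothetical path records, as might be pulled from the Path Data Section.
# CHPIDs, PCHIDs, Adapter IDs and Port IDs here are invented examples.
paths = [
    {"chpid": "A0", "pchid": "0110", "adapter": "0008", "port": "1", "degraded": True},
    {"chpid": "A1", "pchid": "0114", "adapter": "0008", "port": "1", "degraded": True},
    {"chpid": "A2", "pchid": "0120", "adapter": "000C", "port": "1", "degraded": False},
    {"chpid": "A3", "pchid": "0124", "adapter": "000C", "port": "2", "degraded": False},
]

# Pick out the degraded paths, then ask whether they share
# physical infrastructure (same Adapter ID and Port ID).
degraded = [p for p in paths if p["degraded"]]
shared = {(p["adapter"], p["port"]) for p in degraded}
for adapter, port in sorted(shared):
    chpids = [p["chpid"] for p in degraded
              if (p["adapter"], p["port"]) == (adapter, port)]
    print(f"Adapter {adapter} port {port}: degraded CHPIDs {', '.join(chpids)}")
```

With data like the above, both degraded CHPIDs fall out on one adapter/port pair – which is exactly the pattern I saw.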
Not being that heavily into how you plug in CF links, I’m not sure what that all means – but I’m going to have to learn.
In any case I’ve enhanced my code to give this additional information for degraded links. I’m also thinking I should add this information for non-degraded ones – as it might expose points of vulnerability. But I doubt I’ll get to do it until I get another set of zEC12 data in.
Well, that’s the way it was when I first drafted this post. In the 36 hours since, I’ve had a fit of “that really won’t do at all” 🙂 and a three-hour train journey with wifi to press on with coding.
And I’m glad I did:
I can now see that the other two paths between this z/OS system and this coupling facility are on a different adapter. So it seems Installation Planning has been done well and the adapter isn’t a single point of failure. That’s something I’d want to check for in future sets of customer data.
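That single-point-of-failure check amounts to grouping each system-to-CF path by its Adapter ID and seeing whether everything lands on one adapter. A minimal sketch, again with invented field names and values:

```python
from collections import defaultdict

# Hypothetical paths from one z/OS system to its CFs; values are illustrative.
paths = [
    {"cf": "CF01", "chpid": "A0", "adapter": "0008"},
    {"cf": "CF01", "chpid": "A1", "adapter": "0008"},
    {"cf": "CF01", "chpid": "A2", "adapter": "000C"},
    {"cf": "CF01", "chpid": "A3", "adapter": "000C"},
]

# Collect the set of adapters carrying paths to each CF.
adapters_per_cf = defaultdict(set)
for p in paths:
    adapters_per_cf[p["cf"]].add(p["adapter"])

for cf, adapters in sorted(adapters_per_cf.items()):
    if len(adapters) < 2:
        print(f"{cf}: all paths on adapter {next(iter(adapters))} - potential single point of failure")
    else:
        print(f"{cf}: paths spread across {len(adapters)} adapters")
```

In the data I had, the paths were spread across two adapters, so the check came back clean.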
Because I have only a couple of hours’ data I couldn’t detect the onset of the problem: if you’re a customer running daily reports you might want to create one that checks for path degradation. I also didn’t get to see how performance was affected by the degradation: more data would’ve helped with that, too.
So the point of this post is to reinforce the view that you should – if it’s available to you – consider tracking the “Path Is Degraded” flag provided by OA37826, which externalises new support in CFLEVEL 18 (on zEC12 and the new zBC12 machines). Otherwise you’re relying on diagnostics and indicators that most of you wouldn’t ordinarily go near. Once again it’s very nice to have this in SMF.