(Originally posted 2012-10-04.)
I don't know how many of you will have spotted this, but there was a nice instrumentation enhancement in the recent System zEC12 announcement.
It comes with the RMF support for CFLEVEL 18 (OA37826) and provides much more detail on paths to Coupling Facilities (CFs).
Previously RMF reported channel path acronyms – one per path. And that was all. (A channel path acronym is something like "CIB" for "InfiniBand".) Those of you who know something about Coupling Facility link technology will recognise there are multiple types of InfiniBand link now. If you do, you'll equally recognise other deficiencies in the RMF topology information. This support fills in many of the gaps.
(Note: The existing support covers not only paths between z/OS LPARs and CFs but also between CFs – in support of structure duplexing.)
Though my personal interest is in the enhancements to the SMF Type 74 Subtype 4 record cut by RMF, there are corresponding enhancements to RMF Monitor III and the RMF Postprocessor.
So, what do we have?
- PCHID – which allows a better view of sharing where two or more z/OS images share a link to a CF. (This might explain why IC links don't play.)
- Channel Path Operation Mode – e.g. "CIB path operating at 1X bandwidth using the IFB protocol, adapter type HCA3-O LR". I hope you'll agree this is much better characterisation than just "CIB".
- Host channel adapter ID and port number.
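To illustrate the PCHID point, here is a minimal Python sketch of how you might spot shared links once the path data has been extracted from the SMF records. It assumes each path has already been reduced to a dictionary; the keys `machine`, `pchid`, `system` and `cf`, and the sample values, are hypothetical names of my own and not RMF field names.

```python
from collections import defaultdict

def shared_links(paths):
    """Return links used by more than one z/OS image.

    Each path is a dict with hypothetical keys:
    machine (CEC identifier), pchid, system (z/OS image), cf (CF name).
    A link is identified here by (machine, pchid, cf).
    """
    by_link = defaultdict(set)
    for p in paths:
        by_link[(p["machine"], p["pchid"], p["cf"])].add(p["system"])
    # Keep only links that two or more images use
    return {link: sorted(systems)
            for link, systems in by_link.items()
            if len(systems) > 1}

# Made-up sample data, purely for illustration
sample = [
    {"machine": "CEC1", "pchid": "0510", "system": "SYSA", "cf": "CF01"},
    {"machine": "CEC1", "pchid": "0510", "system": "SYSB", "cf": "CF01"},
    {"machine": "CEC1", "pchid": "0520", "system": "SYSA", "cf": "CF02"},
]

for link, systems in shared_links(sample).items():
    print(link, "shared by", ", ".join(systems))
```

Keying on the machine as well as the PCHID is a deliberate choice in this sketch: a PCHID value only identifies a physical channel within one machine.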
The above is all topology information. What we don't have is traffic over the paths. Personally I think I'd like to see it for two reasons:
- Though I don't want to reverse-engineer path selection logic, I do get questions about mixed-topology z/OS-to-CF configurations and I'd like the data to provide answers on how traffic is routed.
- I'd like to be able to monitor path degradation – to proactively resolve issues. I'll admit the one "bad path" situation I've been involved with was catastrophic – in that the fibre broke – but seeing degradation over time would be useful.
If you agree that traffic and error analysis would be something you'd value let me know. Alternatively, if you think it's a dumb idea let me know (gently). 🙂
The new information is available for both z/OS-to-CF and CF-to-CF links. The only difference between the two is the "anchor" in the Local Coupling Facility and Remote Coupling Facility sections.
There is some performance information, which I think is useful:
- Path Is In Degraded Mode flag. This is binary and is passed from the hardware. I'm still getting clarity on what it actually means. What is clear is that there are both "path is degraded but some signals are getting through" and "path is totally dead" situations that could lead to this flag being set.
- Channel Path Latency Time. RMF uses this to estimate signalling distance at 10 microseconds per kilometre. Call me nosey but I really want this – as I like to figure out whether machines are close together or in different data centres.
The field description notes this is the average round-trip path time in microseconds. A value of 0 means that the time was not measured. A value of 1 means a time less than or equal to one microsecond. So it's obviously not accurate enough to calculate distance to the nearest metre. I suspect we'll get to one of three states: The machines are very near to each other, they are a few hundred metres apart, or they are some kilometres away (and how far away they are).
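The latency-to-distance arithmetic is simple enough to sketch in Python. This is only an illustration of the 10 microseconds per kilometre rule and the 0 and 1 special values described above. The function names and the thresholds separating the three states are my own assumptions, not anything RMF defines.

```python
from typing import Optional

US_PER_KM = 10.0  # round-trip microseconds per kilometre, as quoted above

def estimate_distance_km(latency_us: int) -> Optional[float]:
    """Estimate signalling distance from the round-trip latency field.

    Returns None when the field is 0 (time not measured).
    A value of 1 means "one microsecond or less", so the result
    for 1 is an upper bound rather than a measurement.
    """
    if latency_us == 0:
        return None
    return latency_us / US_PER_KM

def classify(latency_us: int) -> str:
    """Map latency to one of three rough states (thresholds are assumed)."""
    km = estimate_distance_km(latency_us)
    if km is None:
        return "not measured"
    if latency_us <= 1:
        return "very near"
    if km < 1.0:
        return "a few hundred metres apart"
    return f"about {km:.0f} km apart"

for value in (0, 1, 5, 250):
    print(value, "->", classify(value))
```

With these assumed thresholds anything under 10 microseconds lands in the "few hundred metres" bucket. Real data might well suggest different cut-offs.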
Most of the information in the SMF record is in a new section (Channel Path Data Section) – which is going to cause mass bustage of my code. (Actually I don't handle Remote Coupling Facility sections terribly well so some reworking of my code is overdue anyway.) Fortunately the lab sent me a pretty complex set of data – so if I'm quiet 🙂 you'll know I'm 'heads down' in my code.
And I'm looking forward to seeing data from real customers: If you're at z/OS RMF Release 12 or 13 you can install the PTFs and take advantage of the new stuff too.