(Originally posted 2014-05-17.)
You can blame the weather for this post. 🙂 I’m writing it on a flight above thick cloud[1] on my way to Munich and then to Budapest for this year’s European System z Technical University.
I like to see the complete picture when I’m examining systems: It makes getting it right so much easier. And there’s something rather satisfying about getting your arms all the way round something.
But I don’t always get “complete” data from a customer. So I work with what I can get, and this post is about what I can infer about other systems whose data I don’t have.
When I talk of “not getting data from all systems” I should perhaps clarify: Most installations run RMF on most of their systems and the SMFID in the header of SMF records is the system RMF ran on. I do get information at some level about other systems from RMF SMF records, but it’s far from complete.
Partial Data
There are a number of good reasons why customers don’t send me data for all systems, including:
- It can be a lot of data.
- Coordinating across multiple systems can be difficult.
- One system, or maybe two, shows the behaviour of all eight.
- Only a subset of the systems are of interest.
The last of these is the most common, particularly with installations jamming all their Production systems into one Sysplex.[2]
For some situations I really do need to see all systems. A few examples that come to mind are:
- When designing a software cost minimisation scheme I want to see all the systems’ use of CPU.
- When understanding the dynamics of a coupling facility structure I want to (at very least) see the request rates from all systems using the structure.
- I recently had a Group Capacity situation where I only had SMF 70–1 data from 1 of the 2 systems in the group: I couldn’t explain why it was hitting the cap.[3]
But generally I can tolerate seeing data from a subset, so I’m not insistent when I don’t need to be.
The question of the day is “how much can I glean about systems whose data isn’t present?” Because maybe I can get a good understanding of an installation anyway. So let’s see what we can do.
Spotting Other LPARs
You can see all the LPARs on a physical machine from SMF 70 Subtype 1 Logical Partition Data Section[4]. You get further detail on logical engines, memory allocated and CPU Utilisation in the 70–1 Logical Processor Data Section for these LPARs.[4]
Among other things the names and definitions of these LPARs can be fascinating.
You also get a small amount of data for deactivated LPARs, most particularly the name and Partition Number.[5] It’s relevant to know for example that one machine has an activated SYSB and another has a deactivated one.[6]
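A minimal sketch of that last comparison, assuming the 70–1 partition data has already been boiled down to one row per LPAR per machine. The row layout and the field names (`machine`, `lpar`, `active`) are hypothetical, chosen for illustration; they are not RMF field names, and the sample values are made up.

```python
from collections import defaultdict


def find_recovery_candidates(rows):
    """Return LPAR names that are activated on one machine and
    deactivated on another, with the machines involved.

    Each row is a dict with hypothetical keys:
      machine - some identifier for the physical machine
      lpar    - the LPAR name
      active  - True if activated, False if deactivated
    """
    active_on = defaultdict(set)
    inactive_on = defaultdict(set)
    for row in rows:
        target = active_on if row["active"] else inactive_on
        target[row["lpar"]].add(row["machine"])

    candidates = {}
    for name, idle_machines in inactive_on.items():
        live_machines = active_on.get(name, set())
        # Only interesting if the same name is activated somewhere.
        if live_machines:
            candidates[name] = {
                "active_on": sorted(live_machines),
                "deactivated_on": sorted(idle_machines),
            }
    return candidates


# Made-up sample: SYSB is activated on CPC1 and deactivated on CPC2.
sample_rows = [
    {"machine": "CPC1", "lpar": "SYSA", "active": True},
    {"machine": "CPC1", "lpar": "SYSB", "active": True},
    {"machine": "CPC2", "lpar": "SYSB", "active": False},
    {"machine": "CPC2", "lpar": "SYSC", "active": True},
]
print(find_recovery_candidates(sample_rows))
```

A match is only a hint, of course: the same name on two machines suggests a recovery arrangement but doesn’t prove one.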
Spotting Other Systems
I can sometimes see the existence of other systems, not on the same footprint. Here are a couple of examples of how:
- SMF 74–4 (Coupling Facility Activity) has a list of all the systems in the Parallel Sysplex[7]. But I don’t see from this data which footprint they are on, or anything else about them.
- SMF 74–2 (XCF Activity) has information about XCF members (and their corresponding job name). So if this system uses XCF to communicate with members in other LPARs you see those other members and those other systems.[0]
A nice example of this is DB2 Data Sharing where – through the three XCF groups involved – you see all the IRLMs. In one case I saw four IRLMs on four systems, despite only having RMF SMF from one of them.
Another nice example is CICS regions that talk to ones on this system via XCF.
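The bookkeeping behind this is just a set difference, and it can be sketched as below. I’m assuming the system names have already been extracted from the 74–4 and 74–2 records into plain lists; the function, its parameters and the optional `name_map` (for the cases where the XCF system name and the SMFID differ) are all hypothetical names for illustration.

```python
def systems_without_data(smfids_with_data, seen_by_source, name_map=None):
    """Return systems that other records mention but that sent no data.

    smfids_with_data - SMFIDs of systems whose RMF data we actually have
    seen_by_source   - dict: record type label -> system names seen in it
    name_map         - optional dict: XCF system name -> SMFID, for
                       installations where the two differ
    The result maps each unseen system to the record types revealing it.
    """
    name_map = name_map or {}
    have = set(smfids_with_data)
    evidence = {}
    for source, names in seen_by_source.items():
        for xcf_name in names:
            # Assume the names match unless the map says otherwise.
            if name_map.get(xcf_name, xcf_name) in have:
                continue
            evidence.setdefault(xcf_name, set()).add(source)
    return {name: sorted(srcs) for name, srcs in sorted(evidence.items())}


# Made-up sample: data from SYSA only, three other systems glimpsed.
seen = {
    "74-4": ["SYSA", "SYSB", "SYSC", "SYSD"],
    "74-2": ["SYSA", "SYSB"],
}
print(systems_without_data(["SYSA"], seen))
```

Keeping the record type alongside each name is a deliberate choice: a system seen only in 74–4 is known merely to be in the Sysplex, while one seen in 74–2 is known to be talking to this one.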
Spotting Coupling Facilities
RMF SMF 74–4 records are cut for all coupling facilities in the Parallel Sysplex, regardless of which footprint they are on.
This data nowadays includes the machine serial number and LPAR Number.
Sometimes I infer the existence of a whole machine – where none of the systems on it provided RMF data – from the existence of a coupling facility on it.
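As a rough sketch of that inference, assume the coupling facility data has been reduced to one row per coupling facility, and that I have a list of serial numbers for the machines whose systems did send RMF data. The row keys (`cf_name`, `serial`) and the sample values are hypothetical, for illustration only.

```python
def machines_seen_only_via_cf(cf_rows, machines_with_rmf_data):
    """Return machines known only because a coupling facility is on them.

    cf_rows                - dicts with hypothetical keys 'cf_name', 'serial'
    machines_with_rmf_data - serial numbers of machines with system data
    The result maps each inferred machine serial to its CF names.
    """
    known = set(machines_with_rmf_data)
    inferred = {}
    for cf in cf_rows:
        if cf["serial"] not in known:
            inferred.setdefault(cf["serial"], []).append(cf["cf_name"])
    return {serial: sorted(names) for serial, names in inferred.items()}


# Made-up sample: CF2 sits on a machine none of the systems reported on.
sample_cfs = [
    {"cf_name": "CF1", "serial": "0A1B2"},
    {"cf_name": "CF2", "serial": "0C3D4"},
]
print(machines_seen_only_via_cf(sample_cfs, ["0A1B2"]))
```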
And What Of It?
Maybe not much to you if you work in a customer.[8] But to me this fills in handy gaps. And it’s nice to spot probably unintended clues.
(Completed on a bumpy ride from Munich to Budapest.) 🙂
[1] Rest assured that if there were no cloud below I’d be enjoying the view instead of writing. 🙂
[2] If you don’t know why then ask a grownup. 🙂
[3] From 70–1 I know when an LPAR is affected by the group cap. But for LPARs whose data I don’t have, I don’t get each LPAR’s Rolling 4 Hour Average CPU Utilisation: for that I’d need the SMF records their RMF cut.
[4] Which shows up in the Partition Data postprocessor report.
[6] As you probably guessed, it’s likely to be a recovery LPAR in case, for example, the first machine dies.
[7] These are 8-character XCF System Names rather than 4-character SMFIDs but usually they are the same (or at least relatable).
[8] Actually I’m increasingly of the opinion this isn’t true: It’s probable as a customer you don’t know as much as you’d like to about what goes on in your installation.