(Originally posted 2011-05-29.)
At one level Performance and Capacity Management and Systems Investigation are clearly linked: They share the same data. Or much of it at least.
But I think they’re linked in another way, too.
Over the past few years I’ve gradually shifted emphasis towards Systems Investigation. But this has only been a slight shift – a “non modo sed etiam” (“not only but also”) – and still really only mainframe. So I’m still me, but I’ve slowly realised I look at things a little differently – in addition to the old foci. And this thinking will feed into the “I Know What You Did Last Summer” presentation I’ve started on.
But this post isn’t about that. It’s making a different point. And maybe a fairly obvious one:
When looking at Systems (and Application) performance it really pays to know what the landscape looks like. The same is true of Capacity, of course.
The received wisdom (and I certainly buy into it) is that Performance is about top down decomposition. For example, find the big CPU use categories and break them down: LPAR -> WLM Workload -> WLM Service Class -> Address Space -> Transaction -> Plan -> Package -> Statement.
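To make the idea concrete, here’s a toy sketch of that drill-down in Python. The tree, the names and the CPU figures are all invented for illustration – this isn’t real SMF processing, just the “follow the biggest consumer at each level” idea:

```python
# Illustrative only: made-up CPU numbers and hypothetical names,
# not real SMF data. Each level maps a name to (cpu_seconds, children).
cpu_tree = {
    "PRODPLEX": (1000, {
        "ONLINE": (600, {
            "CICSHIGH": (450, {}),   # hypothetical service classes
            "CICSLOW": (150, {}),
        }),
        "BATCH": (400, {
            "BATCHHOT": (300, {}),
            "BATCHMED": (100, {}),
        }),
    }),
}

def drill_down(tree, path=()):
    """Follow the biggest CPU consumer at each level, top down."""
    if not tree:
        return list(path)
    name, (cpu, children) = max(tree.items(), key=lambda kv: kv[1][0])
    return drill_down(children, path + (f"{name} ({cpu}s)",))

print(" -> ".join(drill_down(cpu_tree)))
```

The point of the sketch is the shape of the technique, not the data: at each level you pick the big category and decompose it one level further.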
We’re doing fine on this – particularly if we’re desirous of a technologically-neutral technique – until we hit the address space level. A CICS workload looks the same as a SAP one and as a Batch one: They’re just gobs of CPU. Even at this point I’m a little nervous as I like to be able to pronounce the names in the frames. (That’s why as a customer you might hear me ask “how do you pronounce this?”)
But when we get to the address space level that approach begins to unravel: For a start there might not be an address space. For instance, a WLM Service Class that manages DDF work won’t have address spaces in it: It’ll have enclaves. So how on earth can we decompose those gobs of CPU into actors below the Service Class level? We certainly can’t do it with SMF Type 30 records.
Similarly, when I sketched out “Transaction -> Plan -> Package -> Statement” that only really works for CICS or IMS transactions accessing DB2.
To be fair you can do decomposition of things like DDF and Batch jobs with the right instrumentation and techniques. With that comment the point is heaving into view:
Sooner or later you have to know what this stuff is. The technologically neutral approach gives out after a certain point – and it’s different for different environments.
Another example is memory: With caveats on data clarity you can decompose memory usage in a similar way. But it behaves differently to CPU, and differently from one user to another. While you might expect CPU to grow linearly – more or less – with workload, you wouldn’t really expect that of memory. Or at least you shouldn’t:
- Some workloads do use memory in a linear way: Double the workload (by whatever metric you choose) and the memory usage doubles. The classic example is TSO users: Go from 200 of them to 400 and the memory usage goes from 1GB to 2GB (at least in 1990 terms).
- Many workloads are sub-linear: Double the number of CICS users and the memory usage may go up by only 50%.
Indeed the latter case is an example of where it’s not clear at all: When you say “double the number of CICS users” are you expecting to double the number of regions? Or do you mean add the users into existing regions?
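The two scaling shapes above can be sketched as toy models. All the figures here are invented (the per-user footprints, region base, and users-per-region are assumptions for illustration, loosely echoing the 1990-era TSO numbers in the text), but they show why the “add users to existing regions” reading comes out sub-linear:

```python
import math

# Toy models only: every constant here is an invented assumption.

def tso_memory_gb(users, per_user_mb=5):
    """Linear: each TSO user brings roughly its own footprint."""
    return users * per_user_mb / 1024

def cics_memory_gb(users, region_base_gb=0.5, per_user_mb=1,
                   users_per_region=500):
    """Sub-linear while users fit in existing regions: the regions carry
    a large shared base, and each extra user adds only a little on top.
    Doubling users only doubles memory once you also double the regions."""
    regions = math.ceil(users / users_per_region)
    return regions * region_base_gb + users * per_user_mb / 1024

# 200 -> 400 TSO users doubles memory; 200 -> 400 CICS users
# (absorbed into the same region) grows it by well under 2x.
print(tso_memory_gb(200), tso_memory_gb(400))
print(cics_memory_gb(200), cics_memory_gb(400))
```

Which model applies – and where the region boundaries sit – is exactly the kind of thing you only know by knowing the applications.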
So the conclusion is you need to know about the applications to get very far. And you probably need to know a lot about things like LPAR setup. Indeed, as I’ve often said, just keeping track of all those LPARs is a major headache for many customers these days.
So, I’d encourage you to get curious about your systems. Take a Systems Investigative perspective when you can. It’s also a great way to build common understanding with those who actually run the systems.
But this is not the same as the school of tuning which says “find sins of omission or commission and comment on them”. These kinds of sins are important – but only in the context of a top-down approach. Who cares if a parameter is not set correctly – in a classical sense – if it affects nothing you care about?
So, as I say, the linkage between Capacity / Performance and Understanding Systems is twofold: the raw data, and the need to know what’s really happening on systems and with applications.