(Originally posted 2013-09-07.)
This post, unlike the previous one, is “on topic” for a Mainframe Performance Topics blog. I hope nobody’s relying on that. In fact I joke about renaming blog this to “Mainframe, Performance, Topics” 🙂 the next time I ask for the header to be updated. In fact I just might.
I recently got some data from a customer that I thought showed a bug in my code. Instead it illustrated an important point about averages.
We all know averages have problems – taken on their own – so this post isn’t really about that. It’d be a duller post if it were.
It’s about how free memory (or, if you prefer, memory utilisation) varies
- By time of day
- By workload mix
- From the average to the minimum
You’ll notice that last point didn’t mention the maximum. This is consistent with being more interested in the free memory than the utilisation, in a number of contexts. Let me explain:
As technology has moved on it’s become more feasible to configure systems (really LPARs) so that there is some free, or at any rate so that paging is (practically) zero. I’m concerned with how well installations meet that aspiration.
So the maximum free doesn’t interest me. But the minimum does. (And so does the average.)
Consider the following graph, from the customer I mentioned.
Before this week I plotted the blue line only (and that for each day in the data). This is the average of RMF’s average memory free number – by hour. But I had code that printed in a table the minimum free across the whole set of data (from a different RMF field in the same SMF 71 number).
While the blue line suggests just under 7GB reliably free, the minimum of the minima showed about 100MB free. This is where I thought my code was buggy – as these two numbers appear to contradict each other. I’d forgotten the 100MB number came from RMF’s minimum field and wasn’t just the lowest of the averages.
The red line is, as the legend says, the minimum free number from RMF plotted across the day. And the mystery (or enigma, to half explain this post’s title) is resolved: The 100MB number is the low point of the red line.
I’ll be throwing this graph into production shortly, maybe with a tweak or two (depending on how well it “performs”).
But the interesting thing – and the point of this post – is how free memory varies. (You’ll’ve guessed that the “Variations” in the title comes from this.)
If you look at the graph you’ll notice that, mainly, the big variability is overnight. Though there is a notable divergence between the two lines at about 11AM, they’re much closer together during the day.
If you’d asked me what I expected to see I’d say this is about right (but I wouldn’t’ve been certain, not having looked at the data this way before).
Overnight, fairly obviously, the customer runs Batch. And I can prove they do from SMF 30, of course. (I actually spent a happy year working on their Batch a while back.)
I would expect a Batch workload to show a higher degree of variation in memory usage, compared to Online (or whatever it’s called nowadays). For at least two reasons:
- Long-running work managers, such as CICS, MQ, Websphere Application Server and DB2, acquire most of their memory when they start up and variation in usage is slight thereafter. For example, buffer pools are likely to be acquired at start-up and populated shortly thereafter. (I’ve seen this in almost every set of data I’ve looked at over the past mumbleteen 🙂 years.)
- Batch comes and goes. (Job steps start, acquire storage, and release it when they end. Furthermore they’re generally highly variable in their memory footprint, one to another.)
Another thing I’m not that surprised about is Batch driving free memory to zero. Assuming large sorts are happening this is often to be expected: Products like DFSORT manage their usage of memory according to system memory conditions. Often they use what’s free if there is sort work data to fill the memory. This can be managed by the installation, if it is so desired. And there often won’t be enough sort work data to fill memory. (It’s not a direct objective to fill memory, but if there’s good use for the memory why not exploit it?)
Interestingly, on another system, on another day, the minimum free memory went to near zero (and the average followed it down, albeit in not quite such an extreme way). This time the cause was a surge in usage by DUMPSRV (from SMF 30). Clearly this was a time when a large dump was captured (or maybe several moderate-sized ones).
I often talk about configuring systems – particularly their memory and paging subsystems – to make Dump Capture as non-disruptive as possible. (Several blog posts over the years have talked about this.) This will be a reminder to talk to the customer about dump containment.
Now, the above may be obvious to you (and the value of comparing minima to averages certainly will be) but I hope there are some things to think about in it – notably how Batch behaves, catering for Dump Capture, and the fact RMF gives you the minimum (and maximum) and average memory free numbers.
But I also appreciate most of you don’t look at nearly as many installations as I do: If what you see in yours matches the above that’s fine. If it doesn’t it’s worth figuring out why not (but not necessarily thinking it’s a problem).
Translation: That’s a really nice graph you might like to make for yourself. 🙂