(Originally posted 2007-01-31.)
I suggested (in z/OS Release 8 Real Storage Manager SMF Record Changes) that I might talk about the changes to RSM in z/OS Release 8. As the release has now been out for about four months, some of you might actually be on the brink of putting it into production. 🙂
Seriously, you probably are planning for the day when someone cuts over to Release 8.
RSM was largely rewritten – for a very good reason: The average LPAR has an increasing amount of memory. I typically see Production z/OS LPARs in the region of 10 to 20 GB, and I know they’re going to grow. (The largest machine I’ve seen, by the way, has 96 GB on it.) As memory grows – usually faster than CPU – the cost of managing it increases:
- The CPU time increases.
- The time spent holding RSM and SRM locks increases.
So Release 8 sets about reducing both of those costs.
There’s another factor here: It is relatively rare to see an LPAR doing significant paging. Indeed most LPARs in my recent experience have gigabytes of unused memory. So there’s less point in micromanaging memory. Why agonise over which are the oldest pages in a system when you’re not going to have to throw any of them away?
The new algorithm looks very much like the old Expanded Storage algorithm: RSM keeps a cursor into the Page Frame Table (PFT). When the Available Frame Queue (AFQ) runs low, RSM needs to replenish it – to satisfy requests for memory (perhaps to back a new virtual storage page). Replenishment is undertaken by scanning the PFT, moving the cursor as it goes. Pages with the reference bit not set are deemed old and can (usually) be stolen. Pages with the reference bit set have it reset, which ages the page. So there is no per-page UIC updating: instead we sweep through memory looking for old pages (rather than tracking which pages are new). Fixed frames (such as, perhaps, DB2 Version 8 buffer pool pages) do not participate in this.
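The sweep described above is essentially a “clock” (second-chance) scan. Here’s a minimal sketch of the idea in Python – the class and method names are my own inventions for illustration, not RSM’s actual data structures:

```python
# Illustrative sketch of Release 8-style frame stealing: a cursor sweeps
# the page frame table; referenced frames get a second chance, fixed
# frames are skipped, and unreferenced frames are stolen for the AFQ.

class Frame:
    def __init__(self, fixed=False):
        self.referenced = False  # simplified hardware reference bit
        self.fixed = fixed       # fixed frames (e.g. buffer pools) never participate

class FrameTable:
    def __init__(self, frames):
        self.frames = frames
        self.cursor = 0  # RSM keeps a cursor into the PFT

    def replenish(self, needed):
        """Steal up to `needed` frames to refill the available frame queue."""
        stolen = []
        examined = 0
        # Bound the scan at two full sweeps so we can't loop forever.
        while len(stolen) < needed and examined < 2 * len(self.frames):
            f = self.frames[self.cursor]
            self.cursor = (self.cursor + 1) % len(self.frames)
            examined += 1
            if f.fixed:
                continue                  # fixed frames are skipped entirely
            if f.referenced:
                f.referenced = False      # reset the bit: the page is "aged"
            else:
                stolen.append(f)          # reference bit off => old, steal it
        return stolen
```

Note there’s no per-page age counter anywhere: a page’s “age” is entirely encoded in whether its reference bit survived since the cursor last passed it.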
This algorithm is much cheaper and allows LPARs to scale much better, memory-wise.
RSM keeps track of how long it takes to scan the entire table: the longer a sweep takes, the less constrained memory is. This scan time is the new System UIC. It is used to drive algorithms and is eventually surfaced as the Average System UIC (in RMF and in SMF Type 71 records).
A question arises: Given that z/OS sometimes needs to know more than just the System UIC, how is that done? The answer is that memory is divided into (currently) 16 “Segments”. The timestamp recorded on entering each segment during the current sweep is compared with the timestamp from the previous sweep. This gives useful profiling information – and 16 data points.
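To make the two measurements above concrete, here’s a small sketch. The function names, and the exact arithmetic, are my assumptions for illustration – the post only says that full-sweep time becomes the System UIC and that per-segment entry timestamps are compared across sweeps:

```python
# Illustrative sketch: System UIC as whole-table scan time, plus
# per-segment profiling from the 16 segment entry timestamps.

NUM_SEGMENTS = 16  # memory is (currently) divided into 16 "Segments"

def system_uic(sweep_start, sweep_end):
    """Time to scan the entire PFT: the longer it takes,
    the less constrained memory is."""
    return sweep_end - sweep_start

def segment_profile(prev_entry_times, curr_entry_times):
    """Compare each segment's entry timestamp on the current sweep
    with the previous sweep's: a big gap means the sweep took a long
    time to come around again (memory unconstrained near that segment);
    a small gap means frames there are being stolen aggressively.
    Yields the 16 data points mentioned in the text."""
    return [curr - prev
            for prev, curr in zip(prev_entry_times, curr_entry_times)]
```

The per-segment deltas give a crude spatial profile of where in memory the stealing pressure is, without any per-page bookkeeping.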
I specifically asked the developers last summer “who stands to benefit the least from the rewrite?” One has to ask these questions – given that few algorithm changes are “all upside”. The answer was “memory-constrained systems”. So, if you have memory-constrained LPARs you might want to examine their storage allocations. If they’re Sysprog LPARs you mightn’t worry about it, of course. Depends on what you think of your sysprogs. 🙂