When I examine Workload Manager for a customer, a key aspect is goal setting. This has a number of aspects:
- How work gets classified – to which Service Class and Report Class by what rules.
- What the goals are for each Service Class Period.
- What the period boundaries should be.
This post focuses on Aspect 3: Period Boundaries.
When a transaction executes it accumulates service. Generally this is CPU time, especially with modern Service Definition Coefficients.
For transactions that support it you can define multiple Service Class Periods. Each period – except the last – has a duration.
Some transaction types, most notably CICS, only have a single period. For them the discussion of period durations is moot.
The z/OS component that monitors service consumption is System Resources Manager (SRM). SRM predates Workload Manager (WLM) by decades. (It’s important not to see WLM as replacing SRM but rather as supervising it. WLM replaces human-written controls for SRM.) Periodically SRM checks work’s consumption of resources. If the transaction has exceeded the relevant period duration the transaction moves to the next period.
Using more service than the current period’s duration doesn’t directly trigger a period switch. Because SRM only checks periodically, there is some latency in detection, so it is normal for a transaction to exceed the duration (generally only slightly) before it moves to the next period.
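To make the mechanics concrete, here is a minimal Python sketch of period switching. It is an illustration, not how SRM is actually implemented. The function names, the fixed amount of service per check, and the example durations are all made up. I'm also assuming each duration is the amount of service allowed within that period, so the boundaries are the running totals.

```python
from itertools import accumulate


def ending_period(total_service, durations):
    """Return the 1-based period in which a transaction ends.

    total_service: service units consumed over the transaction's life.
    durations: one duration per period except the last (assumed to be
    per-period amounts, so boundaries are their running totals).
    """
    boundaries = list(accumulate(durations))
    for period, boundary in enumerate(boundaries, start=1):
        if total_service <= boundary:
            return period
    # Heavier than every boundary: it ends in the last period.
    return len(durations) + 1


def simulate_switches(service_per_check, checks, durations):
    """Model periodic checking (hypothetical, fixed service per check).

    Returns (new_period, service_consumed_when_detected) for each switch,
    which shows the overshoot caused by detection latency.
    """
    boundaries = list(accumulate(durations))
    consumed = 0
    period = 1
    switches = []
    for _ in range(checks):
        consumed += service_per_check
        # A switch is only noticed at a check, never mid-interval.
        while period <= len(boundaries) and consumed > boundaries[period - 1]:
            period += 1
            switches.append((period, consumed))
    return switches


if __name__ == "__main__":
    durations = [1000, 5000]  # made-up durations for Periods 1 and 2
    print(ending_period(500, durations))
    print(ending_period(6001, durations))
    print(simulate_switches(300, 5, durations))
```

In the simulation, a transaction accruing 300 service units per check is only seen to have passed the 1000 boundary once it has consumed 1200. That is the "slight exceeding" described above.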
The purpose of multiple periods is, of course, to give good service to light consumers of service and to progressively slow down heavier consumers.
Note: A common mistake is to think that transactions fall through into later periods because of their elapsed time. They don’t; it’s about service. Granted, a long-running transaction might be long-running because of the CPU it’s burning. But that’s not the same thing as saying it’s the elapsed time that drove it to later periods.
Here are two example graphs from the same customer. They are new in our code base, though Service Class Period ending proportions are something we’ve talked to customers about for many years. I’m pleased we have these as I think they will tell some interesting new stories. You’ll get hints of what I think those stories might be based on the two examples from my “guinea pig” customer.
Each graph plots transaction ending rates for each period of the Service Class across a day. In the heading is information about period boundaries and how many service units the transactions ending there consumed on average. I feel the usefulness of the latter will emerge with more experience – and I might write about it then. (And graph headings are one place where my code has a high tendency to evolve, based on experiences with customers.)
Though the two examples are DDF I don’t intend to talk much about Db2 DDF Analysis Tool – except to say, used right, it would bring some clarity to the two examples.
This Service Class looks like many Production transaction service classes – with the classic “double hump” shape. I consider that an interesting – if extremely common – architectural fact. There’s something about this workload that looks reminiscent of, say, CICS transactions.
Quite a high proportion of the transactions end in Period 2 and a fair proportion in Period 3. Those in Period 3 are, on average, very heavy indeed – consuming an average of 162K service units. (This being DDF, the transaction ends when the work is committed – which might not be the end of the transaction from the client’s point of view.)
It seems to me the period boundaries are reasonable in this case, but see “Conclusion” below.
This Service Class looks quite different:
- The transaction rate is more or less constant – with two spikes, twelve hours apart. I consider both the constant transaction rate and the twelve-hourly spikes to be interesting architectural facts.
- Almost all transactions end in Period 1. In fact well within Period 1. The very few Period 3 transactions are extremely long.
Despite the name “DDF Low” I think we have something very regular and well controlled here. I say “despite” as, generally, less well understood / sponsored work tends to be thought of as “low”.
I will comment that, when it comes to goal setting, business considerations play a big part. For example, some of the effects we might see at the technical level could be precisely what is needed. Or precisely what is not needed. So I tend not to walk in with recommendations for things like transaction goals – but I might walk out with them. Contrast this with what I call my “Model Policy” – which I discussed in Analysing A WLM Policy – Part 1 in 2013. Core bits of that are as close to non-negotiable as I get.
However, it is – as I think this post shows – very useful in discussions of period durations to know the proportions of transactions for a Service Class that end in each period. If everything falls through into Period 2, for example, Period 1’s duration is probably too short. And not just the proportions but the transaction rates across, say, the day.
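As a rough illustration of the arithmetic, here is a small Python sketch that turns per-period ending counts into proportions and flags a Period 1 that few transactions end in. The function names, the counts, and the 0.5 threshold are all hypothetical. In particular the threshold is not a rule of thumb; what counts as "too few" is a judgement call that depends on the workload and the business.

```python
def ending_proportions(ending_counts):
    """Return the fraction of transactions ending in each period.

    ending_counts: transactions ended per period, Period 1 first.
    """
    total = sum(ending_counts)
    if total == 0:
        # No transactions ended, so there is nothing to apportion.
        return [0.0 for _ in ending_counts]
    return [count / total for count in ending_counts]


def period_1_looks_short(ending_counts, threshold=0.5):
    """Flag a Service Class where few transactions end in Period 1.

    threshold is an illustrative assumption, not a recommended value.
    """
    if sum(ending_counts) == 0:
        return False
    return ending_proportions(ending_counts)[0] < threshold


if __name__ == "__main__":
    # Made-up counts for one interval: Periods 1, 2 and 3.
    counts = [50, 30, 20]
    print(ending_proportions(counts))
    # Almost everything falls through into Period 2.
    print(period_1_looks_short([5, 90, 5]))
```

Running this per interval across the day, rather than once for the whole day, would give the rates as well as the proportions.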
One other thing, which I’ll leave as a question: What happens if you slow down a transaction that, say, holds lots of locks?