If you want really good Db2 performance, you should follow the guidelines from Db2 experts.
These guidelines include such things as “ensure Db2 has excellent access to zIIP” and “put the Db2 address spaces up high in the WLM hierarchy”. Apparently, some of these rules come as a bit of a shock.
If you take them all together, it sounds like Db2 is greedy. But is it really? This blog post seeks to answer that question.
Why Is Db2 Performance So Important?
It’s a good idea to understand how important Db2’s performance is – or isn’t. Let’s assume at least some of the work connecting to Db2 is important to the business. That work’s performance then depends on Db2’s own performance.
The principle is the server needs to have better access to resources than the work it serves.
Let’s take two examples:
- If writing to the Db2 logs slows down, commits will slow down – and the work that wants to commit will slow down.
- If Prefetch slows down, the work for which the prefetch is done will slow down. Ultimately – if we run out of Prefetch Engines – the work will revert to synchronous I/O.
So, in both these cases we want the appropriate Db2 components to run as fast as possible.
“Ah”, you might say, “but a lot of this work is asynchronous”. Yes, but here’s something you need to bear in mind: Asynchronous work is not necessarily completely overlapped. There’s a reason we have time buckets in Accounting Trace (SMF 101) for asynchronous activities: A sufficiently slowed down async activity can indeed become only partially overlapped. So it does matter.
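To make that concrete, here’s a toy model (the function name and numbers are mine, not from any Db2 trace) of how a slowed asynchronous activity stops being fully overlapped:

```python
def non_overlapped_time(async_elapsed_ms, concurrent_work_ms):
    """Portion of an async activity (e.g. a prefetch) that is NOT hidden
    behind the application's own concurrent work, and so shows up as
    suspend time in the transaction's elapsed time."""
    return max(0.0, async_elapsed_ms - concurrent_work_ms)

# Prefetch completes well inside the application's 8ms of other work:
print(non_overlapped_time(5.0, 8.0))   # 0.0 -- fully overlapped

# The same prefetch, slowed to 12ms, leaks 4ms into elapsed time:
print(non_overlapped_time(12.0, 8.0))  # 4.0 -- only partially overlapped
```

Once the async elapsed time exceeds what the application can usefully do concurrently, every extra millisecond lands directly in the transaction’s elapsed time – which is what the Accounting Trace time buckets are there to show.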
What Resources Are We Talking About?
In both the above examples zIIP performance comes into play:
- Since Db2 Version 10, Prefetch Engines (which are really Dependent Enclaves) have been 100% eligible for zIIP, as have Deferred Write Engines. These run in the DBM1 address space.
- Since Db2 Version 11, Log Writes have been similarly eligible. These are issued from the MSTR address space.
Note: The limit on the number of Prefetch Engines is greatly increased in Db2 Version 12 – from 600 to 900. I think this provides some headroom for surges in requests – but it doesn’t obviate the need for DBM1 to have very good access to zIIP.
By the way Robert Catterall has a very good discussion on this in Db2 for z/OS Buffer Pools: Clearing the Air Regarding PREFETCH DISABLED – NO READ ENGINE.
Note Also: Everything I’ve said about Prefetch Engines is true of Deferred Write Engines.
The above has been about zIIP, but almost all the same things apply to General Purpose CPU (GCP). The two go together in the WLM Policy: Delay for CPU and Delay for zIIP are both part of the denominator in the calculation of Velocity.
By the way, just because a Delay sample count is low doesn’t necessarily mean there was no delay: delay can occur in between samples. Furthermore, don’t be overly reassured if the zIIP-on-GCP number is low: work that doesn’t cross over to a GCP can still suffer delay in getting dispatched on a zIIP.
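Here’s a sketch of why low sample counts can mislead. Assuming, purely for illustration, a 250ms state-sampling interval (the interval and function are mine), a delay that starts and ends between two consecutive sample points contributes no Delay samples at all:

```python
SAMPLE_INTERVAL_MS = 250  # illustrative state-sampling interval

def delay_samples(delay_start_ms, delay_end_ms):
    """Count how many sample points land inside a delay interval.
    Sample points occur at 0, 250, 500, ... ms; only samples strictly
    after the delay starts and before it ends are counted."""
    first = (delay_start_ms // SAMPLE_INTERVAL_MS + 1) * SAMPLE_INTERVAL_MS
    count = 0
    t = first
    while t < delay_end_ms:
        count += 1
        t += SAMPLE_INTERVAL_MS
    return count

# A 200ms delay tucked between two samples is invisible:
print(delay_samples(260, 460))  # 0
# A 600ms delay is seen, but only as 2 samples:
print(delay_samples(260, 860))  # 2
```

The work in the first case really was delayed for 200ms; the sampled view just never saw it.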
Further, there are scenarios where a zIIP doesn’t get help from a GCP:
- The logical zIIP has to be dispatched on a physical processor to ask for help. This is less timely for a Vertical Medium (VM), and less timely still for a Vertical Low (VL), compared to a Vertical High (VH).
- The GCP pool might itself be very busy and so might not offer to help the zIIPs.
(By the way zIIPs ask for help; GCPs don’t go round trying to help.)
Of course, CPU is not the only category of resource Db2 needs good access to. Memory is another. Db2 is a great exploiter of memory, getting better with each release. By Version 10 virtually all of its storage was 64-bit, and long-term page-fixed 1MB pages are the norm for buffer pools.
It remains important to provision Db2 with the memory it needs. For example, it’s best to prevent buffer pools from paging.
Usually Db2 is also the best candidate when considering where to exploit additional memory.
Then there’s I/O. Obviously fast disk response helps give fast transaction response. And technologies like zHyperWrite and zHyperLink can make a significant difference; Db2 is usually an early exploiter of I/O technology.
For brevity, I won’t go into Db2 Datasharing requirements.
Db2 And WLM
The standard advice for Db2 and WLM is threefold:
- Place the IRLM (Lock Manager) address space in SYSSTC – so above all the application workload and the rest of the Db2 subsystem.
- Place the DBM1, MSTR, and DIST address spaces above all the applications, with the possible exception of genuine CICS Terminal Owning Regions (TORs), using CPU Critical to further protect their access to CPU.
- Understand that DDF work should be separately classified from the DIST address space and should be below the DIST address space in the WLM hierarchy.
“Below” means of a lower Importance, not necessarily a lower Velocity. Importance trumps the tightness of the goal.
While we’re talking about Velocity, it’s important to keep the goal realistic. In my experience, I/O Priority Queuing keeps attained Velocity higher than it would otherwise be, because the samples are dominated by Using I/O. This also means CPU and zIIP play less of a role in the Velocity calculation.
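Here’s the Velocity arithmetic with hypothetical sample counts (mine, not from any real RMF data), showing how Using I/O samples can dominate:

```python
def velocity(using_samples, delay_samples):
    """Execution Velocity = Using / (Using + Delay), as a percentage."""
    return 100.0 * using_samples / (using_samples + delay_samples)

# Hypothetical sample counts for one service class period:
using_cpu, delay_cpu = 50, 50
using_io,  delay_io  = 400, 25   # I/O samples dominate

# With I/O Priority Queuing, I/O samples are part of the calculation:
print(round(velocity(using_cpu + using_io, delay_cpu + delay_io)))  # 86

# Without it, only the CPU/zIIP samples count:
print(round(velocity(using_cpu, delay_cpu)))  # 50
```

With I/O samples included, attained Velocity looks comfortably high even though, in this made-up example, the work only got a processor half the times it wanted one.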
Follow these guidelines and you give Db2 the best chance when competing for resources – but there’s still no guarantee a heavily constrained system won’t cause it problems.
What Of Test Db2 Subsystems?
Perhaps a Test Db2 isn’t important. Well, it is to the applications it serves, in the same way that a Production Db2 is to a Production workload. So two things flow from this:
- The test Db2 should have better access to resources than the work it supports.
- If you want to run a Production-like test, you’d better give the Db2 that’s part of it good access to resources.
That’s not to say you have to treat a Test Db2 quite as well as you’d treat a Production Db2.
What Of Other Middleware?
Db2 isn’t necessarily exceptional in this regard. MQ and IMS are certainly similar. For MQ we’re talking about the MSTR and CHIN address spaces. For IMS the Control Region and SAS, at least.
You’d want all of these to perform well, for the sake of the applications they serve.
Despite its apparently strident demands, I wouldn’t say Db2 is greedy. I entirely support the notion that it needs exceptionally good access to resources – because its clients’ performance is so dependent on it. But there are indeed other things that play an even more central role – such as the address spaces in SYSTEM and those in SYSSTC.
One final point: I said I wouldn’t talk about Datasharing, but I will just say that an ill-performing member can drag down the whole group’s performance. So we have to take a sysplex-wide view of resource allocation.