(Originally posted 2011-07-05.)
As you probably know, Hardware Data Compression has been supported by MVS and IBM mainframes for around 20 years. In several recent batch studies I’ve conducted, its use has evidently been widespread.
(In this post I’m not talking about DB2 compression of either flavour or VSAM compression – though some of the information here applies to these functions as well.)
It’s not as simple a deployment strategy as "turn it on for everything", though I think "turn it on for all sequential data sets large enough to stripe" may well be the prevailing approach. There’s a certain logic to this: if striping can make sequential data access go faster, then compressing the data will make it go faster still.
There’s a lot of truth in that. But consider one thing: compression (and decompression) takes CPU cycles. In most software licensing schemes that translates into an increased software bill. For small-scale use that might not matter much, but used wholesale it could make a significant difference.
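To make the "wholesale" point concrete, here’s a minimal sketch of how extra compression CPU could feed through to an MSU-based software bill. The pricing model and all the numbers are invented for illustration; real sub-capacity pricing is considerably more involved.

```python
# Hypothetical illustration only: a toy model where the monthly bill
# scales with the peak rolling 4-hour MSU average. All figures invented.

def monthly_bill(peak_msus: float, cost_per_msu: float) -> float:
    """Simplified model: bill is proportional to the peak MSU figure."""
    return peak_msus * cost_per_msu

baseline = monthly_bill(peak_msus=500, cost_per_msu=100.0)

# Suppose wholesale compression adds 3% CPU during the peak window.
with_compression = monthly_bill(peak_msus=500 * 1.03, cost_per_msu=100.0)

print(f"Extra monthly cost: {with_compression - baseline:.0f}")
```

A small percentage of extra CPU is invisible on one job, but applied across the peak window it shows up directly in a usage-based bill.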
There’s another thing: if you cause a job that’s running in a CPU-constrained environment to burn more CPU through compression, the speed-up may be disappointing. I’m not saying it’s worse than useless – I’ve no evidence of that – but it is a consideration.
My suspicion that the CPU cost can be quite high comes from seeing "low CPU" cases – like DFSORT COPY operations and other simple reformatters – burning far more CPU than I would have expected. (Quite often this includes a significant chunk of SRB time, but not always. This is consistent with the way DFSMS/MVS does (de)compression.) There is no direct metric for the CPU cost of compression: you have to use the overall CPU consumption and draw your own conclusions.
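One crude way to put a number on it, in the absence of a direct metric, is to run the same step against compressed and uncompressed copies of a data set and difference the step CPU figures (from SMF type 30 records, say). A sketch, with invented figures:

```python
# Sketch: estimate compression CPU by differencing overall step CPU
# between a run against compressed data and one against plain data.
# All figures below are invented for illustration.

def compression_cpu_estimate(cpu_compressed_s: float, cpu_plain_s: float) -> float:
    """Attribute the step-CPU delta to (de)compression.

    Crude: this assumes nothing else changed between the two runs,
    so repeat the measurement a few times before trusting the delta.
    """
    return cpu_compressed_s - cpu_plain_s

# A hypothetical DFSORT COPY step, measured both ways.
delta = compression_cpu_estimate(cpu_compressed_s=14.2, cpu_plain_s=3.1)
print(f"Approx. CPU attributable to compression: {delta:.1f}s")
```

It’s differencing, so it inherits all the usual caveats about run-to-run variability, but it’s about the best you can do without a dedicated counter.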
It sounds like I’m against using Hardware Data Compression. Actually I’m not: I’ve seen evidence of very good compression ratios (for example in SMF 14 and 15 records for non-VSAM data sets). And I know that in many of the cases I’ve seen the alternative might have been to write the data to tape, with all the handling that entails. Which brings us on to something else:
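The compression ratio itself is easy to derive once you have the before-and-after byte counts from the Compressed Data Statistics section of SMF 14/15. The field names below are placeholders, not the real SMF mapping, and the example figures are invented:

```python
# Sketch: derive a compression ratio from the kind of before/after
# byte counts carried in the SMF 14/15 Compressed Data Statistics
# section. Argument names are placeholders, not real SMF field names.

def compression_ratio(user_data_bytes: int, compressed_bytes: int) -> float:
    """Ratio of logical data written to physical bytes stored."""
    if compressed_bytes == 0:
        raise ValueError("no compressed bytes recorded")
    return user_data_bytes / compressed_bytes

# Invented example: 4 GiB of user data stored in 1 GiB on disk.
ratio = compression_ratio(4 * 2**30, 1 * 2**30)
print(f"{ratio:.1f}:1")  # 4.0:1
```

Tracked over time, ratios like this tell you which data sets actually repay the CPU spent compressing them.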
Quite a few of the scenarios I’ve seen have been one-writer-one-reader cases. You tend to think "Pipes" in those cases. You were expecting that, weren’t you? 🙂 But, seriously, the advantages are fairly obvious. One of the objections to Pipes has always been that it takes more CPU than writing the data to disk or tape. But if you compress it* it’s not nearly so obvious that that’s the case. It would be interesting to conduct an experiment.
This is pretty much an "it depends" situation. Aren’t they all? Actually, there’s quite possibly more speed advantage in getting the buffering and number of stripes right – as described in SG24-2557 Parallel Sysplex Batch Performance. (This book has lots of good advice on access method tuning – written by my residency teammates in 1995.)
In short, if you’re implementing Hardware Data Compression, measure its impact and effectiveness, and be open to the idea that there may be other ways of achieving the speed-up.
Finally, I said this post wasn’t about VSAM, but note that SMF 64 also has a Compressed Data Statistics Section, just like SMF 14 and 15. My expectation, however, is that the access pattern to compressed VSAM data is not sequential. (For QSAM and BSAM I expect it to be largely sequential.)
* Tape compression is a different matter – with the hard work being done in the control unit.