(Originally posted 2011-07-02.)
Following on from The Best Sort Is The One You Don’t Do, here’s another reason for eliminating sorts. I think it’s worth a post in its own right.
(In this post, again, I’m talking about resequencing passes over data – not copying or merging.)
With a sort it’s possible the last record read in might be the first record written out. So you can never overlap input and output phases. (There might even be a phase between the end of the input phase and the beginning of the output phase – particularly if there are intermediate merges.)
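To make that concrete, here is a minimal Python sketch of a resequencing pass. It is only an illustration, not how any real sort product works internally: `sort_pass` and the `log` list are hypothetical names invented for this example. It records when each record is read and written, so you can see that every read happens before the first write.

```python
def sort_pass(records, log):
    """Resequence records, logging each read and write event in order."""
    buffered = []
    # Input phase: every record must be read before any can be written,
    # because the last one read might sort to the front.
    for rec in records:
        log.append(("read", rec))
        buffered.append(rec)
    # Output phase: only now can records be emitted in sequence.
    for rec in sorted(buffered):
        log.append(("write", rec))
        yield rec


log = []
out = list(sort_pass([3, 2, 1], log))
# The last record read (1) is the first record written.
```

With the input in reverse order, the final "read" event and the first "write" event are for the same record, which is exactly why the two phases cannot overlap.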
A common pattern in a job is a processing step, then a sort step, then another processing step, and so on. Often the first processing step writes a data set the sort reads and the second processing step reads the sorted version of that. (It’s a separate question whether either processing step could have been performed by the sort, of course.) Let’s call the first processing step "W", the sort "S" and the second processing step "R".
Now consider what happens with BatchPipes/MVS. With Pipes you overlap the reader and the writer. That’s part of the benefit (along with I/O time reductions and, perhaps, the elimination of tape mounts). In the above scenario you can overlap W with the input phase of S – with a pipe. Likewise you can overlap the output phase of S with R. What you can’t do is overlap W with R – because you can’t overlap the two phases of S.
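As a loose analogy only (this is not the BatchPipes/MVS interface, and all the names here are made up for illustration), a small bounded queue between two Python threads behaves a little like a pipe: the writer can only keep going if the reader is draining records at the same time.

```python
import queue
import threading


def writer(pipe, count):
    """Stand-in for step W: writes records into the pipe."""
    for i in range(count):
        pipe.put(i)      # blocks while the pipe is full
    pipe.put(None)       # hypothetical end-of-data marker


def reader(pipe, received):
    """Stand-in for step R: reads records as they arrive."""
    while True:
        rec = pipe.get()  # blocks while the pipe is empty
        if rec is None:
            break
        received.append(rec)


# The pipe holds only 4 records, so 100 records can only pass through
# if the reader runs while the writer is still writing.
pipe = queue.Queue(maxsize=4)
received = []
w = threading.Thread(target=writer, args=(pipe, 100))
r = threading.Thread(target=reader, args=(pipe, received))
w.start()
r.start()
w.join()
r.join()
```

Put a sort in the middle and this breaks down: the sort would have to drain the whole of the first pipe before it could put anything into the second.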
A pity but perhaps not a huge one. Let’s illustrate this with some numbers. Suppose:
- W runs for 10 minutes, writing all the while.
- S has an input phase of 5 minutes and an output phase of 5 minutes.
- R runs for 10 minutes, reading all the while.
The three steps between them take 10 + 5 + 5 + 10 = 30 minutes.
Overlapping W with S’ input phase saves 5 minutes (as S’ input phase gets stretched to 10 minutes). Similarly overlapping S’ output phase with R saves 5 minutes. So we save 10 minutes overall.* We like to say "the sort gets done for free".
If we can eliminate the sort completely we can overlap W and R completely, with a total saving of 20 minutes – 10 for the overlap and 10 for the sort removal. (Without Pipes we’d still see a 10 minute reduction just by removing the sort.)
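The arithmetic for the four cases can be checked with a short Python sketch. It rests on one simplifying assumption, the same one used in the numbers above: when two phases are overlapped through a pipe, the pair takes as long as the slower partner (hence `max`). I/O time reductions and tape mounts are ignored, and the variable names are just labels for this example.

```python
# Elapsed times in minutes, taken from the example above.
W = 10       # first processing step, writing throughout
S_IN = 5     # sort input phase
S_OUT = 5    # sort output phase
R = 10       # second processing step, reading throughout

elapsed = {
    # No pipes: every step and phase runs one after another.
    "sort, no pipes": W + S_IN + S_OUT + R,
    # Pipes: W overlaps the sort's input phase, the sort's output
    # phase overlaps R, but the two sort phases cannot overlap.
    "sort, pipes": max(W, S_IN) + max(S_OUT, R),
    # Sort removed, no pipes: W then R.
    "no sort, no pipes": W + R,
    # Sort removed, pipes: W and R overlap completely.
    "no sort, pipes": max(W, R),
}

baseline = elapsed["sort, no pipes"]
savings = {case: baseline - minutes for case, minutes in elapsed.items()}

for case, minutes in elapsed.items():
    print(f"{case}: {minutes} minutes (saves {savings[case]})")
```

Changing the four constants shows how sensitive the savings are to the relative lengths of the steps; for example, a sort whose phases are longer than W and R would dominate the piped case.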
So, this is another case where removing a sort is a valuable thing to achieve. But as I said in the referenced post that’s easier said than done.
* In this calculation you’ll notice I’ve assumed the I/O time reduction is zero and that no tape mounts are eliminated. Generally there is some I/O reduction, of course. But it’s still a fair comparison. It’s just that the benefits of Pipes are understated.