(Originally posted 2011-07-02.)
Have you ever had the suspicion a sort was unnecessary in your batch? I bet you have.
In recent Batch Performance studies I’ve had the suspicion that many of the sorts are unnecessary: either they should be merges, or they shouldn’t be done at all. But how do you prove it?
But first, what do I mean by a sort not needing to be done at all?
Clearly if the data is reformatted then something has to be done to it – just maybe not a sort. It could be a copy or maybe a merge. What I’m really talking about is the need to reorder the records. Reordering them is more expensive in disk space (for sort work data sets), CPU and run time, so a sort is best avoided.
So how do you figure out if a sort needs doing? One way is to remove the sort and see what happens. Probably best not done in Production. 🙂 That’s a form of "destructive testing". But there is another way: essentially, running the test I’m about to outline in the same time frame as the sort.
I’ll get to the "choreography" in a moment, but first the mechanics:
Testing If Data Is Already Sorted
For a DFSORT MERGE operation to be successful, all the data sets being merged must already be sorted on the merge key(s). Otherwise you get a return code of 16 and an ICE068A message such as:
ICE068A 0 OUT OF SEQUENCE SORTIN01
We can use this to our advantage by attempting to merge the supposedly-already-sorted data set with a dummy file. (DD DUMMY will do just fine.) If the data set is already sorted the step will complete with a 0 return code. Otherwise, as I say, it will be RC=16.
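Here’s a minimal sketch of such a test step. It assumes DFSORT is invoked as PGM=SORT and that the original sort’s key is a 10-byte character field starting in position 1 – the data set name and the MERGE FIELDS are placeholders, so substitute your sort step’s actual key. SORTOUT is dummied out because all we care about here is the return code:

```jcl
//CHKSORT  EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//SORTIN01 DD DISP=SHR,DSN=MY.SUSPECT.DATA
//SORTIN02 DD DUMMY
//SORTOUT  DD DUMMY
//SYSIN    DD *
* Merge key must match the original sort's key exactly.
* Here: a 10-byte character key starting in position 1 (an assumption).
  MERGE FIELDS=(1,10,CH,A)
/*
```

If the data really is in sequence the step ends with RC=0; if not, it ends with RC=16 and ICE068A.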
It’s worth noting that the merge fails as soon as it detects a record out of sequence. This means a badly unsorted data set will fail fast. However, a largely well-sorted data set may not fail the test until it has been almost completely read: perhaps the last record is the only one out of sequence.
How To Incorporate This Test
If the input data set is persistent you could add the test step either just before the sort or at the end of the job. If it’s transient (perhaps a temporary data set) it has to be tested before it goes away. In short, you run the test while the data set still exists and isn’t being otherwise updated.
One important aspect is the intrusiveness of the test: breaking a BatchPipes/MVS pipe to do it is a particularly bad idea. Holding up processing for tape mounts is probably bad. And running the test alongside the actual sort might slow the sort down.
If you were tuning your batch window and suspected a sort was unnecessary you might allow the test to run a few dozen times to gain confidence that it’s not needed. But you might never be sufficiently confident.
One other thing: you might decide to always run the test and run the sort only if the test step ends with RC=16. But then a downstream step would have to access the right data set, presumably the SORTOUT from the test step or the SORTOUT from the sort, depending on which ran.
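That conditional wiring could be sketched with a JCL IF/THEN construct around the real sort. The step names (CHKFAIL, REALSORT), data set names and key are all invented for illustration; CHKSORT is assumed to be the merge-based test step already described:

```jcl
//* Run the real sort only if the test step found data out of sequence.
//CHKFAIL  IF (CHKSORT.RC = 16) THEN
//REALSORT EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//SORTIN   DD DISP=SHR,DSN=MY.SUSPECT.DATA
//SORTOUT  DD DSN=MY.SORTED.DATA,DISP=(NEW,CATLG),
//            UNIT=SYSDA,SPACE=(CYL,(50,50),RLSE)
//SYSIN    DD *
  SORT FIELDS=(1,10,CH,A)
/*
//         ENDIF
```

Note this doesn’t by itself solve the problem just mentioned: the downstream step still has to pick up the right data set depending on which path ran.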
At the end of the day it’s better to know your batch well enough to avoid having to rely on tests like this. But in reality people don’t understand their batch that well, in my experience. Not a criticism, just an observation. And a situation caused by the complexity and longevity of the typical batchscape.
Maybe this technique is best used during development or in a test environment. As I said on Twitter yesterday, I was thinking about ways of "injecting dye into the water" in a data-flow sort of way. Maybe I’ll think of some additional "dye test" or "smoke test" (if you prefer) techniques. One other wrinkle might be to inject record sequence numbers into the data to see whether they’re preserved.
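As a sketch of that last idea, DFSORT can append a sequence number on the way in, using INREC with SEQNUM. This example assumes 80-byte fixed-length records (the number lands at position 81, growing the records to 88 bytes), and the data set names are placeholders:

```jcl
//ADDSEQ   EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//SORTIN   DD DISP=SHR,DSN=MY.INPUT.DATA
//SORTOUT  DD DSN=MY.TAGGED.DATA,DISP=(NEW,CATLG),
//            UNIT=SYSDA,SPACE=(CYL,(10,10),RLSE)
//SYSIN    DD *
* Copy the data, appending an 8-byte zoned-decimal sequence number
* at position 81 (assumes 80-byte fixed-length input records).
  OPTION COPY
  INREC OVERLAY=(81:SEQNUM,8,ZD)
/*
```

A later step could then test the sequence-number field itself – for example a merge test on FIELDS=(81,8,ZD,A) – to see whether the original order survived the intervening processing.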