New DFSORT Functions

(Originally posted 2008-07-29.)

Yesterday DFSORT announced a new set of functions – as PTF UK90013. The documentation for it can be found here.

Every year or so there’s a new set of DFSORT functions, and generally they’re “out of cycle” with z/OS releases, although they are incorporated into subsequent releases of z/OS. This means that fewer of you will know about the functions, particularly as we don’t make a big fuss about them at z/OS release announcement time. So, when you move to a new release of z/OS, you quite possibly get new DFSORT functions you don’t know about.

I’m privileged to call the DFSORT developers friends. And I get to “beta” the new code ahead of release. This time, due to other commitments (like the “Parallel Sysplex Performance Topics” Redbook), it’s been difficult to find the time to play much with the code.

So, what’s new?

Here are a few highlights:

  • FINDREP makes it MUCH easier to do “find and replace” operations. Hence, presumably, the name. 🙂

    There are a number of ways of specifying the search string and its replacement. For example, you can specify multiple input strings that get changed to the same output string. You can also define pairs of strings so that you can find and replace multiple strings in one pass over the data. But you can also just specify a single string and its replacement.

    These strings can be specified as character strings (e.g. C’XYZ’) or hexadecimal strings (e.g. X’FFAB’), or as multiples (e.g. 4X’FF’). DFSORT Symbols can be used for the C’XYZ’ and X’FFAB’ styles (but not the “multiplier” style).

    You can specify search “margins”. So, you could specify that strings are to be sought between positions 11 and 71, for example.

    You can specify the maximum number of times find and replace is performed for a record. So you could specify, for example, that only the first match is to be replaced.

    You can say what is to happen if a replacement operation causes the output record to become wider than the LRECL.

    If you don’t want the remainder of the record to be shifted left or right after a match you can specify that as well.

    All in all a nicely thought out set of options.
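
    Several of the options above might be combined along these lines. This is a sketch rather than a tested job: the keyword names (INOUT, STARTPOS, ENDPOS, DO, OVERRUN) are as I understand them from the DFSORT documentation, and the strings and positions are made up for illustration.

    ```
    * Two find/replace pairs in one pass, searching only positions
    * 11 to 71, stopping after the first replacement in each record.
    * OVERRUN=TRUNC asks for truncation rather than an error if the
    * record would grow past its output length.
      OPTION COPY
      INREC FINDREP=(INOUT=(C'#@$',C'SYS',X'FFAB',4X'40'),
                     STARTPOS=11,ENDPOS=71,DO=1,OVERRUN=TRUNC)
    ```

    For the no-shift and output-length cases, the keywords to look up in the documentation are SHIFT=NO and MAXLEN=n.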

    Here’s an example I actually need today:

    OPTION COPY                             
    INREC FINDREP=(IN=(C'#@$'),OUT=(C'SYS'))
    

    All our systems have SMFIDs beginning “#@$” and our tooling currently has a problem with a “$” character in certain places. Replacing “#@$” with “SYS” gets us out of a hole. And it’s pretty much guaranteed that anywhere in the SMF records we see “#@$” it is part of an SMFID. So preprocessing with FINDREP will help.

  • Group operations (using WHEN=GROUP) allow you to identify and operate on groups of records.

    A group of records can be identified in one of two ways:

    • Every n records is a new group. (This uses the “RECORDS=” syntax variant.)
    • All the records between a header record and a trailer record form a new group. (This uses the “BEGIN=” and/or “END=” syntax variants.)

    Here’s an example, straight from the new documentation:

    INREC IFTHEN=(WHEN=GROUP,RECORDS=3,PUSH=(15:ID=3,19:SEQ=5))
    

    specifies groups of three consecutive records. The 3 bytes starting at position 15 hold an identifier that increments for each group. The 5 bytes starting at position 19 hold a sequence number that increments by 1 for each record and restarts at the beginning of each group.

    In the above it’s the PUSH that actually edits the records.

    One thing to note: It’s entirely possible (but not in this example) that records fail to fall into any group. With BEGIN and END it’s possible to have records before the first BEGIN hit and after the last END hit and between an END hit and the next BEGIN hit. Here’s a way of detecting them:

    OPTION COPY
    INREC IFTHEN=(WHEN=GROUP,BEGIN=(1,5,CH,EQ,C'START'),
                  END=(1,4,CH,EQ,C'STOP'),PUSH=(10:ID=1))
    OUTFIL FNAMES=REJECTED,INCLUDE=(10,1,CH,EQ,C' ')
    

    In the above case groups start with a record beginning with “START” and end with a record beginning with “STOP”. The “PUSH” sets a flag in position 10 for records in a group (this assumes position 10 is otherwise blank). The OUTFIL writes the records where the flag wasn’t set to a side file.

    Just for grins, I tried a “mis-nesting” or “mis-bracketing” where the input stream was:

    START
    START
    STOP
    STOP
    

    The output was:

    START    1
    START    2
    STOP     2
    STOP
    

    So each “START” record starts a new group, regardless of whether the previous group was terminated with a “STOP” record. And the second “STOP” record isn’t part of any group. Still, I expect most applications will have “well formed” input data, cough cough. 🙂

    RECORDS, if specified with BEGIN or END, has a slightly different role from when it appears on its own: it limits the number of records in a group.
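
    For instance, here’s a sketch (with made-up field positions, and not something I’ve run) that starts a group at each “START” record but caps it at three records, so anything further along before the next “START” is left ungrouped:

    ```
      OPTION COPY
      INREC IFTHEN=(WHEN=GROUP,BEGIN=(1,5,CH,EQ,C'START'),
                    RECORDS=3,PUSH=(10:ID=1))
    ```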

One of the nice things about the above is you can fit them into an IFTHEN “pipeline”. IFTHEN lets you pass records through a sequence of filters, much like real pipelines. WHEN=GROUP, in particular, is always used within IFTHEN, even if there are no other filters (or “stages” as they’re sometimes known). Also, WHEN=GROUP can be intermixed with WHEN=INIT but must come before the other WHEN clause types.
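
As a sketch of such a pipeline, assume records where position 10 onwards is free for flags: a WHEN=INIT stage blanks the flag byte, WHEN=GROUP sets it for grouped records, and an ordinary WHEN test marks whatever is left. The field positions and the 'NO GROUP' text are illustrative only, and I haven’t run this exact job.

```
  OPTION COPY
  INREC IFTHEN=(WHEN=INIT,OVERLAY=(10:C' ')),
        IFTHEN=(WHEN=GROUP,BEGIN=(1,5,CH,EQ,C'START'),
                END=(1,4,CH,EQ,C'STOP'),PUSH=(10:ID=1)),
        IFTHEN=(WHEN=(10,1,CH,EQ,C' '),OVERLAY=(12:C'NO GROUP'))
```

Note the ordering: the WHEN=INIT and WHEN=GROUP stages come before the ordinary WHEN test, as required.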

The other significant enhancements are all related to ICETOOL:

  • DATASORT is a new operator that allows you to sort the data records in a data set without sorting the header or trailer records. Header and trailer records are copied in their original order, continuing to bracket the sorted data records.

    You use HEADER, FIRST, HEADER(n) or FIRST(n) to say that the first record, or the first n records, form the header. Similarly, TRAILER, LAST, TRAILER(m) or LAST(m) say that the last record, or the last m records, form the trailer.

    You can use OUTFIL to post-process the entire output stream, including header and trailer records.
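
    A minimal sketch, assuming one header record, two trailer records and a sort key in columns 1 to 10. The DD names IN, OUT and CTL1CNTL are placeholders of mine:

    ```
    * ICETOOL statement (in TOOLIN)
      DATASORT FROM(IN) TO(OUT) HEADER TRAILER(2) USING(CTL1)
    * DFSORT statement (in the CTL1CNTL data set)
      SORT FIELDS=(1,10,CH,A)
    ```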

  • SUBSET is a new operator that selects records based on their record number, for example the first 5 records – FIRST(5). You can specify whether subsetting is done on the way into a sort or on the way out. Again you can use OUTFIL to post-process the records.

    NOTE: If you specify e.g. LAST(n), ICETOOL may have to call DFSORT twice. The first pass is to count the input records. The second is to actually write out to the output data set. Because the first pass doesn’t actually OPEN the output data sets this kind of two-pass approach is fine with a BatchPipes/MVS pipe as the output data set.
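
    Two sketches of subsetting on the way in, as I read the syntax. The DD names are placeholders, and in practice each statement would need its own output data set:

    ```
    * Keep only the first 5 input records
      SUBSET FROM(IN) TO(FIRST5) KEEP INPUT FIRST(5)
    * Drop the last 2 input records (may need the counting pass)
      SUBSET FROM(IN) TO(BODY) REMOVE INPUT LAST(2)
    ```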

  • The SELECT operator is enhanced to allow you to select the first n records with each key or the first n duplicate records with each key. This is useful for a “top list” approach. (I once did something similar with REXX driving DFSORT.)

    While there are FIRST(n) and FIRSTDUP(n) forms there aren’t LAST(n) and LASTDUP(n) forms. But Example 1 in the documentation shows how you get round that using a different sort sequence.
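
    The two new forms might look like this, assuming a key in columns 1 to 8 (the DD names are placeholders):

    ```
    * First 3 records for each key in columns 1-8
      SELECT FROM(IN) TO(TOP3) ON(1,8,CH) FIRST(3)
    * First 3 records for each key that occurs more than once
      SELECT FROM(IN) TO(DUP3) ON(1,8,CH) FIRSTDUP(3)
    ```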

  • The SPLICE operator is enhanced with a new keyword: WITHANY. (As if having WITHEACH and WITHALL wasn’t confusing enough already.) 🙂

    WITHANY creates one output record for each set of duplicates. The first duplicate is written out with the non-blank values of each subsequent duplicate spliced onto it for specified fields. (“Spliced onto” or “Spliced into”?) 🙂

    The documentation gives a far better description of this than I can here. Which is another way of saying “SPLICE confuses the heck out of me.” 🙂
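
    For what it’s worth, the shape of a WITHANY statement is something like the following sketch. It assumes a key in columns 1 to 5 and two fields to be spliced, in columns 11 to 20 and 21 to 30; the layout is invented for illustration.

    ```
      SPLICE FROM(IN) TO(OUT) ON(1,5,CH) WITHANY -
        WITH(11,10) WITH(21,10)
    ```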

  • The DISPLAY operator now allows you to display counts in reports. Formerly you could display totals, maxima, minima and averages (and the “sub” variants of those).
  • The DISPLAY and OCCUR operators have greatly enhanced title capabilities:

    • You can have up to 3 title lines.
    • Each can have up to 3 strings.

    This flexibility enables you to use DFSORT Symbols in titles, including System Symbols. To do the latter code something like:

    //SYMNAMES DD *
    System,S'&SYSNAME'
    Sysplex,S'&SYSPLEX'

    and then in ICETOOL control statements something like:

    TITLE('System: ',System)

  • The COUNT operator has been enhanced to allow you to write its output to a data set. Previously it went to SYSOUT. So you could code something in the form:

    COUNT FROM(EMPIN) WRITE(EMPCT) -
      TEXT('Number of employees is ') -
      EDCOUNT(A1,U08) WIDTH(80)

    to get output like:

    Number of employees is 1,234,567

    A1 is a mask that puts commas in. U08 means “Use 8 digits”.

    COUNT also allows you to add to or subtract from the count with ADD(n) or SUB(m). If criteria like EMPTY are used it’s the modified record count that is used in the comparison (in this case to 0).
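
    As a sketch, assuming a file with exactly one header and one trailer record (the DD name is a placeholder):

    ```
    * Ignore one header and one trailer record, then set a
    * non-zero return code if no data records remain.
      COUNT FROM(IN) SUB(2) EMPTY
    ```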

So there’s lots there to play with. And it’s available right now.

And if you want all the user guides for “recent” function PTFs go here.

Published by Martin Packer

I'm a mainframe performance guy and have been for the past 35 years. But I play with lots of other technologies as well.
