ASCII to EBCDIC Conversion with DFSORT

(Originally posted 2006-05-13.)

(Scroll down if you’re not interested in DB2)

This past week I’ve been updating our SQL Analysis code to cope with Unicode SQL statements. (If you bind a package or plan once you’ve gone to DB2 Version 8 New Function Mode the statement text is stored in the Catalog in Unicode.) These statements are in Unicode UTF-8 which is extremely similar to ASCII. So I snaffled some existing code of mine that converts from ASCII to EBCDIC and pressed it into service…

If the first 10 bytes of the statement text (minus the 2 4-byte header fields) contain X’40’ (an EBCDIC null (space)) then I consider the text to be in EBCDIC. Otherwise I deem it to be ASCII – and do the conversion. (You might ask “what about statements like ROLLBACK?” Well, it turns out that in SYSPACKSTMT there is always a trailing blank on the statement text – so that’s alright. 🙂

(In fact my code decodes all the structural information in the SYSPACKSTMT STMT column – including extracting information on host variables.)

Now, my colleagues sometimes just want to take a look at the statement text sometimes – without bothering to run my code. So here’s what I learned when creating a DFSORT job I set up for them to run, converting the statement text from ASCII to EBCDIC.

Converting ASCII to EBCDIC With DFSORT

First, a disclaimer: Certain codepoints might need slight adjustment in some countries – for example if you have dollar versus pound sterling issues.

First you code an ALTSEQ statement:

  ALTSEQ CODE=(0000,0101,0202,0303,0437,052D,062E,072F,
    0816,0905,0A25,0B0B,0C0C,0D0D,0E0E,0F0F,
    1010,1111,1212,1313,143C,153D,1632,1726,
    1818,1919,1A3F,1B27,1C22,1D1D,1E35,1F1F,
    2040,215A,227F,237B,245B,256C,2650,277D,
    284D,295D,2A5C,2B4E,2C6B,2D60,2E4B,2F61,
    30F0,31F1,32F2,33F3,34F4,35F5,36F6,37F7,
    38F8,39F9,3A7A,3B5E,3C4C,3D7E,3E6E,3F6F,
    407C,41C1,42C2,43C3,44C4,45C5,46C6,47C7,
    48C8,49C9,4AD1,4BD2,4CD3,4DD4,4ED5,4FD6,
    50D7,51D8,52D9,53E2,54E3,55E4,56E5,57E6,
    58E7,59E8,5AE9,5BAD,5CE0,5DBD,5E5F,5F6D,
    6079,6181,6282,6383,6484,6585,6686,6787,
    6888,6989,6A91,6B92,6C93,6D94,6E95,6F96,
    7097,7198,7299,73A2,74A3,75A4,76A5,77A6,
    78A7,79A8,7AA9,7BC0,7C4F,7DD0,7EA1,7F07,
    8043,8120,8221,831C,8423,85EB,8624,879B,
    8871,8928,8A38,8B49,8C90,8DBA,8EEC,8FDF,
    9045,9129,922A,939D,9472,952B,968A,979A,
    9867,9956,9A64,9B4A,9C53,9D68,9E59,9F46,
    A0EA,A1DA,A22C,A3DE,A48B,A555,A641,A7FE,
    A858,A951,AA52,AB48,AC69,ADDB,AE8E,AF8D,
    B073,B174,B275,B3FA,B415,B5B0,B6B1,B7B3,
    B8B4,B9B5,BA6A,BBB7,BCB8,BDB9,BECC,BFBC,
    C0AB,C13E,C23B,C30A,C4BF,C58F,C63A,C714,
    C8A0,C917,CACB,CBCA,CC1A,CD1B,CE9C,CF04,
    D034,D1EF,D21E,D306,D408,D509,D677,D770,
    D8BE,D9BB,DAAC,DB54,DC63,DD65,DE66,DF62,
    E030,E142,E247,E357,E4EE,E533,E6B6,E7E1,
    E8CD,E9ED,EA36,EB44,ECCE,EDCF,EE31,EFAA,
    F0FC,F19E,F2AE,F38C,F4DD,F5DC,F639,F7FB,
    F880,F9AF,FAFD,FB78,FC76,FDB2,FE9F,FFFF)

Then you refer to this translation table in your control statements:

  INREC FIELDS=(STMT,TRAN=ALTSEQ)

In this case the field STMT (which happens to be a DFSORT Symbol name) is translated using the ALTSEQ symbol table. (There is one translation table but it can be used by selected fields – those for which TRAN=ALTSEQ is specified.)

TRAN=ALTSEQ has been supported by DFSORT for a number of years. As documented in the Reformatting: Translation section of Beyond Sorting also mentions TRAN=UTOL and TRAN=LTOU, which allow translation between lower and upper case.

In our code I have placed the ALTSEQ translation table in a data set pointed to by a (separate) DFSPARM DD. That makes it easier to manage. Indeed, if you do have code page issues you can address them using several such data sets, each with their own customisation for “local difficulties”.:-)

One final note: In Production I’ll probably use IFTHEN to automate the “blank in the first 10 bytes” test I mentioned above – as you can, with IFTHEN (UQ95213 / UQ95214), reformat records in different ways dependent on the input data. That includes choosing whether to use TRAN=ALTSEQ on individual fields.

Unknown Unknowns

(Originally posted 2006-05-06.)

Nope, I’m not going to do a Rumsfeld. 🙂

But there are a couple of things I didn’t know (I didn’t know) about DFSORT. Relating them to you is probably OK because

  • It’s not something I’m embarrassed about.:-)
  • If you want to understand some of the funkier new DFSORT tricks you probably need to know these things.

Consider these DFSORT examples (which I take no credit for). They are used to deal with groups of records and rely on features introduced about a year ago (with New Function APAR UQ95214 /UQ95213).

Here are the two things you need to know if you are ever to understand these examples…

OVERLAY items are applied in sequence

By this I mean that the first field specified is overlain first, then the second, then the third, etc… Now, the significance of this (and the reason for the bold emphasis) is that the result of the first overlaying can be used in the second overlaying. Consider the following fragment from the first example (here):

  IFTHEN=(WHEN=NONE,
    OVERLAY=(152:SEQNUM,8,ZD,
       144:144,8,ZD,SUB,152,8,ZD,M11,LENGTH=8))
 

Here’s how we process this instruction (ignoring the WHEN=NONE IFTHEN piece):

  1. We overlay positions 152-159 with a DFSORT-generated record sequence number (and more on that later in this entry).
  2. We overlay positions 144-151 with the current value in 144-151 minus the just-generated value in 152-159 (not whatever was in 152-159 before we started OVERLAYing).

And this turns out to be critical to the way this smart trick works.

Every IFTHEN stage that uses SEQNUM has it’s own separately-incremented SEQNUM counter

Consider the whole IFTHEN pipeline in the smart trick:

   INREC IFTHEN=(WHEN=INIT,OVERLAY=(144:SEQNUM,8,ZD)),                   1
         IFTHEN=(WHEN=(2,4,CH,EQ,C'RPT.'),
                 OVERLAY=(134:2,10,144:SEQNUM,8,ZD)),                    2
         IFTHEN=(WHEN=NONE,
                 OVERLAY=(152:SEQNUM,8,ZD,                               3
                   144:144,8,ZD,SUB,152,8,ZD,M11,LENGTH=8))

Each of these SEQNUMs, as I say, is a separately-incremented counter: If the WHEN condition is satisfied the OVERLAY (or BUILD) processing is performed on the record. And each SEQNUM counter is incremented…

  1. This one is applied to every record so is the true count of the records that INREC processed.
  2. This one is incremented only if (2,4,CH,EQ,C’RPT.’) evaluates to true.
  3. This one is incremented if no other IFTHEN condition (other than WHEN=INIT conditions) is satisfied, that is if (2,4,CH,EQ,C’RPT.’) evaluates to false.

Here are some other things you might not know about SEQNUM…

  • For a long time now you’ve been able to specify a starting value and an increment. e.g.
      OUTREC BUILD=( ... ,SEQNUM,8,ZD,START=5,INCR=5, ... )
    
    creates a sequence number that starts at 5 and goes up by 5 each time it’s incremented (i.e for each record it’s applied to.
  • With UQ92514 / UQ92513 you can specify a RESTART subparameter. e.g.
      INREC OVERLAY=(1:SEQNUM,8,ZD,RESTART=(16,8))
    
    restarts the sequence number at 1 each time the value in positions 16-23 changes.
  • You can sort on a sequence number – to preserve the original sequence of records, perhaps. e.g.
      INREC FIELDS=(SEQNUM,8,PD, ... )SORT  FIELDS=(1,8,PD,A)
    
  • You can perform arithmetic on a SEQNUM field – assuming you have the arithmetic expression support built into APAR UQ90053. Now, I will take credit for suggesting that MOD be supported so that sequence numbers could be written 1, 2, 3, 1, 2, 3, … (That might allow you to subsequently separate neigbouring records by sorting on this MODed sequence number, for example.) e.g.
      INREC FIELDS=(SEQNUM,8,PD,MOD,+3, ... )SORT  FIELDS=(1,8,PD,A)
    
  • Not only does each stage in an IFTHEN pipeline have its own SEQNUM counter but INREC, OUTREC and each OUTFIL statement all have their counter (or, for IFTHEN, set of counters).

So, lots more flexibility with sequence numbers than you might imagine.

Parsing XML with DFSORT

(Originally posted 2006-04-25.)

Following on from Generating XML Using DFSORT – Part II here are some thoughts on how to parse XML with DFSORT.

NOTE: For more complex XML than this entry describes you probably want to use the XML Toolkit for z/OS. This provides C++ and Java parsers for XML and a stand-alone XSLT processor.

In this example I'll show you how to take XML that looks like this and create a flat file from it:

<?xml version="1.0" encoding="UTF-8" ?>
<band>
<member surname="Mercury" firstname="Freddie" job="Singer" />
<member surname="May" firstname="Brian" job="Guitarist" />
<member surname="Taylor" firstname="Roger" job="Drummer" />
<member surname="Deacon" firstname="John" job="Bassist" />
</band>

and turn it into our (now familiar)

Mercury         Freddie         Singer   
May             Brian           Guitarist
Taylor          Roger           Drummer  
Deacon          John            Bassist  

which could be mapped with DFSORT Symbols:

Surname,*,16,CH  
Firstname,*,16,CH
Job,*,10,CH

In fact – in this example – we won't use these symbols.

The first thing we need to do is to keep only the data rows – and to do that we code:

  INCLUDE COND=(1,7,CH,EQ,C'<member')

which throws away the first two rows and the last row.

Next we need to parse the data rows using the following INREC statement:

INREC IFTHEN=(WHEN=INIT,
        PARSE=(%1=(STARTAFT=C'<',ENDBEFR=C'/>',FIXLEN=80)),
        BUILD=(%1)),
      IFTHEN=(WHEN=INIT,
        PARSE=(%2=(ABSPOS=1,FIXLEN=18,STARTAFT=C'surname="',ENDBEFR=C'"')),
        BUILD=(%2,1,80)),
      IFTHEN=(WHEN=INIT,
        PARSE=(%3=(ABSPOS=1,FIXLEN=18,STARTAFT=C'firstname="',ENDBEFR=C'"')),
        BUILD=(1,16,%3,1,80)),
      IFTHEN=(WHEN=INIT,
        PARSE=(%4=(ABSPOS=1,FIXLEN=10,STARTAFT=C'job="',ENDBEFR=C'"')),
        BUILD=(1,32,%4))

which looks rather complicated.

This uses IFTHEN (introduced in 2004) and PARSE (introduced in 2006 with UK90006/UK90007).

In fact the IFTHEN clauses are a pipeline of stages. The WHEN=INIT conditions mean that they are performed for all records that pass the INCLUDE statement's condition. Each WHEN=INIT is performed in turn. And all the stages are performed within the INREC statement. That is, before any SORT, SUM, OUTREC or OUTFIL processing.

The first stage in this pipeline is

      IFTHEN=(WHEN=INIT,
        PARSE=(%1=(STARTAFT=C'<',ENDBEFR=C'/>',FIXLEN=80)),
        BUILD=(%1)),

which strips off the surrounding angle brackets from each line, producing an 80-byte record.

The second stage is

   IFTHEN=(WHEN=INIT,
        PARSE=(%2=(ABSPOS=1,FIXLEN=18,STARTAFT=C'surname="',ENDBEFR=C'"')),
        BUILD=(%2,1,80)),

which extracts the surname attribute into a field which is prepended onto the 80-byte record created in the first stage.

The third stage is

   IFTHEN=(WHEN=INIT,
        PARSE=(%3=(ABSPOS=1,FIXLEN=18,STARTAFT=C'firstname="',ENDBEFR=C'"')),
        BUILD=(1,16,%3,1,80)),

which extracts the firstname attribute into a field which is prepended onto the 80-byte record created in the first stage (but after the 16-byte (surname) field extracted in the second stage).

The fourth and final stage is

   IFTHEN=(WHEN=INIT,
        PARSE=(%4=(ABSPOS=1,FIXLEN=10,STARTAFT=C'job="',ENDBEFR=C'"')),
        BUILD=(1,32,%4))

which extracts the job attribute into a field which is appended onto the the 16-byte (surname) field extracted in the second stage and the 16-byte (firstname) field extracted in the third stage.

I admit this looks complicated but it does allow for the attributes to appear in any order in an XML element. What it doesn't do is to allow any old multiple-line format for the input XML. For that you really do need the toolkit. But I'm convinced there are tricks we can teach DFSORT when it comes to parsing XML. It's just that we'd need time to think about them. 🙂

A note on pipelining: When I first saw IFTHEN I thought of it potentially as a pipelining technique. This example is quite a good one for pipelining as everything happens in 4 IFTHEN WHEN=INIT stages. It's actually proved a lot simpler to construct the DFSORT processing this way – and it has isolated all the processing to the INREC statement. So there's lots you can do later on in the DFSORT invocation. (And the ability to allow the attributes (surname, firstname and job) to be in any order was made much easier by this pipelining approach.

But I have to be sanguine about pipelining: At this stage in DFSORT's development we don't have all the capabilities for branching etc that CMS (/TSO) Pipelines has. But I offer you the pipelining model as another way of thinking about what IFTHEN can do, as well as the "treat different records in different ways" original intention. However, we do have nice constructs like WHEN=ANY, WHEN=NONE and HIT=NEXT to construct reasonable pipelines with.

What has been really nice about recent DFSORT innovations is that you find out more things you can do with them every day.

PARSE is worthy of some more discussion: It's brand new (April 2006) and allows DFSORT to parse (duh!) variable-format data. In this case the length of each attribute is variable. %1, %2, %3 and %4 refer to different variable-length fields that we can use in subsequent processing stages. Let's take one usage as an example:

   IFTHEN=(WHEN=INIT,
        PARSE=(%1=(STARTAFT=C'<',ENDBEFR=C'/>',FIXLEN=80)),
        BUILD=(%1)),

In this case we extract into the variable %1 the first string in the input record that starts with < and ends with />, padding to 80 bytes with blanks. The BUILD=(%1) says that the output from this IFTHEN (stage) will be that entire 80-byte %1 variable. That is the input record but with the top and tail removed.

Note: We don't actually need to know how long the string we're extracting is. Prior to PARSE we would've had to know. And that, to me, is one of the very nice features of PARSE.

Generating XML Using DFSORT – Part II

(Originally posted 2006-04-24.)

In Generating XML Using DFSORT – Part I I described how to generate XML using some of DFSORT’s new features.

But the XML we generated was only one of the possible styles of XML: Most of the information was specified as attributes (with the values enclosed in quotes). Here’s a sample line:

<member surname="Mercury" firstname="Freddie" job="Singer" />

In this case the attribute job has the value Singer.

Another style uses elements rather than attributes. Here’s an example that conveys the same information (but in XML terms is not equivalent):

  <member>
    <surname>Mercury</surname>
    <firstname>Freddie</firstname>
    <job>Singer</job>
  </member>

The following example is similar to the one in User Guide for DFSORT PTFs UK90007 and UK90006 (SORTPEUG), but it builds on the example in Part I.

Starting with the same set of data

Mercury         Freddie         Singer   
May             Brian           Guitarist
Taylor          Roger           Drummer  
Deacon          John            Bassist  

and the same Symbols mapping it

Surname,*,16,CH  
Firstname,*,16,CH
Job,*,10,CH

We’d like to create an XML file:

<?xml version="1.0" encoding="UTF-8" ?>
<band>
  <member>
    <surname>Mercury</surname>
    <firstname>Freddie</firstname>
    <job>Singer</job>
  </member>
  <member>
    <surname>May</surname>
    <firstname>Brian</firstname>
    <job>Guitarist</job>
  </member>
  <member>
    <surname>Taylor</surname>
    <firstname>Roger</firstname>
    <job>Drummer</job>
  </member>
  <member>
    <surname>Deacon</surname>
    <firstname>John</firstname>
    <job>Bassist</job>
  </member>
</band>

The first step is to generate the data records (the ones bracketed by <member;> and </member>). Here is the OUTFIL statement that does this:

OUTFIL FNAMES=OUT1,REMOVECC,
  BUILD=(3:C'<member>',/,
         5:Surname,JFY=(LENGTH=35,
           SHIFT=LEFT,LEAD=C'<surname>',TRAIL=C'</surname>'),/,
         5:Firstname,JFY=(LENGTH=39,
           SHIFT=LEFT,LEAD=C'<firstname>',TRAIL=C'</firstname>'),/,
         5:Job,JFY=(LENGTH=27,
           SHIFT=LEFT,LEAD=C'<job>',TRAIL=C'</job>'),/,
         3:C'</member>')

In this case I didn’t actually use the Symbol for Position capability just introduced by UK90006 / UK90007, but I could have. But I did use the JFY function, also new with that level of DFSORT. Its purpose, just like in Part I, is to squeeze out trailing blanks from the fixed-width fields.

NOTE: The use of REMOVECC is important to avoid generating ANSI control characters, which would not be valid XML.

At this stage the output would be

  <member>
    <surname>Mercury</surname>
    <firstname>Freddie</firstname>
    <job>Singer</job>
  </member>
  <member>
    <surname>May</surname>
    <firstname>Brian</firstname>
    <job>Guitarist</job>
  </member>
  <member>
    <surname>Taylor</surname>
    <firstname>Roger</firstname>
    <job>Drummer</job>
  </member>
  <member>
    <surname>Deacon</surname>
    <firstname>John</firstname>
    <job>Bassist</job>
  </member>

To add the first 2 lines and the final line to this we code additional OUTFIL parameters:

  HEADER1=('<?xml version="1.0" encoding="UTF-8" ?>',/,'<band>'),
  TRAILER1=('</band>')

making the entire OUTFIL statement:

OUTFIL FNAMES=OUT1,REMOVECC,
  BUILD=(3:C'<member>',/,
         5:Surname,JFY=(LENGTH=35,
           SHIFT=LEFT,LEAD=C'<surname>',TRAIL=C'</surname>'),/,
         5:Firstname,JFY=(LENGTH=39,
           SHIFT=LEFT,LEAD=C'<firstname>',TRAIL=C'</firstname>'),/,
         5:Job,JFY=(LENGTH=27,
           SHIFT=LEFT,LEAD=C'<job>',TRAIL=C'</job>'),/,
         3:C'</member>'),
  HEADER1=('<?xml version="1.0" encoding="UTF-8" ?>',/,'<band>'),
  TRAILER1=('</band>')

You can use this statement with statements such as SORT, COPY, INCLUDE, OMIT, INREC and OUTREC, so long as the Symbols set you use reflect any reformatting of the records passed into OUTFIL.

In this entry and Part I I’ve shown you a couple of techniques for generating XML, which can be adapted in a mix and match fashion. In the next entry in this series I’ll start to describe how to parse XML.

Generating XML Using DFSORT – Part I

(Originally posted 2006-04-24.)

Following on from This entry on creating CSV files here’s the first of several entries on manipulating XML with DFSORT…

This is not intended to be a tutorial on XML but rather an exploration of how DFSORT can read (shred) and write (compose) XML. (I also use the terms ingest and emit synonymously.) Hopefully some of the basic concepts of XML will come across during the course of these blog entries.

XML is – to my mind – much better than CSV…

  • It IS a standard – and lots of things are built on this standard.
  • There is semantic and structuring information built into XML.
  • Support for reading and writing XML are built into many programming languages and other programs, such as DB2.

So XML is a great way of structuring information for exchange between programs and systems.

So we’ll start with a simple set of data we’d like to convert to XML…

Consider the small data set

Mercury         Freddie         Singer   
May             Brian           Guitarist
Taylor          Roger           Drummer  
Deacon          John            Bassist

This data is mapped with the following DFSORT Symbols deck:

Surname,*,16,CH
Firstname,*,16,CH
Job,*,10,CH

We’d like to create an XML file:

<?xml version="1.0" encoding="UTF-8" ?>
<band>
	<member surname="Mercury" firstname="Freddie" job="Singer" />
	<member surname="May" firstname="Brian" job="Guitarist" />
	<member surname="Taylor" firstname="Roger" job="Drummer" />
	<member surname="Deacon" firstname="John" job="Bassist" />
</band>>

The first thing we do is to take the input records and wrap them with quotes, squeezing out the trailing spaces:

<member surname="Mercury"          firstname="Freddie"          job="Singer"    />
<member surname="May"              firstname="Brian"            job="Guitarist" />
<member surname="Taylor"           firstname="Roger"            job="Drummer"   />
<member surname="Deacon"           firstname="John"             job="Bassist"   />

To do this we can use the following INREC statement:

INREC BUILD=(C'<member',
    C'surname=',Surname,JFY=(SHIFT=LEFT,LEAD=C'"',TRAIL=C'"',LENGTH=18),X,
    C'firstname=',Firstname,JFY=(SHIFT=LEFT,LEAD=C'"',TRAIL=C'"',LENGTH=18),X,
    C'job=',Job,JFY=(SHIFT=LEFT,LEAD=C'"',TRAIL=C'"',LENGTH=12),
    C'/>')

The above statement reformats the three fields – removing the trailing spaces – in the manner described in my CSV creation blog entry.

The next (OUTREC) statement squeezes out the spaces, preserving those within the quotes:

   OUTREC BUILD=(_Unsqueezed,SQZ=(SHIFT=LEFT,PAIR=QUOTE,MID=C' '))

To make the above code work I defined an additional symbol

_Unsqueezed,1,82,CH

This symbol maps the whole of the output of the INREC statement – that is the records wrapped in quotes.

The final step is an OUTFIL statement that tops and tails the output data with some more tags:

OUTFIL FNAMES=OUT1,REMOVECC,
    HEADER1=('<?xml version="1.0" encoding="UTF-8" ?>',/,'<band>'),
    TRAILER1=('</band>')

NOTE: REMOVECC removes the ANSI carriage control bytes.

In this example I’ve spread the transformations across INREC, OUTREC and OUTFIL statements. You might want to place the function differently: You might, for instance, want to do a SORT or include only certain records upstream of the XML creation. Perhaps in a later blog entry I’ll talk about that some more. But there are other things I want to talk about regarding DFSORT and XML. Stay tuned for a different style of XML.

Using DFSORT to create CSV files

(Originally posted 2006-04-21.)

I plan on writing some entries on creating and parsing XML with DFSORT (using the UK90006 / UK90007 functional enhancments that DFSORT Development recently announced). But here’s a limbering up example – creating a CSV file from regular sequential file input.

CSV files (Comma-Separated Value (or Variable if you prefer)) are of the form

"JDLFJDJ DF",4146,"FKJFK"
"JDJDJ JKJJ",12352,"EE FF"
"AAFIELD3FI",4,"94949"
"ACFIELD",35,"34443"

where the commas separate fields, and where the quotes denote their contents are character strings. Each line is a separate row of fields. So it’s really a grid. This is an early form at structuring data as text, and it’s used by many programs such as spreadsheets. It’s not a terribly robust format and probably isn’t a standard. Further, there is no real attempt to define the meaning of the fields.

But it does illustrate a use for the new JFY and SQZ capabilities of DFSORT…

The source data for this example is

JDLFJDJ DF    FKJFK
JDJDJ JKJJ    EE FF
AAFIELD3FI    94949
ACFIELD       34443

where the blanks in the middle are actually a 4-byte binary number.

The DFSORT control statements are…

OPTION COPY
INREC BUILD=(STR1,JFY=(SHIFT=LEFT,LEAD=C'"',TRAIL=C'"',LENGTH=12),X,
             NUM1,EDIT=(IIIIIIIT),X,
             STR2,JFY=(SHIFT=LEFT,LEAD=C'"',TRAIL=C'"',LENGTH=7))
OUTREC BUILD=(PRINTED,SQZ=(SHIFT=LEFT,PAIR=QUOTE,MID=C','))

You’ll notice the widespread use of Symbols, which isn’t a new thing. So here’s the Symbols deck:

//SYMNAMES DD *
POSITION,1           
STR1,*,10,CH         
NUM1,*,4,BI          
STR2,*,8,CH          
*
PRINTED,1,29,CH      

The first four symbols map the input record. The fifth one (PRINTED) maps the intermediate record that results from the INREC.

The INREC statement produces (with the sample data) the following intermediate records:

"JDLFJDJ DF"     4146 "FKJFK"
"JDJDJ JKJJ"    12352 "EE FF"
"AAFIELD3FI"        4 "94949"
"ACFIELD"          35 "34443"

So the strings are wrapped in quotes but there are no commas and there has been no squeezing together.

To take the first field

STR1,JFY=(SHIFT=LEFT,LEAD=C'"',TRAIL=C'"',LENGTH=12)

shifts the data to the left, puts quotes around the string (removing trailing blanks) and makes the resulting field 12 bytes wide. (The second field involves number formatting and the third is similar to the first but with a length of 7 bytes, including the quotes.)

The OUTREC statement squeezes out all the spaces outside of the quotes (PAIR=QUOTE telling DFSORT to preserve what’s in the pair of quotes.) MID=C’,’ specifies that any run of spaces (outside of pairs of quotes) are to be replaced by a single comma.

This is, admittedly, a fairly complex example. But I hope it shows some of the capabilities of SQZ and JFY. And maybe this is a sample you can swipe and modify for your applications.

One thing that isn’t clear to me is whether trailing blanks are in fact significant in the CSV file format. Because it’s scarcely a standard it’s probably implementation-dependent. But, personally, I’d assume that blanks were significant.

Removing variable numbers of blanks could be done prior to these new functions being available but it was much more fiddly. I wouldn’t want to even attempt explaining that one. 🙂

And shortly I’ll write some tips about XML and DFSORT.

Search and Replace with the new DFSORT functional PTFs

(Originally posted 2006-04-20.)

Here’s my first tip with the new UK90006 / UK90007 DFSORT PTFs. It’s an apparently minor enhancement that I think people will like…

  • It leads to a more elegant syntactical expression.
  • It is more maintainable – in the face of changing record layouts.

Suppose you want to search a file for a string in a particular position in each record and replace all matching instances with a different string. (I might do that to scrub an SMFID in SMF records to produce sample reports.) Without DFSORT’s Symbols support (which is ancient history now) you might have to code something like…

OUTREC IFTHEN=(WHEN=(4,5,CH,EQ,C'84484'),OVERLAY=(4:C'ABCDE'))

In this example the field you are searching in is a 5-character string beginning in position 4. Hence 4 appears twice in the statement.

NOTE: I’ve already used two functions that are much more recent than the introduction of Symbols: IFTHEN and OVERLAY…

  • IFTHEN applies a BUILD (also know as FIELDs) or an OVERLAY clause to only the records that meet the IFTHEN condition. In this case
    4,5,CH,EQ,C'84484'
    specifies which field to search in.
  • OVERLAY changes only the specified fields. In this case
    4:C'ABCDE'
    overlays just the 4-byte field we previously searched in. Prior to OVERLAY you had to explicitly specify all the fields you wanted to appear in the output record, even if only one was changed – as is the case here.

Now, that works pretty well – because of these two new functions introduced with PTFs UQ95213 / UQ95214 at the end of 2004. But suppose the format of the record changes. You would have to recode all such utterances for the new field positions. It would be much better to do it in one place. And that, in a nutshell, is the case for Symbols…

Continuing with our example, you might map the record with a SYMNAMES DD that points to the following definitions…

F1,*,1,CHF2,*,1,CHSKIP,1   F3,*,5,CH

If you work through these definitions you’ll realise that F3 starts at position 4 and is of length 5 bytes. So it matches our original example. Prior to UK90006 / UK90007 you could code:

OUTREC IFTHEN=(WHEN=(F3,EQ,C'84484'),OVERLAY=(4:C'ABCDE'))
and the IFTHEN clause is fully symbolic (ignoring the literal comparison string – which could itself be converted to a symbol.)

But suppose the data was reorganised so that this field moved. You’ll notice that still requires you to change the OVERLAY clause to replace the 4 with a different position.

With UK90006 / UK90007 you can now code

OUTREC IFTHEN=(WHEN=(F3,EQ,C'84484'),OVERLAY=(F3:C'ABCDE'))

and a reorganisation of the data just means changing the Symbols deck pointed to by the SYMNAMES DD. If you’re sensible this will be in one permanent file and so it’s just one edit (plus, of course, some testing).

What has changed is the ability to specify a position (for example for OVERLAY) using a symbol.

So, I think I’ve made the maintainability point. I think this enhancement also allows more elegant expression: You now refer to the field consistently by its symbolic name, rather than in two different ways in two different places.

As I said, it’s a minor enhancement. But one I particularly like – as I mess around with DFSORT a lot and little annoys me more than clunkiness of expression and poor maintainability.

One final point: I mentioned above that you could replace the literal string values with symbols. Here’s how to do it (in this example). Add two symbol definitions to the SYMNAMES DD:

OLDVALUE,C'84484'NEWVALUE,C'ABCDE'

and replace the statement with

OUTREC IFTHEN=(WHEN=(F3,EQ,OLDVALUE),OVERLAY=(F3:NEWVALUE))

and now you can change the search and replacement string values without changing all the occurrences of these strings in all your code – just in the one place.

DFSORT New Function PTFs

(Originally posted 2006-04-20.)

As many of you know I like to beta DFSORT function PTFs. About once a year a new set of functions appears, many in direct response to customer requirements. Occasionally some of the enhancements are at my suggestion. 🙂

Yesterday PTFs UK90006 and UK90007 became available. (Being UK based it’s nice to see PTF numbers beginning with UK but I read no significance into this.) 🙂

You can read the user guide here.

I’m not going to duplicate what that document says here. But the enhancements are rather pleasing. What I will do, occasionally, is post examples of things that are easy to overlook in this announcement. As someone who’s played with the code over the past six months I think I’m well placed to do this.

And I expect Frank Yaeger (the DFSORT function developer) to be all over the web with this stuff shortly.

My New Job

(Originally posted 2006-04-07.)

As of today I have a new job.

  • I still work for IBM.
  • I’m still UK based.
  • I still do performance analysis with customers in the same way.

But some things have changed:

  • I now work in Software Group, rather than what’s now called Global Technology Services.
  • I now have a geographical territory that stretches from Rejkavik to Vladivostok, and from Stavanger to Cape Town. I would quite like to visit all of those. 🙂 Actually there’s a hole in the middle that isn’t mine. 😦 But then IBM organisation was always a little unfathomable. 🙂

So, in practical terms very little changes for those customers who already know me. In fact I might be able to be a little more responsive – in case you’re planning any unforeseen crises. 🙂 And I hope to be meeting many more (really interesting) customers in more far-flung places. I’m already told to expect to be able to do more presenting, writing and looking after user groups. So I think that’s all goodness.

So, what’s not to like? 🙂