Mainframe, Performance, Topics

Better Late Then Never – RMPTTOM

(Originally posted 2006-11-08.)

Well, I don’t know about that. Maybe this contribution to the debate won’t add much. But there must be at least one performance person out there who hasn’t heard of the discussion around RMPTTOM. But would I want such a person amongst my readership? 🙂

I’m not going to rehearse the whole story. It is pretty well understood that the old default was creaking somewhat.What I am pleased by is the way it happened…

Sitting on the sidelines of this one I saw.

A customer reported experimenting with the RMPTTOM value and getting good results on MXG-L.
Other customers experimented and also got good results.
Bernie Pierce of SRM / WLM Development wrote a very fine piece on the matter in MXG-L. And I agree with everything he said in it.
A WSC Flash on the subject, distilling Bernie’s thoughts.
z/OS changes to the default.

You might argue that IBM should’ve been proactive in this matter. Personally, as I wasn’t involved, I can only guess that SRM / WLM developers had a hunch that the default value of RMPTTOM would benefit from revisiting. And that a customer coming along and providing proof was the catalyst. That’s genuinely a pure guess. In any case we do have to rely on input from the field and customers – to some degree. That’s what makes things like conference conversations and newsgroups (such as MXG-L) so much fun. 🙂

Of course, if you still want the technical guts do post a comment to that effect. But wait a while as I bone up on the issue. 🙂

2 More DB2 Access Path APARS

(Originally posted 2006-05-22.)

Here are a couple more SQL processing APARs.

APAR PK22814 for Version 8 ESTIMATED SUBQUERY COST IS TOO HIGH WITH SORT MERGE JOIN.

The APAR text says Correlated subquery cost is over-estimated for sort merge join. A well-performing sort merge join plan might not be chosen by DB2. Correlated subquery cost can be overestimated for sort merge join. This may cause a sort merge join plan that performs well to be overlooked by DB2. The cost calculations for sort merge join with a correlated subquery are adjusted to produce a more accurate estimate. Good sort merge join plans can now be chosen for better query performance.

APAR PK23495 for Version 8 SOME PREDICATES ARE NOT BEING PROPERLY PUSHED DOWN INTO VIEWS.

From the APAR text: Slow access path is chosen for an SQL statement referencing a view or a table expression that contains an outer join, because the predicates are not being properly pushed down.

The bind time code in DB2 is modified to correct the incomplete predicate push down.

DB2 Virtual Storage – The Journey Continues

(Originally posted 2006-05-15.)

DB2 Version 8 introduced some very significant exploitation of 64-bit virtual storage. It would be naive to believe that was a total conversion to 64-bit and indeed installations’ mileage varies, depending on circumstances. Here are two APARs that show the journey continues…

PK21268 MOVE CURRENT PATH STORAGE OUT OF STACK AND INTO ABOVE BAR

From the APAR Comments: The working storage to hold CURRENT PATH information has been moved to above-the-bar storage. This will reduce below-the-bar storage consumption.

The word stack is significant as it is one area where the conversion to 64-bit was said to increase storage requirements below the bar. You can monitor Stack Storage using IFCID 225 DB2 Statistics Trace records.

PK21237 VIRTUAL STORAGE CONSTRAINT RELIEF IN BUFFER MANAGEMENT

This provides relief in three areas, according to the APAR description…

The buffer manager engines have been modified to use a single common above-the-bar storage pool, rather than having a separate pool for each engine.

Additionally, the default maximum for the number of engines has been reduced.

For both castout engines and P-lock engines, excess stack storage will now be released before suspending awaiting more work to do.

ASCII to EBCDIC Conversion with DFSORT

(Originally posted 2006-05-13.)

(Scroll down if you’re not interested in DB2)

This past week I’ve been updating our SQL Analysis code to cope with Unicode SQL statements. (If you bind a package or plan once you’ve gone to DB2 Version 8 New Function Mode the statement text is stored in the Catalog in Unicode.) These statements are in Unicode UTF-8 which is extremely similar to ASCII. So I snaffled some existing code of mine that converts from ASCII to EBCDIC and pressed it into service…

If the first 10 bytes of the statement text (minus the 2 4-byte header fields) contain X’40’ (an EBCDIC null (space)) then I consider the text to be in EBCDIC. Otherwise I deem it to be ASCII – and do the conversion. (You might ask “what about statements like ROLLBACK?” Well, it turns out that in SYSPACKSTMT there is always a trailing blank on the statement text – so that’s alright. 🙂

(In fact my code decodes all the structural information in the SYSPACKSTMT STMT column – including extracting information on host variables.)

Now, my colleagues sometimes just want to take a look at the statement text sometimes – without bothering to run my code. So here’s what I learned when creating a DFSORT job I set up for them to run, converting the statement text from ASCII to EBCDIC.

Converting ASCII to EBCDIC With DFSORT

First, a disclaimer: Certain codepoints might need slight adjustment in some countries – for example if you have dollar versus pound sterling issues.

First you code an ALTSEQ statement:

  ALTSEQ CODE=(0000,0101,0202,0303,0437,052D,062E,072F,
    0816,0905,0A25,0B0B,0C0C,0D0D,0E0E,0F0F,
    1010,1111,1212,1313,143C,153D,1632,1726,
    1818,1919,1A3F,1B27,1C22,1D1D,1E35,1F1F,
    2040,215A,227F,237B,245B,256C,2650,277D,
    284D,295D,2A5C,2B4E,2C6B,2D60,2E4B,2F61,
    30F0,31F1,32F2,33F3,34F4,35F5,36F6,37F7,
    38F8,39F9,3A7A,3B5E,3C4C,3D7E,3E6E,3F6F,
    407C,41C1,42C2,43C3,44C4,45C5,46C6,47C7,
    48C8,49C9,4AD1,4BD2,4CD3,4DD4,4ED5,4FD6,
    50D7,51D8,52D9,53E2,54E3,55E4,56E5,57E6,
    58E7,59E8,5AE9,5BAD,5CE0,5DBD,5E5F,5F6D,
    6079,6181,6282,6383,6484,6585,6686,6787,
    6888,6989,6A91,6B92,6C93,6D94,6E95,6F96,
    7097,7198,7299,73A2,74A3,75A4,76A5,77A6,
    78A7,79A8,7AA9,7BC0,7C4F,7DD0,7EA1,7F07,
    8043,8120,8221,831C,8423,85EB,8624,879B,
    8871,8928,8A38,8B49,8C90,8DBA,8EEC,8FDF,
    9045,9129,922A,939D,9472,952B,968A,979A,
    9867,9956,9A64,9B4A,9C53,9D68,9E59,9F46,
    A0EA,A1DA,A22C,A3DE,A48B,A555,A641,A7FE,
    A858,A951,AA52,AB48,AC69,ADDB,AE8E,AF8D,
    B073,B174,B275,B3FA,B415,B5B0,B6B1,B7B3,
    B8B4,B9B5,BA6A,BBB7,BCB8,BDB9,BECC,BFBC,
    C0AB,C13E,C23B,C30A,C4BF,C58F,C63A,C714,
    C8A0,C917,CACB,CBCA,CC1A,CD1B,CE9C,CF04,
    D034,D1EF,D21E,D306,D408,D509,D677,D770,
    D8BE,D9BB,DAAC,DB54,DC63,DD65,DE66,DF62,
    E030,E142,E247,E357,E4EE,E533,E6B6,E7E1,
    E8CD,E9ED,EA36,EB44,ECCE,EDCF,EE31,EFAA,
    F0FC,F19E,F2AE,F38C,F4DD,F5DC,F639,F7FB,
    F880,F9AF,FAFD,FB78,FC76,FDB2,FE9F,FFFF)

Then you refer to this translation table in your control statements:

  INREC FIELDS=(STMT,TRAN=ALTSEQ)

In this case the field STMT (which happens to be a DFSORT Symbol name) is translated using the ALTSEQ symbol table. (There is one translation table but it can be used by selected fields – those for which TRAN=ALTSEQ is specified.)

TRAN=ALTSEQ has been supported by DFSORT for a number of years. As documented in the Reformatting: Translation section of Beyond Sorting also mentions TRAN=UTOL and TRAN=LTOU, which allow translation between lower and upper case.

In our code I have placed the ALTSEQ translation table in a data set pointed to by a (separate) DFSPARM DD. That makes it easier to manage. Indeed, if you do have code page issues you can address them using several such data sets, each with their own customisation for “local difficulties”.:-)

One final note: In Production I’ll probably use IFTHEN to automate the “blank in the first 10 bytes” test I mentioned above – as you can, with IFTHEN (UQ95213 / UQ95214), reformat records in different ways dependent on the input data. That includes choosing whether to use TRAN=ALTSEQ on individual fields.

Unknown Unknowns

(Originally posted 2006-05-06.)

Nope, I’m not going to do a Rumsfeld. 🙂

But there are a couple of things I didn’t know (I didn’t know) about DFSORT. Relating them to you is probably OK because

It’s not something I’m embarrassed about.:-)
If you want to understand some of the funkier new DFSORT tricks you probably need to know these things.

Consider these DFSORT examples (which I take no credit for). They are used to deal with groups of records and rely on features introduced about a year ago (with New Function APAR UQ95214 /UQ95213).

Here are the two things you need to know if you are ever to understand these examples…

OVERLAY items are applied in sequence

By this I mean that the first field specified is overlain first, then the second, then the third, etc… Now, the significance of this (and the reason for the bold emphasis) is that the result of the first overlaying can be used in the second overlaying. Consider the following fragment from the first example (here):

  IFTHEN=(WHEN=NONE,
    OVERLAY=(152:SEQNUM,8,ZD,
       144:144,8,ZD,SUB,152,8,ZD,M11,LENGTH=8))

Here’s how we process this instruction (ignoring the WHEN=NONE IFTHEN piece):

We overlay positions 152-159 with a DFSORT-generated record sequence number (and more on that later in this entry).
We overlay positions 144-151 with the current value in 144-151 minus the just-generated value in 152-159 (not whatever was in 152-159 before we started OVERLAYing).

And this turns out to be critical to the way this smart trick works.

Every IFTHEN stage that uses SEQNUM has it’s own separately-incremented SEQNUM counter

Consider the whole IFTHEN pipeline in the smart trick:

   INREC IFTHEN=(WHEN=INIT,OVERLAY=(144:SEQNUM,8,ZD)),                   1
         IFTHEN=(WHEN=(2,4,CH,EQ,C'RPT.'),
                 OVERLAY=(134:2,10,144:SEQNUM,8,ZD)),                    2
         IFTHEN=(WHEN=NONE,
                 OVERLAY=(152:SEQNUM,8,ZD,                               3
                   144:144,8,ZD,SUB,152,8,ZD,M11,LENGTH=8))

Each of these SEQNUMs, as I say, is a separately-incremented counter: If the WHEN condition is satisfied the OVERLAY (or BUILD) processing is performed on the record. And each SEQNUM counter is incremented…

This one is applied to every record so is the true count of the records that INREC processed.
This one is incremented only if (2,4,CH,EQ,C’RPT.’) evaluates to true.
This one is incremented if no other IFTHEN condition (other than WHEN=INIT conditions) is satisfied, that is if (2,4,CH,EQ,C’RPT.’) evaluates to false.

Here are some other things you might not know about SEQNUM…

For a long time now you’ve been able to specify a starting value and an increment. e.g.
```
  OUTREC BUILD=( ... ,SEQNUM,8,ZD,START=5,INCR=5, ... )
```
creates a sequence number that starts at 5 and goes up by 5 each time it’s incremented (i.e for each record it’s applied to.
With UQ92514 / UQ92513 you can specify a RESTART subparameter. e.g.
```
  INREC OVERLAY=(1:SEQNUM,8,ZD,RESTART=(16,8))
```
restarts the sequence number at 1 each time the value in positions 16-23 changes.
You can sort on a sequence number – to preserve the original sequence of records, perhaps. e.g.
```
  INREC FIELDS=(SEQNUM,8,PD, ... )SORT  FIELDS=(1,8,PD,A)
```
You can perform arithmetic on a SEQNUM field – assuming you have the arithmetic expression support built into APAR UQ90053. Now, I will take credit for suggesting that MOD be supported so that sequence numbers could be written 1, 2, 3, 1, 2, 3, … (That might allow you to subsequently separate neigbouring records by sorting on this MODed sequence number, for example.) e.g.
```
  INREC FIELDS=(SEQNUM,8,PD,MOD,+3, ... )SORT  FIELDS=(1,8,PD,A)
```
Not only does each stage in an IFTHEN pipeline have its own SEQNUM counter but INREC, OUTREC and each OUTFIL statement all have their counter (or, for IFTHEN, set of counters).

So, lots more flexibility with sequence numbers than you might imagine.

From the “Made Me Chuckle Anyway” Department…

(Originally posted 2006-05-02.)

Saw an APAR described as INCORRECT OUPUT 🙂 the other day.

That reminds me of a time long ago when a friend had a foil titled Qualtiy.:-)

Parsing XML with DFSORT

(Originally posted 2006-04-25.)

Following on from Generating XML Using DFSORT – Part II here are some thoughts on how to parse XML with DFSORT.

NOTE: For more complex XML than this entry describes you probably want to use the XML Toolkit for z/OS. This provides C++ and Java parsers for XML and a stand-alone XSLT processor.

In this example I'll show you how to take XML that looks like this and create a flat file from it:

<?xml version="1.0" encoding="UTF-8" ?>
<band>
<member surname="Mercury" firstname="Freddie" job="Singer" />
<member surname="May" firstname="Brian" job="Guitarist" />
<member surname="Taylor" firstname="Roger" job="Drummer" />
<member surname="Deacon" firstname="John" job="Bassist" />
</band>

and turn it into our (now familiar)

Mercury         Freddie         Singer   
May             Brian           Guitarist
Taylor          Roger           Drummer  
Deacon          John            Bassist

which could be mapped with DFSORT Symbols:

Surname,*,16,CH  
Firstname,*,16,CH
Job,*,10,CH

In fact – in this example – we won't use these symbols.

The first thing we need to do is to keep only the data rows – and to do that we code:

  INCLUDE COND=(1,7,CH,EQ,C'<member')

which throws away the first two rows and the last row.

Next we need to parse the data rows using the following INREC statement:

INREC IFTHEN=(WHEN=INIT,
        PARSE=(%1=(STARTAFT=C'<',ENDBEFR=C'/>',FIXLEN=80)),
        BUILD=(%1)),
      IFTHEN=(WHEN=INIT,
        PARSE=(%2=(ABSPOS=1,FIXLEN=18,STARTAFT=C'surname="',ENDBEFR=C'"')),
        BUILD=(%2,1,80)),
      IFTHEN=(WHEN=INIT,
        PARSE=(%3=(ABSPOS=1,FIXLEN=18,STARTAFT=C'firstname="',ENDBEFR=C'"')),
        BUILD=(1,16,%3,1,80)),
      IFTHEN=(WHEN=INIT,
        PARSE=(%4=(ABSPOS=1,FIXLEN=10,STARTAFT=C'job="',ENDBEFR=C'"')),
        BUILD=(1,32,%4))

which looks rather complicated.

This uses IFTHEN (introduced in 2004) and PARSE (introduced in 2006 with UK90006/UK90007).

In fact the IFTHEN clauses are a pipeline of stages. The WHEN=INIT conditions mean that they are performed for all records that pass the INCLUDE statement's condition. Each WHEN=INIT is performed in turn. And all the stages are performed within the INREC statement. That is, before any SORT, SUM, OUTREC or OUTFIL processing.

The first stage in this pipeline is

      IFTHEN=(WHEN=INIT,
        PARSE=(%1=(STARTAFT=C'<',ENDBEFR=C'/>',FIXLEN=80)),
        BUILD=(%1)),

which strips off the surrounding angle brackets from each line, producing an 80-byte record.

The second stage is

   IFTHEN=(WHEN=INIT,
        PARSE=(%2=(ABSPOS=1,FIXLEN=18,STARTAFT=C'surname="',ENDBEFR=C'"')),
        BUILD=(%2,1,80)),

which extracts the surname attribute into a field which is prepended onto the 80-byte record created in the first stage.

The third stage is

   IFTHEN=(WHEN=INIT,
        PARSE=(%3=(ABSPOS=1,FIXLEN=18,STARTAFT=C'firstname="',ENDBEFR=C'"')),
        BUILD=(1,16,%3,1,80)),

which extracts the firstname attribute into a field which is prepended onto the 80-byte record created in the first stage (but after the 16-byte (surname) field extracted in the second stage).

The fourth and final stage is

   IFTHEN=(WHEN=INIT,
        PARSE=(%4=(ABSPOS=1,FIXLEN=10,STARTAFT=C'job="',ENDBEFR=C'"')),
        BUILD=(1,32,%4))

which extracts the job attribute into a field which is appended onto the the 16-byte (surname) field extracted in the second stage and the 16-byte (firstname) field extracted in the third stage.

I admit this looks complicated but it does allow for the attributes to appear in any order in an XML element. What it doesn't do is to allow any old multiple-line format for the input XML. For that you really do need the toolkit. But I'm convinced there are tricks we can teach DFSORT when it comes to parsing XML. It's just that we'd need time to think about them. 🙂

A note on pipelining: When I first saw IFTHEN I thought of it potentially as a pipelining technique. This example is quite a good one for pipelining as everything happens in 4 IFTHEN WHEN=INIT stages. It's actually proved a lot simpler to construct the DFSORT processing this way – and it has isolated all the processing to the INREC statement. So there's lots you can do later on in the DFSORT invocation. (And the ability to allow the attributes (surname, firstname and job) to be in any order was made much easier by this pipelining approach.

But I have to be sanguine about pipelining: At this stage in DFSORT's development we don't have all the capabilities for branching etc that CMS (/TSO) Pipelines has. But I offer you the pipelining model as another way of thinking about what IFTHEN can do, as well as the "treat different records in different ways" original intention. However, we do have nice constructs like WHEN=ANY, WHEN=NONE and HIT=NEXT to construct reasonable pipelines with.

What has been really nice about recent DFSORT innovations is that you find out more things you can do with them every day.

PARSE is worthy of some more discussion: It's brand new (April 2006) and allows DFSORT to parse (duh!) variable-format data. In this case the length of each attribute is variable. %1, %2, %3 and %4 refer to different variable-length fields that we can use in subsequent processing stages. Let's take one usage as an example:

   IFTHEN=(WHEN=INIT,
        PARSE=(%1=(STARTAFT=C'<',ENDBEFR=C'/>',FIXLEN=80)),
        BUILD=(%1)),

In this case we extract into the variable %1 the first string in the input record that starts with < and ends with />, padding to 80 bytes with blanks. The BUILD=(%1) says that the output from this IFTHEN (stage) will be that entire 80-byte %1 variable. That is the input record but with the top and tail removed.

Note: We don't actually need to know how long the string we're extracting is. Prior to PARSE we would've had to know. And that, to me, is one of the very nice features of PARSE.

Generating XML Using DFSORT – Part II

(Originally posted 2006-04-24.)

In Generating XML Using DFSORT – Part I I described how to generate XML using some of DFSORT’s new features.

But the XML we generated was only one of the possible styles of XML: Most of the information was specified as attributes (with the values enclosed in quotes). Here’s a sample line:

<member surname="Mercury" firstname="Freddie" job="Singer" />

In this case the attribute job has the value Singer.

Another style uses elements rather than attributes. Here’s an example that conveys the same information (but in XML terms is not equivalent):

  <member>
    <surname>Mercury</surname>
    <firstname>Freddie</firstname>
    <job>Singer</job>
  </member>

The following example is similar to the one in User Guide for DFSORT PTFs UK90007 and UK90006 (SORTPEUG), but it builds on the example in Part I.

Starting with the same set of data

Mercury         Freddie         Singer   
May             Brian           Guitarist
Taylor          Roger           Drummer  
Deacon          John            Bassist

and the same Symbols mapping it

Surname,*,16,CH  
Firstname,*,16,CH
Job,*,10,CH

We’d like to create an XML file:

<?xml version="1.0" encoding="UTF-8" ?>
<band>
  <member>
    <surname>Mercury</surname>
    <firstname>Freddie</firstname>
    <job>Singer</job>
  </member>
  <member>
    <surname>May</surname>
    <firstname>Brian</firstname>
    <job>Guitarist</job>
  </member>
  <member>
    <surname>Taylor</surname>
    <firstname>Roger</firstname>
    <job>Drummer</job>
  </member>
  <member>
    <surname>Deacon</surname>
    <firstname>John</firstname>
    <job>Bassist</job>
  </member>
</band>

The first step is to generate the data records (the ones bracketed by <member;> and </member>). Here is the OUTFIL statement that does this:

OUTFIL FNAMES=OUT1,REMOVECC,
  BUILD=(3:C'<member>',/,
         5:Surname,JFY=(LENGTH=35,
           SHIFT=LEFT,LEAD=C'<surname>',TRAIL=C'</surname>'),/,
         5:Firstname,JFY=(LENGTH=39,
           SHIFT=LEFT,LEAD=C'<firstname>',TRAIL=C'</firstname>'),/,
         5:Job,JFY=(LENGTH=27,
           SHIFT=LEFT,LEAD=C'<job>',TRAIL=C'</job>'),/,
         3:C'</member>')

In this case I didn’t actually use the Symbol for Position capability just introduced by UK90006 / UK90007, but I could have. But I did use the JFY function, also new with that level of DFSORT. Its purpose, just like in Part I, is to squeeze out trailing blanks from the fixed-width fields.

NOTE: The use of REMOVECC is important to avoid generating ANSI control characters, which would not be valid XML.

At this stage the output would be

  <member>
    <surname>Mercury</surname>
    <firstname>Freddie</firstname>
    <job>Singer</job>
  </member>
  <member>
    <surname>May</surname>
    <firstname>Brian</firstname>
    <job>Guitarist</job>
  </member>
  <member>
    <surname>Taylor</surname>
    <firstname>Roger</firstname>
    <job>Drummer</job>
  </member>
  <member>
    <surname>Deacon</surname>
    <firstname>John</firstname>
    <job>Bassist</job>
  </member>

To add the first 2 lines and the final line to this we code additional OUTFIL parameters:

  HEADER1=('<?xml version="1.0" encoding="UTF-8" ?>',/,'<band>'),
  TRAILER1=('</band>')

making the entire OUTFIL statement:

OUTFIL FNAMES=OUT1,REMOVECC,
  BUILD=(3:C'<member>',/,
         5:Surname,JFY=(LENGTH=35,
           SHIFT=LEFT,LEAD=C'<surname>',TRAIL=C'</surname>'),/,
         5:Firstname,JFY=(LENGTH=39,
           SHIFT=LEFT,LEAD=C'<firstname>',TRAIL=C'</firstname>'),/,
         5:Job,JFY=(LENGTH=27,
           SHIFT=LEFT,LEAD=C'<job>',TRAIL=C'</job>'),/,
         3:C'</member>'),
  HEADER1=('<?xml version="1.0" encoding="UTF-8" ?>',/,'<band>'),
  TRAILER1=('</band>')

You can use this statement with statements such as SORT, COPY, INCLUDE, OMIT, INREC and OUTREC, so long as the Symbols set you use reflect any reformatting of the records passed into OUTFIL.

In this entry and Part I I’ve shown you a couple of techniques for generating XML, which can be adapted in a mix and match fashion. In the next entry in this series I’ll start to describe how to parse XML.

Generating XML Using DFSORT – Part I

(Originally posted 2006-04-24.)

Following on from This entry on creating CSV files here’s the first of several entries on manipulating XML with DFSORT…

This is not intended to be a tutorial on XML but rather an exploration of how DFSORT can read (shred) and write (compose) XML. (I also use the terms ingest and emit synonymously.) Hopefully some of the basic concepts of XML will come across during the course of these blog entries.

XML is – to my mind – much better than CSV…

It IS a standard – and lots of things are built on this standard.
There is semantic and structuring information built into XML.
Support for reading and writing XML are built into many programming languages and other programs, such as DB2.

So XML is a great way of structuring information for exchange between programs and systems.

So we’ll start with a simple set of data we’d like to convert to XML…

Consider the small data set

Mercury         Freddie         Singer   
May             Brian           Guitarist
Taylor          Roger           Drummer  
Deacon          John            Bassist

This data is mapped with the following DFSORT Symbols deck:

Surname,*,16,CH
Firstname,*,16,CH
Job,*,10,CH

We’d like to create an XML file:

<?xml version="1.0" encoding="UTF-8" ?>
<band>
	<member surname="Mercury" firstname="Freddie" job="Singer" />
	<member surname="May" firstname="Brian" job="Guitarist" />
	<member surname="Taylor" firstname="Roger" job="Drummer" />
	<member surname="Deacon" firstname="John" job="Bassist" />
</band>>

The first thing we do is to take the input records and wrap them with quotes, squeezing out the trailing spaces:

<member surname="Mercury"          firstname="Freddie"          job="Singer"    />
<member surname="May"              firstname="Brian"            job="Guitarist" />
<member surname="Taylor"           firstname="Roger"            job="Drummer"   />
<member surname="Deacon"           firstname="John"             job="Bassist"   />

To do this we can use the following INREC statement:

INREC BUILD=(C'<member',
    C'surname=',Surname,JFY=(SHIFT=LEFT,LEAD=C'"',TRAIL=C'"',LENGTH=18),X,
    C'firstname=',Firstname,JFY=(SHIFT=LEFT,LEAD=C'"',TRAIL=C'"',LENGTH=18),X,
    C'job=',Job,JFY=(SHIFT=LEFT,LEAD=C'"',TRAIL=C'"',LENGTH=12),
    C'/>')

The above statement reformats the three fields – removing the trailing spaces – in the manner described in my CSV creation blog entry.

The next (OUTREC) statement squeezes out the spaces, preserving those within the quotes:

   OUTREC BUILD=(_Unsqueezed,SQZ=(SHIFT=LEFT,PAIR=QUOTE,MID=C' '))

To make the above code work I defined an additional symbol

_Unsqueezed,1,82,CH

This symbol maps the whole of the output of the INREC statement – that is the records wrapped in quotes.

The final step is an OUTFIL statement that tops and tails the output data with some more tags:

OUTFIL FNAMES=OUT1,REMOVECC,
    HEADER1=('<?xml version="1.0" encoding="UTF-8" ?>',/,'<band>'),
    TRAILER1=('</band>')

NOTE: REMOVECC removes the ANSI carriage control bytes.

In this example I’ve spread the transformations across INREC, OUTREC and OUTFIL statements. You might want to place the function differently: You might, for instance, want to do a SORT or include only certain records upstream of the XML creation. Perhaps in a later blog entry I’ll talk about that some more. But there are other things I want to talk about regarding DFSORT and XML. Stay tuned for a different style of XML.

Using DFSORT to create CSV files

(Originally posted 2006-04-21.)

I plan on writing some entries on creating and parsing XML with DFSORT (using the UK90006 / UK90007 functional enhancments that DFSORT Development recently announced). But here’s a limbering up example – creating a CSV file from regular sequential file input.

CSV files (Comma-Separated Value (or Variable if you prefer)) are of the form

"JDLFJDJ DF",4146,"FKJFK"
"JDJDJ JKJJ",12352,"EE FF"
"AAFIELD3FI",4,"94949"
"ACFIELD",35,"34443"

where the commas separate fields, and where the quotes denote their contents are character strings. Each line is a separate row of fields. So it’s really a grid. This is an early form at structuring data as text, and it’s used by many programs such as spreadsheets. It’s not a terribly robust format and probably isn’t a standard. Further, there is no real attempt to define the meaning of the fields.

But it does illustrate a use for the new JFY and SQZ capabilities of DFSORT…

The source data for this example is

JDLFJDJ DF    FKJFK
JDJDJ JKJJ    EE FF
AAFIELD3FI    94949
ACFIELD       34443

where the blanks in the middle are actually a 4-byte binary number.

The DFSORT control statements are…

OPTION COPY
INREC BUILD=(STR1,JFY=(SHIFT=LEFT,LEAD=C'"',TRAIL=C'"',LENGTH=12),X,
             NUM1,EDIT=(IIIIIIIT),X,
             STR2,JFY=(SHIFT=LEFT,LEAD=C'"',TRAIL=C'"',LENGTH=7))
OUTREC BUILD=(PRINTED,SQZ=(SHIFT=LEFT,PAIR=QUOTE,MID=C','))

You’ll notice the widespread use of Symbols, which isn’t a new thing. So here’s the Symbols deck:

//SYMNAMES DD *
POSITION,1           
STR1,*,10,CH         
NUM1,*,4,BI          
STR2,*,8,CH          
*
PRINTED,1,29,CH

The first four symbols map the input record. The fifth one (PRINTED) maps the intermediate record that results from the INREC.

The INREC statement produces (with the sample data) the following intermediate records:

"JDLFJDJ DF"     4146 "FKJFK"
"JDJDJ JKJJ"    12352 "EE FF"
"AAFIELD3FI"        4 "94949"
"ACFIELD"          35 "34443"

So the strings are wrapped in quotes but there are no commas and there has been no squeezing together.

To take the first field

STR1,JFY=(SHIFT=LEFT,LEAD=C'"',TRAIL=C'"',LENGTH=12)

shifts the data to the left, puts quotes around the string (removing trailing blanks) and makes the resulting field 12 bytes wide. (The second field involves number formatting and the third is similar to the first but with a length of 7 bytes, including the quotes.)

The OUTREC statement squeezes out all the spaces outside of the quotes (PAIR=QUOTE telling DFSORT to preserve what’s in the pair of quotes.) MID=C’,’ specifies that any run of spaces (outside of pairs of quotes) are to be replaced by a single comma.

This is, admittedly, a fairly complex example. But I hope it shows some of the capabilities of SQZ and JFY. And maybe this is a sample you can swipe and modify for your applications.

One thing that isn’t clear to me is whether trailing blanks are in fact significant in the CSV file format. Because it’s scarcely a standard it’s probably implementation-dependent. But, personally, I’d assume that blanks were significant.

Removing variable numbers of blanks could be done prior to these new functions being available but it was much more fiddly. I wouldn’t want to even attempt explaining that one. 🙂

And shortly I’ll write some tips about XML and DFSORT.