Parsing XML with DFSORT

(Originally posted 2006-04-25.)

Following on from "Generating XML Using DFSORT – Part II", here are some thoughts on how to parse XML with DFSORT.

NOTE: For more complex XML than this entry describes you probably want to use the XML Toolkit for z/OS. This provides C++ and Java parsers for XML and a stand-alone XSLT processor.

In this example I'll show you how to take XML that looks like this and create a flat file from it:

<?xml version="1.0" encoding="UTF-8" ?>
<band>
<member surname="Mercury" firstname="Freddie" job="Singer" />
<member surname="May" firstname="Brian" job="Guitarist" />
<member surname="Taylor" firstname="Roger" job="Drummer" />
<member surname="Deacon" firstname="John" job="Bassist" />
</band>

and turn it into our (now familiar)

Mercury         Freddie         Singer   
May             Brian           Guitarist
Taylor          Roger           Drummer  
Deacon          John            Bassist  

which could be mapped with DFSORT Symbols:

Surname,*,16,CH  
Firstname,*,16,CH
Job,*,10,CH
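
As a sketch of how such a mapping might be used: the symbols go in a SYMNAMES data set and can then stand in for position, length and format in a control statement. Two assumptions of mine that aren't part of this example: the first symbol is given an explicit position of 1, and the step shown is a later one that reads the finished flat file and sorts it by name.

```
//SYMNAMES DD *
Surname,1,16,CH
Firstname,*,16,CH
Job,*,10,CH
/*
//SYSIN    DD *
* Sort the flat file by surname, then by first name
  SORT FIELDS=(Surname,A,Firstname,A)
/*
```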

In fact – in this example – we won't use these symbols.

The first thing we need to do is to keep only the data rows – and to do that we code:

  INCLUDE COND=(1,7,CH,EQ,C'<member')

which throws away the first two rows and the last row.
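
If the member lines might be indented, a substring search is one way to avoid relying on column 1. This is only a sketch, assuming fixed-length 80-byte input records (the entry doesn't say what the input looks like); the SS format searches the whole field for the string:

```
* Keep any record with <member somewhere in columns 1 to 80
  INCLUDE COND=(1,80,SS,EQ,C'<member')
```

The first stage of the INREC statement that follows starts after the first <, so leading blanks shouldn't upset it.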

Next we need to parse the data rows using the following INREC statement:

  INREC IFTHEN=(WHEN=INIT,
          PARSE=(%1=(STARTAFT=C'<',ENDBEFR=C'/>',FIXLEN=80)),
          BUILD=(%1)),
        IFTHEN=(WHEN=INIT,
          PARSE=(%2=(ABSPOS=1,FIXLEN=18,
                     STARTAFT=C'surname="',ENDBEFR=C'"')),
          BUILD=(%2,1,80)),
        IFTHEN=(WHEN=INIT,
          PARSE=(%3=(ABSPOS=1,FIXLEN=18,
                     STARTAFT=C'firstname="',ENDBEFR=C'"')),
          BUILD=(1,16,%3,1,80)),
        IFTHEN=(WHEN=INIT,
          PARSE=(%4=(ABSPOS=1,FIXLEN=10,
                     STARTAFT=C'job="',ENDBEFR=C'"')),
          BUILD=(1,32,%4))

which looks rather complicated.

This uses IFTHEN (introduced in 2004) and PARSE (introduced in 2006 with UK90006/UK90007).

In fact the IFTHEN clauses are a pipeline of stages. The WHEN=INIT conditions mean that they are performed for all records that pass the INCLUDE statement's condition. Each WHEN=INIT is performed in turn. And all the stages are performed within the INREC statement. That is, before any SORT, SUM, OUTREC or OUTFIL processing.

The first stage in this pipeline is

      IFTHEN=(WHEN=INIT,
        PARSE=(%1=(STARTAFT=C'<',ENDBEFR=C'/>',FIXLEN=80)),
        BUILD=(%1)),

which strips the leading < and the trailing /> from each line, producing an 80-byte record.

The second stage is

   IFTHEN=(WHEN=INIT,
        PARSE=(%2=(ABSPOS=1,FIXLEN=18,STARTAFT=C'surname="',ENDBEFR=C'"')),
        BUILD=(%2,1,80)),

which extracts the surname attribute into an 18-byte field (FIXLEN=18) which is prepended onto the 80-byte record created in the first stage. Only the first 16 bytes of that field are kept by the next stage.

The third stage is

   IFTHEN=(WHEN=INIT,
        PARSE=(%3=(ABSPOS=1,FIXLEN=18,STARTAFT=C'firstname="',ENDBEFR=C'"')),
        BUILD=(1,16,%3,1,80)),

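which extracts the firstname attribute into a field which is placed after the 16-byte surname field kept from the second stage. The 1,80 that follows carries forward the first 80 bytes of the second stage's output, which for this data still includes the job attribute the final stage needs to find.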

The fourth and final stage is

   IFTHEN=(WHEN=INIT,
        PARSE=(%4=(ABSPOS=1,FIXLEN=10,STARTAFT=C'job="',ENDBEFR=C'"')),
        BUILD=(1,32,%4))

which extracts the job attribute into a 10-byte field which is appended onto the first 32 bytes of the record: the 16-byte surname field kept from the second stage and the first 16 bytes of the firstname field extracted in the third stage.
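
Putting the INCLUDE and INREC statements together, a complete step might look something like the sketch below. The job card is omitted, the data set names are hypothetical, and I'm assuming fixed-length 80-byte input. I've also added IFOUTLEN=42, which isn't in the statement above, on the assumption that without it the output records would be as long as the longest intermediate record rather than 16+16+10 bytes.

```
//PARSEXML EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//* HLQ.BAND.XML and HLQ.BAND.FLAT are hypothetical data set names
//SORTIN   DD DISP=SHR,DSN=HLQ.BAND.XML
//SORTOUT  DD DSN=HLQ.BAND.FLAT,DISP=(NEW,CATLG),
//            UNIT=SYSDA,SPACE=(TRK,(1,1))
//SYSIN    DD *
* Copy rather than sort; keep only the member rows
  OPTION COPY
  INCLUDE COND=(1,7,CH,EQ,C'<member')
* IFOUTLEN=42 (an addition for this sketch) sets the output length
  INREC IFOUTLEN=42,
        IFTHEN=(WHEN=INIT,
          PARSE=(%1=(STARTAFT=C'<',ENDBEFR=C'/>',FIXLEN=80)),
          BUILD=(%1)),
        IFTHEN=(WHEN=INIT,
          PARSE=(%2=(ABSPOS=1,FIXLEN=18,
                     STARTAFT=C'surname="',ENDBEFR=C'"')),
          BUILD=(%2,1,80)),
        IFTHEN=(WHEN=INIT,
          PARSE=(%3=(ABSPOS=1,FIXLEN=18,
                     STARTAFT=C'firstname="',ENDBEFR=C'"')),
          BUILD=(1,16,%3,1,80)),
        IFTHEN=(WHEN=INIT,
          PARSE=(%4=(ABSPOS=1,FIXLEN=10,
                     STARTAFT=C'job="',ENDBEFR=C'"')),
          BUILD=(1,32,%4))
/*
```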

I admit this looks complicated but it does allow for the attributes to appear in any order in an XML element. What it doesn't do is to allow any old multiple-line format for the input XML. For that you really do need the toolkit. But I'm convinced there are tricks we can teach DFSORT when it comes to parsing XML. It's just that we'd need time to think about them. 🙂

A note on pipelining: When I first saw IFTHEN I thought of it potentially as a pipelining technique. This example is quite a good one for pipelining as everything happens in 4 IFTHEN WHEN=INIT stages. It's actually proved a lot simpler to construct the DFSORT processing this way – and it has isolated all the processing to the INREC statement. So there's lots you can do later on in the DFSORT invocation. (And the ability to allow the attributes (surname, firstname and job) to be in any order was made much easier by this pipelining approach.)

But I have to be realistic about pipelining: At this stage in DFSORT's development we don't have all the capabilities for branching etc. that CMS (/TSO) Pipelines has. But I offer you the pipelining model as another way of thinking about what IFTHEN can do, as well as the original "treat different records in different ways" intention. However, we do have nice constructs like WHEN=ANY, WHEN=NONE and HIT=NEXT to construct reasonable pipelines with.
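
To give a flavour of those constructs, here is a minimal sketch that isn't part of the parsing example: it tags each record of the XML input with flag bytes. The flag positions (81 and 82) and their values are my invention, and 80-byte input records are assumed.

```
  OPTION COPY
* Flag byte in 81: M = member row, X = anything else
* Flag byte in 82: S = the member is the singer
  INREC IFTHEN=(WHEN=(1,7,CH,EQ,C'<member'),
          OVERLAY=(81:C'M'),HIT=NEXT),
        IFTHEN=(WHEN=(1,80,SS,EQ,C'job="Singer"'),
          OVERLAY=(82:C'S')),
        IFTHEN=(WHEN=NONE,
          OVERLAY=(81:C'X'))
```

HIT=NEXT is what lets a member row go on to be tested by the second clause; without it, a record that satisfied the first clause would skip the remaining conditional clauses. WHEN=NONE catches the records that satisfied none of the conditions.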

What has been really nice about recent DFSORT innovations is that you find out more things you can do with them every day.

PARSE is worthy of some more discussion: It's brand new (April 2006) and allows DFSORT to parse (duh!) variable-format data. In this case the length of each attribute is variable. %1, %2, %3 and %4 refer to different variable-length fields that we can use in subsequent processing stages. Let's take one usage as an example:

   IFTHEN=(WHEN=INIT,
        PARSE=(%1=(STARTAFT=C'<',ENDBEFR=C'/>',FIXLEN=80)),
        BUILD=(%1)),

In this case we extract into the variable %1 the text between the first < in the input record and the /> that follows it, padding to 80 bytes with blanks. The BUILD=(%1) says that the output from this IFTHEN (stage) will be that entire 80-byte %1 variable. That is, the input record with its top and tail removed.

Note: We don't actually need to know how long the string we're extracting is. Prior to PARSE we would've had to know. And that, to me, is one of the very nice features of PARSE.
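
The same point holds for other delimited data. As a minimal sketch, assuming comma-separated input such as Mercury,Freddie,Singer (my example, not something from the XML above), each field is picked up wherever its delimiter happens to fall:

```
  OPTION COPY
* %1 and %2 each end before the next comma; %3 is the next 10 bytes
  INREC PARSE=(%1=(ENDBEFR=C',',FIXLEN=16),
               %2=(ENDBEFR=C',',FIXLEN=16),
               %3=(FIXLEN=10)),
        BUILD=(%1,%2,%3)
```

FIXLEN only sets the length of the extracted field, padded with blanks on the right; it says nothing about how long the value is in the input.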

Published by Martin Packer
