XML, XSLT and DFSORT, Part Three – Multiple XML Input Files

(Originally posted 2011-05-22.)

While I was putting together the original three posts in this series a number of thoughts struck me, amongst which two really cried out for further investigation:

  1. I don't know how your XML data arrives on z/OS but quite a lot of scenarios don't have the data all as one document (file).
  2. XSLT looks complex – particularly if recursion does your head in. 🙂

Thought 2 I'll deal with in a different post. This post relates to thought 1.

 

In the example I've given there are three item elements, representing three transactions. I'm not sure that's entirely realistic.

Certainly there will be many times (for example configuration files) where everything is in the one file. But consider the following scenario:

XML documents arrive in a directory, each representing a single transaction, or maybe a batch of them. In the rest of this post it doesn't matter whether there's more than one transaction in a file, only that there will be multiple files overall. In the previous three posts I've talked about processing a single file with XSLT and passing the results to DFSORT (in a manner the latter can work with). The technique outlined won't (unaltered) process more than one file at a time.

I'd like one DFSORT processing run to handle multiple input XML files. Perhaps you run a batch job every hour, or perhaps daily, to process all the transactions that have arrived as XML files. The rest of this post shows you one way of doing this. It's a relatively small change to the XSLT stylesheet.

Here are the three transactions, as if they arrived in separate files:

Transaction File 1
<?xml version="1.0"?>
<mydoc>
  <greeting level="h1">
    Hello World!
  </greeting>
  <stuff>
    <item a="1">
      <row>One</row>
    </item>
  </stuff>
</mydoc>

Transaction File 2
<?xml version="1.0"?>
<mydoc>
  <greeting level="h1">
    Hello World!
  </greeting>
  <stuff>
    <item
      a="12">
      <row>Two</row>
    </item>
  </stuff>
</mydoc>


Transaction File 3
<?xml version="1.0"?>
<mydoc>
  <greeting level="h1">
    Hello World!
  </greeting>
  <stuff>
    <item a="903">
      <row>
      Three
      </row>
    </item>
  </stuff>
</mydoc>

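XSLT can't directly process a list of files concatenated together: the result would have several root elements (and several XML declarations), so it wouldn't be a well-formed document. But you can process them all in one run if you create another file that lists them. Here's an example: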

Transaction Reference File
<?xml version="1.0"?>
<transactions>
  <transaction filename="txn0001.xml"/>
  <transaction filename="txn0002.xml"/>
  <transaction filename="txn0003.xml"/>
</transactions>

If you can create such a file – perhaps by scanning an "incoming transaction file" directory – you can easily coax XSLT into processing the set of files. Here's a stylesheet that can do it:

XSLT Template Using The document() Function
<?xml version="1.0"?>
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" encoding="IBM-1047" indent="no"
    omit-xml-declaration="yes"/>

  <xsl:template match="/">
    <xsl:for-each select="transactions/transaction">                    <!-- 1 -->
      <xsl:apply-templates select="document(@filename)/mydoc/stuff"/>   <!-- 2 -->
    </xsl:for-each>
  </xsl:template>

  <xsl:template match="item">
    <xsl:value-of select="normalize-space(row)"/>
    <xsl:text>,</xsl:text>
    <xsl:value-of select="format-number(@a,'0000')"/>
  </xsl:template>
</xsl:stylesheet>

There are two things to note in this stylesheet:

  1. When you run the transaction reference file through XSLT with this stylesheet, the line marked 1 causes each transaction element to be visited.
  2. On the line marked 2, the filename attribute of each transaction element is used to pick up a transaction file (which might contain one or more item elements).

This "indirection through a transaction reference file" technique is very powerful: the stylesheet never changes from run to run, only the reference file does.
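To make the two notes concrete, here is a rough Python rendering of what the stylesheet computes. It is only an illustration, not a replacement for the XSLT: the function names extract_records and normalize_space are made up for this sketch, and it assumes the filenames are relative to the directory holding the transaction reference file (which is how document() resolves a relative name taken from a node in that file).

```python
import os
import xml.etree.ElementTree as ET


def normalize_space(text):
    """Mimic XPath normalize-space(): trim, and collapse runs of whitespace."""
    return " ".join((text or "").split())


def extract_records(reference_path):
    """Follow each filename in the reference file; return 'row,aaaa' records."""
    # Assumption: relative filenames are resolved against the reference file.
    base_dir = os.path.dirname(os.path.abspath(reference_path))
    records = []
    reference = ET.parse(reference_path).getroot()  # <transactions>
    # Note 1: visit every transaction element.
    for txn in reference.findall("transaction"):
        # Note 2: open the file named by the filename attribute.
        doc = ET.parse(os.path.join(base_dir, txn.get("filename"))).getroot()
        for item in doc.findall("stuff/item"):
            row = normalize_space(item.findtext("row"))
            # Same effect as format-number(@a,'0000') for these whole numbers.
            records.append(f"{row},{int(item.get('a')):04d}")
    return records
```

For the three transaction files above this returns One,0001 then Two,0012 then Three,0903, one record per item element.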

In practice you might have an "inbound transaction XML" directory that you scan with a program that creates the transaction reference file, invokes (eg) Saxon and then invokes DFSORT, finally deleting all the successfully processed transaction files. I say "(eg)" because nothing in this revised stylesheet requires XSLT 2.0 (document(), normalize-space() and format-number() are all XSLT 1.0 functions) and so Saxon isn't the only choice.

I think the challenge in this is knowing when the transactions have been successfully processed and so the inbound files can be deleted. You'd have the same problem if – instead of creating a transaction reference file – you created one large XML file from inbound files. (In fact this is easier.)
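One partial mitigation, sketched here in Python as an illustration rather than a recommendation: delete only the files named in the transaction reference file, and only once the XSLT and DFSORT steps have both ended successfully. That way a file that arrived while the job was running is left alone for the next run. The function name delete_processed is hypothetical, and the sketch assumes the reference file sits in the same directory as the transaction files it names.

```python
import os
import xml.etree.ElementTree as ET


def delete_processed(reference_path):
    """Delete only the files named in the reference file; return their names."""
    # Assumption: the reference file lives alongside the transaction files.
    base_dir = os.path.dirname(os.path.abspath(reference_path))
    reference = ET.parse(reference_path).getroot()
    names = [txn.get("filename") for txn in reference.findall("transaction")]
    for name in names:
        # Files not listed (for example, late arrivals) are never touched.
        os.remove(os.path.join(base_dir, name))
    return names
```

You would call this as the last step of the job, and skip it entirely if either earlier step failed, so that the same files are picked up again on the rerun.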

Anyone feel like – in any z/OS-supported language – writing something to scan a directory for XML files and create a transaction reference file like the one above from their names?
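As a starting point, here is a minimal sketch in Python (which is available on z/OS, though I haven't tested this there). The function name build_reference_file and the default reference file name transactions.xml are my own inventions. The sketch writes the reference file into the inbound directory itself so that the bare filenames resolve, and it writes UTF-8, which is an assumption you may need to revisit for your own code page.

```python
import os
import xml.etree.ElementTree as ET


def build_reference_file(inbound_dir, reference_name="transactions.xml"):
    """Write a <transactions> file listing the XML files in inbound_dir."""
    names = sorted(
        name for name in os.listdir(inbound_dir)
        if name.lower().endswith(".xml")
        and name != reference_name  # don't list the reference file itself
        and os.path.isfile(os.path.join(inbound_dir, name))
    )
    root = ET.Element("transactions")
    for name in names:
        ET.SubElement(root, "transaction", filename=name)
    reference_path = os.path.join(inbound_dir, reference_name)
    # Assumption: UTF-8 output with an XML declaration suits the XSLT processor.
    ET.ElementTree(root).write(
        reference_path, encoding="utf-8", xml_declaration=True
    )
    return reference_path
```

Sorting the names keeps the output order predictable from run to run. The file comes out on one line rather than indented like the example above, which makes no difference to the stylesheet.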

Published by Martin Packer

I'm a mainframe performance guy and have been for the past 35 years. But I play with lots of other technologies as well.
