(Originally posted 2006-04-24.)
Following on from this entry on creating CSV files, here’s the first of several entries on manipulating XML with DFSORT…
This is not intended to be a tutorial on XML but rather an exploration of how DFSORT can read (shred) and write (compose) XML. (I also use the terms “ingest” and “emit” for reading and writing, respectively.) Hopefully some of the basic concepts of XML will come across during the course of these blog entries.
XML is – to my mind – much better than CSV…
- It IS a standard – and lots of things are built on this standard.
- There is semantic and structuring information built into XML.
- Support for reading and writing XML is built into many programming languages and other programs, such as DB2.
So XML is a great way of structuring information for exchange between programs and systems.
So we’ll start with a simple set of data we’d like to convert to XML…
Consider the small data set:
Mercury         Freddie         Singer
May             Brian           Guitarist
Taylor          Roger           Drummer
Deacon          John            Bassist
This data is mapped with the following DFSORT Symbols deck:
Surname,*,16,CH
Firstname,*,16,CH
Job,*,10,CH
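To make the mapping concrete, here is a minimal Python sketch of how those three symbols carve up a 42-byte record. It is only an off-host illustration, not part of the DFSORT job; the names `FIELDS` and `split_record` are hypothetical, and it assumes the first `*` position resolves to the start of the record.

```python
# Hypothetical field map mirroring the Symbols deck: (name, length) in order.
# A position of * means "start where the previous field ended"; the first
# field is assumed to start at the beginning of the record.
FIELDS = [("Surname", 16), ("Firstname", 16), ("Job", 10)]


def split_record(record: str) -> dict:
    """Slice one fixed-width record into named fields, keeping the padding."""
    values = {}
    offset = 0
    for name, length in FIELDS:
        values[name] = record[offset:offset + length].ljust(length)
        offset += length
    return values


# First record of the sample data, built with explicit column widths.
record = "Mercury".ljust(16) + "Freddie".ljust(16) + "Singer".ljust(10)
fields = split_record(record)

# Show the values with the trailing blanks trimmed for readability.
print({name: value.rstrip() for name, value in fields.items()})
```

The trailing blanks kept in each field are exactly what the later steps have to deal with.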
We’d like to create an XML file:
<?xml version="1.0" encoding="UTF-8" ?>
<band>
<member surname="Mercury" firstname="Freddie" job="Singer" />
<member surname="May" firstname="Brian" job="Guitarist" />
<member surname="Taylor" firstname="Roger" job="Drummer" />
<member surname="Deacon" firstname="John" job="Bassist" />
</band>
The first thing we do is turn each input record into a member element, wrapping each field value in quotes and squeezing out the trailing spaces:
<member surname="Mercury" firstname="Freddie" job="Singer" />
<member surname="May" firstname="Brian" job="Guitarist" />
<member surname="Taylor" firstname="Roger" job="Drummer" />
<member surname="Deacon" firstname="John" job="Bassist" />
To do this we can use the following INREC statement:
INREC BUILD=(C'<member',X,
  C'surname=',Surname,
    JFY=(SHIFT=LEFT,LEAD=C'"',TRAIL=C'"',LENGTH=18),X,
  C'firstname=',Firstname,
    JFY=(SHIFT=LEFT,LEAD=C'"',TRAIL=C'"',LENGTH=18),X,
  C'job=',Job,
    JFY=(SHIFT=LEFT,LEAD=C'"',TRAIL=C'"',LENGTH=12),
  C'/>')
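The above statement reformats the three fields, removing the trailing spaces inside the quotes, in the manner described in my CSV creation blog entry. Each JFY result is still padded with blanks to its LENGTH, so runs of spaces remain between the attributes.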
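If you want to see what the INREC step leaves behind without running a job, the following Python sketch approximates it. It is a rough stand-in under stated assumptions, not DFSORT itself: `jfy_left` and `build_member` are hypothetical names, and the sketch assumes one blank between the tag name and the first attribute.

```python
def jfy_left(value: str, lead: str, trail: str, length: int) -> str:
    """Approximate JFY=(SHIFT=LEFT,LEAD=...,TRAIL=...,LENGTH=...)."""
    core = value.strip(" ")           # drop leading and trailing blanks
    framed = lead + core + trail      # closing quote sits right after the data
    return framed.ljust(length)[:length]  # pad with blanks (or cut) to LENGTH


def build_member(surname: str, firstname: str, job: str) -> str:
    """Assemble one record the way the INREC BUILD list does."""
    return (
        "<member" + " "               # assumed blank after the tag name
        + "surname=" + jfy_left(surname, '"', '"', 18) + " "
        + "firstname=" + jfy_left(firstname, '"', '"', 18) + " "
        + "job=" + jfy_left(job, '"', '"', 12)
        + "/>"
    )


# Second record of the sample data, with its fixed-width padding.
row = build_member("May".ljust(16), "Brian".ljust(16), "Guitarist".ljust(10))
print(repr(row))
print(len(row))
```

The quotes now hug the data, but the blanks used to pad each attribute to its fixed length are still there.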
The next (OUTREC) statement squeezes out the spaces, preserving those within the quotes:
OUTREC BUILD=(_Unsqueezed,SQZ=(SHIFT=LEFT,PAIR=QUOTE,MID=C' '))
To make the above code work I defined an additional symbol:
_Unsqueezed,1,82,CH
This symbol maps the whole of the output of the INREC statement – that is the records wrapped in quotes.
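The squeeze itself can be approximated in a few lines of Python. This is a sketch of my understanding of SQZ with SHIFT=LEFT, PAIR=QUOTE and MID=C' ', not a definitive model of it; `squeeze_left` is a hypothetical name, and the second input string is made up purely to show blanks surviving inside quotes.

```python
def squeeze_left(text: str, mid: str = " ") -> str:
    """Remove blanks outside double quotes, putting `mid` where a run was."""
    out = []
    in_quotes = False
    pending_gap = False
    for ch in text:
        if ch == '"':
            in_quotes = not in_quotes
        if ch == " " and not in_quotes:
            pending_gap = True        # remember the run, emit nothing yet
            continue
        if pending_gap and out:
            out.append(mid)           # one separator per run of blanks
        pending_gap = False
        out.append(ch)
    # The result is left-justified and padded back to the input length.
    return "".join(out).ljust(len(text))


# An 82-byte record as it might look after the quote-wrapping step.
unsqueezed = (
    "<member surname=" + '"May"'.ljust(18)
    + " firstname=" + '"Brian"'.ljust(18)
    + " job=" + '"Guitarist"'.ljust(12) + "/>"
)
squeezed = squeeze_left(unsqueezed)
print(squeezed.rstrip())

# Hypothetical input: the two blanks inside the quotes are left alone.
print(squeeze_left('a="x  y"   b="z"').rstrip())
```

Note that the record keeps its original length; the blanks just move to the end.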
The final step is an OUTFIL statement that tops and tails the output data with some more tags:
OUTFIL FNAMES=OUT1,REMOVECC,
  HEADER1=('<?xml version="1.0" encoding="UTF-8" ?>',/,'<band>'),
  TRAILER1=('</band>')
NOTE: REMOVECC removes the ANSI carriage control bytes.
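One way to sanity-check the finished file once it is off the mainframe is to feed it to an XML parser. The Python sketch below does that with the standard library, using the target document from earlier in this entry. It assumes the records are fixed-length and blank-padded to 82 bytes (`RECORD_LENGTH` is my assumption, not something the job states) and that the data has already been converted to UTF-8.

```python
import xml.etree.ElementTree as ET

RECORD_LENGTH = 82  # assumption: OUT1 records are fixed-length, blank-padded

records = [
    '<?xml version="1.0" encoding="UTF-8" ?>',
    "<band>",
    '<member surname="Mercury" firstname="Freddie" job="Singer" />',
    '<member surname="May" firstname="Brian" job="Guitarist" />',
    '<member surname="Taylor" firstname="Roger" job="Drummer" />',
    '<member surname="Deacon" firstname="John" job="Bassist" />',
    "</band>",
]

# Pad every record with trailing blanks, as a fixed-length data set would.
document = "\n".join(record.ljust(RECORD_LENGTH) for record in records)

# Parsing raises xml.etree.ElementTree.ParseError if the text is not well-formed.
root = ET.fromstring(document.encode("utf-8"))

members = [
    (m.get("surname"), m.get("firstname"), m.get("job"))
    for m in root.findall("member")
]
print(root.tag, len(members))
for member in members:
    print(member)
```

The trailing blanks on each record do no harm: whitespace between elements is allowed.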
In this example I’ve spread the transformations across INREC, OUTREC and OUTFIL statements. You might want to place the function differently: you might, for instance, want to do a SORT, or include only certain records, upstream of the XML creation. Perhaps in a later blog entry I’ll talk about that some more. But there are other things I want to talk about regarding DFSORT and XML. Stay tuned for a different style of XML.