(Originally posted 2006-04-21.)
I plan on writing some entries on creating and parsing XML with DFSORT (using the UK90006 / UK90007 functional enhancements that DFSORT Development recently announced). But here’s a limbering-up example: creating a CSV file from regular sequential file input.
CSV files (Comma-Separated Value (or Variable if you prefer)) are of the form
"JDLFJDJ DF",4146,"FKJFK"
"JDJDJ JKJJ",12352,"EE FF"
"AAFIELD3FI",4,"94949"
"ACFIELD",35,"34443"
where the commas separate fields, and where the quotes denote that their contents are character strings. Each line is a separate row of fields, so it’s really a grid. This is an early attempt at structuring data as text, and it’s used by many programs such as spreadsheets. It’s not a terribly robust format and probably isn’t a standard. Further, there is no real attempt to define the meaning of the fields.
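To make the "quotes mean character strings" convention concrete, here is a minimal Python sketch that reads the sample rows. It is only an illustration of how one consumer might interpret the file: the choice of `csv.QUOTE_NONNUMERIC` (which treats every unquoted field as a number) is my assumption about the intended meaning, not something the format itself guarantees.

```python
import csv
import io

# The four sample rows from the post.
SAMPLE = '''"JDLFJDJ DF",4146,"FKJFK"
"JDJDJ JKJJ",12352,"EE FF"
"AAFIELD3FI",4,"94949"
"ACFIELD",35,"34443"
'''

def read_rows(text: str):
    # QUOTE_NONNUMERIC: quoted fields stay strings, unquoted fields become floats.
    reader = csv.reader(io.StringIO(text), quoting=csv.QUOTE_NONNUMERIC)
    return [row for row in reader]

rows = read_rows(SAMPLE)
for row in rows:
    print(row)
```

Note that "94949" stays a string because it is quoted, even though it looks numeric.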
But it does illustrate a use for the new JFY and SQZ capabilities of DFSORT…
The source data for this example is
JDLFJDJ DF    FKJFK
JDJDJ JKJJ    EE FF
AAFIELD3FI    94949
ACFIELD       34443
where the blanks in the middle are actually a 4-byte binary number.
The DFSORT control statements are…
  OPTION COPY
  INREC BUILD=(STR1,JFY=(SHIFT=LEFT,LEAD=C'"',TRAIL=C'"',LENGTH=12),X,
    NUM1,EDIT=(IIIIIIIT),X,
    STR2,JFY=(SHIFT=LEFT,LEAD=C'"',TRAIL=C'"',LENGTH=7))
  OUTREC BUILD=(PRINTED,SQZ=(SHIFT=LEFT,PAIR=QUOTE,MID=C','))
You’ll notice the widespread use of Symbols, which isn’t a new thing. So here’s the Symbols deck:
//SYMNAMES DD *
POSITION,1
STR1,*,10,CH
NUM1,*,4,BI
STR2,*,8,CH
*
PRINTED,1,29,CH
The first four symbols map the input record. The fifth one (PRINTED) maps the intermediate record that results from the INREC.
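For readers more at home off the mainframe, the input mapping can be sketched in Python. This is an illustration only: the helper name `unpack_record` is made up, and I am assuming the character fields are EBCDIC (code page 037) and that the BI field is an unsigned big-endian integer.

```python
import struct

# Layout from the Symbols deck: STR1 = 10 bytes CH, NUM1 = 4 bytes BI, STR2 = 8 bytes CH.
# ">" means big-endian with no padding; "I" is an unsigned 4-byte integer.
RECORD = struct.Struct(">10sI8s")

def unpack_record(raw: bytes, encoding: str = "cp037"):
    """Split one 22-byte input record into (STR1, NUM1, STR2).

    The cp037 (EBCDIC) default is an assumption about the input data set.
    """
    str1, num1, str2 = RECORD.unpack(raw)
    return str1.decode(encoding), num1, str2.decode(encoding)

# Build the first sample record by hand and unpack it again.
sample = (
    "JDLFJDJ DF".encode("cp037")
    + (4146).to_bytes(4, "big")
    + "FKJFK   ".encode("cp037")
)
fields = unpack_record(sample)
print(fields)
```

The 4-byte binary number is why the middle of each input record looks like blanks (or garbage) when browsed as text.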
The INREC statement produces (with the sample data) the following intermediate records:
"JDLFJDJ DF"     4146 "FKJFK"
"JDJDJ JKJJ"    12352 "EE FF"
"AAFIELD3FI"        4 "94949"
"ACFIELD"          35 "34443"
So the strings are wrapped in quotes but there are no commas and there has been no squeezing together.
Taking the first field as an example, the specification
STR1,JFY=(SHIFT=LEFT,LEAD=C'"',TRAIL=C'"',LENGTH=12)
shifts the data to the left, puts quotes around the string (removing trailing blanks) and makes the resulting field 12 bytes wide. (The second field involves number formatting and the third is similar to the first but with a length of 7 bytes, including the quotes.)
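If it helps to see the INREC step in another notation, here is a rough Python emulation of what the post describes. The function names `jfy_left` and `inrec_build` are invented for this sketch, and the handling of over-long data (simple truncation) is my assumption rather than a statement of DFSORT's exact behaviour.

```python
def jfy_left(field: str, lead: str = '"', trail: str = '"', length: int = 12) -> str:
    """Emulate JFY=(SHIFT=LEFT,LEAD=...,TRAIL=...,LENGTH=...) as described in the post."""
    core = field.strip(" ")            # drop the blanks around the data
    wrapped = lead + core + trail      # wrap what is left in the quote strings
    # Pad with blanks on the right to the fixed output length.
    # Truncating over-long data is an assumption made for this sketch.
    return wrapped.ljust(length)[:length]

def inrec_build(str1: str, num1: int, str2: str) -> str:
    """Emulate the INREC BUILD: quoted STR1, edited NUM1, quoted STR2, one blank (X) between."""
    edited = f"{num1:8d}"              # like EDIT=(IIIIIIIT): 8 columns, leading zeros blanked
    return " ".join([jfy_left(str1, length=12), edited, jfy_left(str2, length=7)])

record = inrec_build("ACFIELD   ", 35, "34443   ")
print(repr(record))
```

The result is always 29 bytes wide (12 + 1 + 8 + 1 + 7), which is why the PRINTED symbol is defined as 1,29,CH.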
The OUTREC statement squeezes out all the spaces outside of the quotes (PAIR=QUOTE tells DFSORT to preserve what’s inside each pair of quotes). MID=C',' specifies that any run of spaces outside of a pair of quotes is to be replaced by a single comma.
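The squeeze step can be sketched the same way. Again this is an emulation of the behaviour described here, not DFSORT itself: `sqz_left` is a hypothetical name, and unlike DFSORT the sketch does not pad the result back out to the field length with trailing blanks.

```python
def sqz_left(record: str, mid: str = ",", quote: str = '"') -> str:
    """Emulate SQZ=(SHIFT=LEFT,PAIR=QUOTE,MID=C',') on one record."""
    out = []
    in_quotes = False     # True while we are between a pair of quotes
    pending_gap = False   # True once we have skipped blanks outside quotes
    for ch in record:
        if ch == " " and not in_quotes:
            pending_gap = True        # squeeze out blanks outside quotes
            continue
        if pending_gap and out:
            out.append(mid)           # one MID string per run of blanks
        pending_gap = False
        if ch == quote:
            in_quotes = not in_quotes
        out.append(ch)                # blanks inside quotes are kept as-is
    # Leading and trailing blanks are dropped without emitting MID.
    return "".join(out)

line = sqz_left('"JDJDJ JKJJ"    12352 "EE FF"')
print(line)
```

The blank inside "EE FF" survives because it sits between a pair of quotes, while the runs of blanks between fields each collapse to one comma.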
This is, admittedly, a fairly complex example. But I hope it shows some of the capabilities of SQZ and JFY. And maybe this is a sample you can swipe and modify for your applications.
One thing that isn’t clear to me is whether trailing blanks are in fact significant in the CSV file format. Because it’s scarcely a standard it’s probably implementation-dependent. But, personally, I’d assume that blanks were significant.
Removing variable numbers of blanks could be done prior to these new functions being available but it was much more fiddly. I wouldn’t want to even attempt explaining that one. 🙂
And shortly I’ll write some tips about XML and DFSORT.