Using DFSORT to create CSV files

(Originally posted 2006-04-21.)

I plan on writing some entries on creating and parsing XML with DFSORT (using the UK90006 / UK90007 functional enhancements that DFSORT Development recently announced). But here’s a limbering-up example: creating a CSV file from regular sequential file input.

CSV (Comma-Separated Value, or Variable if you prefer) files are of the form

"JDLFJDJ DF",4146,"FKJFK"
"JDJDJ JKJJ",12352,"EE FF"
"AAFIELD3FI",4,"94949"
"ACFIELD",35,"34443"

where the commas separate fields, and where the quotes denote that their contents are character strings. Each line is a separate row of fields, so it’s really a grid. This is an early attempt at structuring data as text, and it’s used by many programs such as spreadsheets. It’s not a terribly robust format and probably isn’t a standard. Further, there is no real attempt to define the meaning of the fields.
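For comparison, here is a minimal Python sketch that writes the same shape of file with the standard csv module. It is only an illustration of the format, not part of the DFSORT job; the helper name to_csv is my own, and I'm assuming the "quote strings, leave numbers bare" convention shown above.

```python
import csv
import io

rows = [
    ("JDLFJDJ DF", 4146, "FKJFK"),
    ("JDJDJ JKJJ", 12352, "EE FF"),
]

def to_csv(rows):
    """Render rows as CSV text, quoting strings and leaving numbers bare."""
    buffer = io.StringIO()
    # QUOTE_NONNUMERIC wraps every non-numeric field in double quotes.
    writer = csv.writer(buffer, quoting=csv.QUOTE_NONNUMERIC, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()

print(to_csv(rows), end="")
```

Note that the blank inside "JDLFJDJ DF" survives because it sits inside the quotes, which is the same property the DFSORT example relies on later.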

But it does illustrate a use for the new JFY and SQZ capabilities of DFSORT…

The source data for this example is

JDLFJDJ DF    FKJFK
JDJDJ JKJJ    EE FF
AAFIELD3FI    94949
ACFIELD       34443

where the blanks in the middle are actually a 4-byte binary number.
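To make the record layout concrete, here is a rough Python sketch that builds one such record. The field widths (10, 4 and 8 bytes) come from the Symbols deck in this post. The function name make_record is hypothetical, and I'm using ASCII purely for illustration; on z/OS the character data would normally be EBCDIC.

```python
def make_record(str1, num1, str2):
    """Build one 22-byte record: 10-byte text, 4-byte binary, 8-byte text."""
    return (
        str1.ljust(10).encode("ascii")    # blank-padded character field
        + num1.to_bytes(4, "big")         # unsigned big-endian, like DFSORT's BI format
        + str2.ljust(8).encode("ascii")   # blank-padded character field
    )

record = make_record("JDLFJDJ DF", 4146, "FKJFK")
print(len(record))           # total record length in bytes
print(record[10:14].hex())   # the binary number as hex digits
```

The four bytes holding 4146 are X'00001032', none of which is a printable character, which is why the middle of each record doesn't show up as readable text in a listing.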

The DFSORT control statements are…

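* Copy the records rather than sorting them
 OPTION COPY
* Quote the strings, make the binary number printable
 INREC BUILD=(STR1,JFY=(SHIFT=LEFT,LEAD=C'"',TRAIL=C'"',LENGTH=12),X,
              NUM1,EDIT=(IIIIIIIT),X,
              STR2,JFY=(SHIFT=LEFT,LEAD=C'"',TRAIL=C'"',LENGTH=7))
* Squeeze out the blanks, leaving commas between the fields
 OUTREC BUILD=(PRINTED,SQZ=(SHIFT=LEFT,PAIR=QUOTE,MID=C','))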

You’ll notice the widespread use of Symbols, which isn’t a new thing. So here’s the Symbols deck:

//SYMNAMES DD *
POSITION,1           
STR1,*,10,CH         
NUM1,*,4,BI          
STR2,*,8,CH          
*
PRINTED,1,29,CH      

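The first four statements map the input record: POSITION,1 sets the starting position, and the * in each of the next three means “start where the previous field ended”. The fifth symbol (PRINTED) maps the intermediate record that results from the INREC.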
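If the relative positions are hard to picture, this small Python sketch resolves them the way I understand DFSORT does. It handles only the handful of statement forms used in this deck, and the function name resolve_symbols is mine, not anything DFSORT provides.

```python
def resolve_symbols(statements):
    """Return {name: (position, length, fmt)} for a tiny subset of SYMNAMES syntax."""
    next_position = 1
    resolved = {}
    for line in statements:
        fields = [part.strip() for part in line.split(",")]
        if not fields[0] or fields[0] == "*":   # blank line or comment line
            continue
        if fields[0] == "POSITION":             # reset the running position
            next_position = int(fields[1])
            continue
        name, position, length, fmt = fields
        # "*" means: start immediately after the previous field
        start = next_position if position == "*" else int(position)
        resolved[name] = (start, int(length), fmt)
        next_position = start + int(length)
    return resolved

deck = ["POSITION,1", "STR1,*,10,CH", "NUM1,*,4,BI", "STR2,*,8,CH", "*", "PRINTED,1,29,CH"]
for name, mapping in resolve_symbols(deck).items():
    print(name, mapping)
```

So STR1 lands at position 1, NUM1 at 11 and STR2 at 15, while PRINTED is given an explicit position and length covering the whole 29-byte intermediate record.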

The INREC statement produces (with the sample data) the following intermediate records:

"JDLFJDJ DF"     4146 "FKJFK"
"JDJDJ JKJJ"    12352 "EE FF"
"AAFIELD3FI"        4 "94949"
"ACFIELD"          35 "34443"

So the strings are wrapped in quotes but there are no commas and there has been no squeezing together.

Take the first field:

STR1,JFY=(SHIFT=LEFT,LEAD=C'"',TRAIL=C'"',LENGTH=12)

This shifts the data to the left, puts quotes around the string (leaving the trailing blanks outside the closing quote) and makes the resulting field 12 bytes wide: 10 bytes of data plus the two quotes. (The second field involves number formatting, and the third is similar to the first but with a length of 7 bytes, including the quotes.)
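As a rough model of what that JFY specification does, here is a Python sketch. It is an approximation under my own assumptions (in particular that an over-long result is simply truncated on the right), and jfy_left is a hypothetical name, not a DFSORT interface.

```python
def jfy_left(field, lead, trail, length):
    """Left-justify `field`, wrap it in lead/trail strings, pad or truncate to `length`."""
    # Drop leading and trailing blanks; blanks inside the data are kept.
    text = lead + field.strip(" ") + trail
    # Pad with blanks on the right, or cut off anything beyond `length`.
    return text.ljust(length)[:length]

print(repr(jfy_left("ACFIELD   ", '"', '"', 12)))
print(repr(jfy_left("JDLFJDJ DF", '"', '"', 12)))
```

The closing quote goes straight after the last nonblank character, so the padding ends up outside the quotes, where the later squeeze step can remove it.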

The OUTREC statement squeezes out all the spaces outside of the quotes (PAIR=QUOTE tells DFSORT to preserve what’s inside a pair of quotes). MID=C',' specifies that any run of spaces between fields (outside of pairs of quotes) is to be replaced by a single comma.
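Here is a Python sketch of the squeeze as I understand it, for anyone who wants to experiment off the mainframe. It is a simplified model: sqz_left is a name I've made up, it only understands double quotes, and I'm assuming the result is padded with blanks back to the original field length.

```python
def sqz_left(record, mid=","):
    """Drop blanks outside double quotes; replace each interior run with `mid`."""
    pieces = []           # characters kept so far
    in_quotes = False
    pending_gap = False   # blanks seen since the last kept character
    for ch in record:
        if ch == '"':
            in_quotes = not in_quotes
        if ch == " " and not in_quotes:
            pending_gap = True
            continue
        if pending_gap and pieces:
            pieces.append(mid)   # one separator per run, never at the very start
        pending_gap = False
        pieces.append(ch)
    # Trailing blanks never trigger a separator; pad back to the input length.
    return "".join(pieces).ljust(len(record))

intermediate = '"JDLFJDJ DF"' + " " * 5 + '4146 "FKJFK"'
print(repr(sqz_left(intermediate)))
```

The blank in "JDLFJDJ DF" is untouched because it is seen while in_quotes is set, which is the job PAIR=QUOTE does in the real statement.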

This is, admittedly, a fairly complex example. But I hope it shows some of the capabilities of SQZ and JFY. And maybe this is a sample you can swipe and modify for your applications.

One thing that isn’t clear to me is whether trailing blanks are in fact significant in the CSV file format. Because it’s scarcely a standard, it’s probably implementation-dependent. But, personally, I’d assume that blanks are significant.

Removing variable numbers of blanks could be done prior to these new functions being available, but it was much more fiddly. I wouldn’t want to even attempt explaining that one. 🙂

And shortly I’ll write some tips about XML and DFSORT.

Published by Martin Packer
