(Originally posted 2012-12-04.)
I’ve talked about BPXWUNIX before but here’s a nice use case: Filtering REXX query results.
When I get your performance data I have code that stores it in a database which I query with (essentially) REXX. The predicate syntax is very simplistic so I’d like to do better. I can’t replace the syntax (not entirely true but close enough) but I can filter the results better.
Consider the following single step:
Reading it as a "U" shape, I pass the query results to a Unix Pipeline consisting of two stages:
- grep – which filters the query results, prepending the all-important line numbers
- cut – which removes the line contents, leaving just the line numbers
These line numbers (stdout) are passed back to the REXX driving code, along with any error information (stderr).
Any use case would be expected to check stderr before processing stdout.
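To see what each stage contributes, the pipeline can be tried on its own in a shell, outside REXX. This is only a sketch: the three sample rows, the pattern, and the function names `sample_rows`, `stage_grep` and `stage_grep_cut` are made up for illustration, and it assumes a terminal where the circumflex can be typed directly.

```shell
#!/bin/sh
# Three made-up sample rows standing in for the query results.
sample_rows() {
    printf '%s\n' 'XYZZY' 'Proxy' 'Xylophone'
}

# Stage 1 only: grep prefixes each matching row with "<line number>:".
stage_grep() {
    grep -i -n '^XY'
}

# Stages 1 and 2: cut keeps the text before the first colon,
# which is the line number grep added.
stage_grep_cut() {
    grep -i -n '^XY' | cut -f1 -d:
}

sample_rows | stage_grep
sample_rows | stage_grep_cut
```

The first call prints `1:XYZZY` and `3:Xylophone`; the second prints just `1` and `3`, which is the form the REXX driver wants back.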
But what is the point of jumping through these hoops?
As I mentioned most recently in Towards A Pattern Explorer – Jobname Analysis, regular expressions (regexps) are very flexible. So I can very easily code a filtering regexp that could be used to reduce the results of my original database query. The diagram above shows just such a workflow. But now for some actual REXX code…

    /* REXX */
    s.1.1=1234
    s.1.2=7543
    s.1.3=8911
    s.2.0=3
    s.2.1='XYZZY'
    s.2.2='Proxy'
    s.2.3='Xylophone'
    atstart='¬'
    grepstring=atstart'XY'
    cmd='grep -i -n "'grepstring'" | cut -f1 -d:'
    call bpxwunix cmd,s.2.,filter.,stderr.
    do f=1 to filter.0
      item=filter.f
      say item s.1.item
    end
    do e=1 to stderr.0
      say stderr.e
    end

Let’s examine the code:
- The first few lines emulate the query – filling a grid of stem variables with data. The code filters on the s.2. variables but eventually it’ll be the surviving s.1. variables that will be printed.
- The line where atstart is assigned a value is interesting: With my emulator (x3270) I can’t actually type a circumflex (^) but it turns out the not sign (¬) works fine for me instead. (In regexps "^" means "the match starts at the beginning of the line".) So I set this variable so I never have to worry about it again – using it as I construct the regexp in the next line.
- In this example the regular expression merely says "match anything starting with ‘XY’". Big deal, I could’ve done that easily in REXX. 🙂
- The "-i" switch on the grep command says "match without regard to case". Again easy to do in REXX. 🙂
- Specifying "-n" says "add line numbers on the front of the matching rows".
- Cut throws away the contents of the matching rows, just returning the line numbers for them. "-f1" says "return the first field" and "-d:" says "fields are separated by a colon". In fact grep puts a colon straight after the line number, so this is a good point to cut the record.
- Note that s.2.0 has to be set to the number of variables to be passed but s.1.0 doesn’t. I stress this as it may catch you out.
- Results are returned from BPXWUNIX in filter. variables (and filter.0 is the count of them) and stderr. contains any error messages.
- The first loop iterates over the returned results (two records in stdout, one with the number "1" and the other with the number "3"). These are used as indexes into the s.1. variables. So s.1.1 (1234) and s.1.3 (8911) are printed.
- Finally any error messages are printed. In Production you might actually test for the presence of nastygrams 🙂 before deciding to use the results of the grep / cut pipeline. In my testing I want both.
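The "test for nastygrams first" idea can be sketched at the shell level too. This is not the post's REXX code; it is a hypothetical helper, `filter_line_numbers`, that only hands back line numbers when the pipeline wrote nothing to stderr. It assumes `mktemp` is available, which is common but not guaranteed everywhere.

```shell
#!/bin/sh
# Hypothetical helper: print the line numbers of stdin rows matching the
# regexp in $1, but only if neither grep nor cut wrote anything to stderr.
filter_line_numbers() {
    pattern=$1
    errfile=$(mktemp) || return 1   # assumption: mktemp exists on this system

    # Collect stdout in a variable and stderr of both stages in a file.
    numbers=$( { grep -i -n -e "$pattern" | cut -f1 -d:; } 2>"$errfile" )

    if [ -s "$errfile" ]; then
        # Something complained: report it and withhold the results.
        cat "$errfile" >&2
        rm -f "$errfile"
        return 1
    fi
    rm -f "$errfile"

    # No errors: emit the line numbers, if there were any matches.
    [ -n "$numbers" ] && printf '%s\n' "$numbers"
    return 0
}

printf '%s\n' 'XYZZY' 'Proxy' 'Xylophone' | filter_line_numbers '^XY'
```

In the REXX driver the equivalent test would be on stderr.0 before the first loop runs. A malformed regexp (an unmatched "[", say) is the sort of thing that lands in stderr while stdout comes back empty.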
There are probably other ways of applying regexps to the results using Unix programs. If you prefer them use them. This one seems quite simple to me, Unixwise. And it certainly complies with my performance objective of only transiting once to Unix programs and back. Of course, if you have a lot of filtering instances within a program you’ll transit more often – but then the effect on performance is probably still unnoticeable.
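As one illustration of "other ways", here is a single-stage variant using awk in place of grep and cut. It is a sketch rather than a recommendation: the function name `awk_line_numbers` and the sample rows are invented, and whether it reads as simpler is a matter of taste.

```shell
#!/bin/sh
# Hypothetical alternative: awk prints the record number (NR) of every
# line that starts with "XY", ignoring case by upper-casing the line first.
awk_line_numbers() {
    awk 'toupper($0) ~ /^XY/ { print NR }'
}

printf '%s\n' 'XYZZY' 'Proxy' 'Xylophone' | awk_line_numbers
```

This prints `1` and `3` for the sample rows, and it is still one transit to Unix and back.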
I think the first place I might use this is to better refine what I mean by a “Batch Suite”. I’ve talked about that one before.