A Comment on “Now Your Embarrassing/Job-Threatening Facebook Photos Could Haunt You For Seven Years”

(Originally posted 2011-06-21.)

Reading Now Your Embarrassing/Job-Threatening Facebook Photos Could Haunt You For Seven Years, I think they’ve missed a point or two – points which may be incidental to the thrust of the article but are nonetheless important:

Who’s to say what’s embarrassing or job-threatening?

Some things are obvious: In the post they talk about weapons ownership and racism as two things you wouldn’t want on your "record". Actually, the former example illustrates my point:

While racism is an obvious "no no", weapons ownership is in a grey area. Though personally I abjure weapons ownership there are many among you who feel differently. (And by espousing my position I’m exposing it – in the very previous sentence.) 🙂 But it shouldn’t be job threatening.

To repeat the first point: "Who’s to say what’s acceptable or not, embarrassing or not, job threatening or not?"

The second point is: Suppose something is job threatening. Is that all that matters? How about our ability to live whatever’s left of our lives?

The third point is: Don’t we, without even getting close to threatening our jobs and careers, leak position and emotion?

Perhaps as an aside, I’ve just spent three days on a course with friends (both old and new) on "Personal Eminence and Gravitas". (Technically it’s called "Technical Leadership Masterclass 4", for the IBMers amongst you.) One of the "take homes" from those intensive three days is how others perceive you. I won’t discuss what my "take home" was but going through the process has caused me to think on reputation a little more. I was pleased the word "leak" was uttered by one of the course facilitators: I’ve used it before and picked up on it in class. It encapsulates the idea that you can’t really know what you’re giving away – so you can’t totally control it. Thank goodness!

Another data point is watching youngsters squabbling on Facebook. You’re supposed to think that’ll come back and haunt you – which would be grossly unfair. It’s actually rather funny. 🙂

I like to joke that we’re all unemployable – based on our Social Networking footprint. The serious point is that there is a definite need for forbearance on Social Networking. Otherwise the genie has to be put back in the bottle and the world has to return to being a really dull place. I for one don’t want to return to that. So much so that I’m prepared to take reputational risks: And we’re back to "authentic voice" and "authentic living".

So, in short, I hope we can all be authentic in what we say and do. That’s more a hope than an expectation – as I know there are parts of the world where it’s impossible to be authentic. 🙂

And, to recap on something I alluded to above: Who’s to say what emotion and position I’m leaking to you by writing this post?

I Know What You Did Last Summer – Abstract

(Originally posted 2011-06-12.)

Here’s the first-pass abstract slide for "I Know What You Did Last Summer". I’d be interested in your thoughts on it…


What’s The Point Of This Presentation?

So, it’s got a "tongue-in-cheek" title but what’s it all about?

I think one of the least appreciated aspects of z/OS and its middleware is the richness of instrumentation it gives you. Here I describe it and just some of the ways you can get value from SMF.


Capacity, Performance and Systems Investigation

(Originally posted 2011-05-29.)

At one level Performance and Capacity Management and Systems Investigation are clearly linked: They share the same data. Or much of it at least.

But I think they’re linked in another way, too.

Over the past few years I’ve gradually shifted emphasis towards Systems Investigation. But this has only been a slight shift, a “non modo sed etiam”, and still really only mainframe. So I’m still me but I’ve slowly realised I look at things a little differently – in addition to the old foci. And this thinking will feed into the “I Know What You Did Last Summer” presentation I’ve started on.

But this post isn’t about that. It’s making a different point. And maybe a fairly obvious one:

When looking at Systems (and Application) performance it really pays to know what the landscape looks like. The same is true of Capacity, of course.

The received wisdom (and I certainly buy into it) is that Performance is about top-down decomposition. For example, find the big CPU use categories and break them down: LPAR -> WLM Workload -> WLM Service Class -> Address Space -> Transaction -> Plan -> Package -> Statement.
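To make the idea concrete, here’s a toy sketch of that top-down drill-down in Python. The records, category names and numbers are entirely made up for illustration – real figures would come from SMF / RMF, not a hard-coded list – but the shape of the logic is the point: sum at one level, take the biggest bucket, and break only that down at the next level.

A Toy Top-Down Decomposition Sketch (Python)
from collections import defaultdict

# Entirely made-up records, standing in for CPU figures derived from SMF / RMF
records = [
    {"workload": "ONLINE", "service_class": "CICSHIGH", "address_space": "CICSA01", "cpu": 420.0},
    {"workload": "ONLINE", "service_class": "CICSHIGH", "address_space": "CICSA02", "cpu": 310.0},
    {"workload": "BATCH", "service_class": "BATCHMED", "address_space": "PAYROLL1", "cpu": 150.0},
]

def biggest(recs, level):
    """Sum CPU by the given level and return the heaviest bucket."""
    totals = defaultdict(float)
    for r in recs:
        totals[r[level]] += r["cpu"]
    name = max(totals, key=totals.get)
    return name, totals[name]

# Drill down: at each level keep only the biggest consumer and decompose that
subset = records
for level in ("workload", "service_class", "address_space"):
    name, cpu = biggest(subset, level)
    print("%15s: %s (%.0f CPU seconds)" % (level, name, cpu))
    subset = [r for r in subset if r[level] == name]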

We’re doing fine on this – particularly if we’re desirous of a technologically-neutral technique – until we hit the address space level. A CICS workload looks the same as a SAP one and as a Batch one: They’re just gobs of CPU. Even at this point I’m a little nervous as I like to be able to pronounce the names in the frames. (That’s why as a customer you might hear me ask “how do you pronounce this?”)

But when we get to the address space level that approach begins to unravel: For a start there might not be an address space. For instance, a WLM Service Class that manages DDF work won’t have address spaces in it: It’ll have enclaves. So how on earth can we decompose those two engines of CPU into actors – below the Service Class level? We certainly can’t do it with SMF Type 30 records.

Similarly, when I sketched out “Transaction -> Plan -> Package -> Statement” that only really works for CICS or IMS transactions accessing DB2.

To be fair you can do decomposition of things like DDF and Batch jobs with the right instrumentation and techniques. With that comment the point is heaving into view:

Sooner or later you have to know what this stuff is. The technologically neutral approach gives out after a certain point – and it’s different for different environments.

Another example is memory: With caveats on data clarity you can decompose memory usage in a similar way. But it behaves differently to CPU, and differently from one user to another. While you might expect CPU to grow linearly – more or less – with workload, you wouldn’t really expect that of memory. Or at least you shouldn’t:

  • Some workloads do use memory in a linear way: Double the workload (by whatever metric you choose) and the memory usage doubles. The classic example is TSO users: Go from 200 of them to 400 and the memory usage goes from 1GB to 2GB (at least in 1990 terms).
  • Many workloads are sub-linear: Double the number of CICS users and the memory usage may go up by only 50%.

Indeed the latter case is an example of where it’s not clear at all: When you say “double the number of CICS users” are you expecting to double the number of regions? Or do you mean add the users into existing regions?

So the conclusion is you need to know about the applications to get very far. And you probably need to know a lot about things like LPAR setup. Indeed, as I’ve often said, just keeping track of all those LPARs is a major headache for many customers these days.

So, I’d encourage you to get curious about your systems. Take a Systems Investigative perspective when you can. It’s also a great way to build common understanding with those that actually run the systems.

But this is not the same as the school of tuning which says “find sins of omission or commission and comment on them”. These kinds of sins are important – but only in the context of a top-down approach. Who cares if a parameter is not set correctly – in a classical sense – if it affects nothing you care about?

So, as I say the linkage between Capacity / Performance and Understanding Systems is twofold: The raw data and the need to know what’s really happening on systems and with applications.

Finding The DB2 Accounting Trace Records For an IMS Batch Job Step

(Originally posted 2011-05-24.)

When tuning DB2 batch it’s important to know which SMF 101 Accounting Trace record corresponds to which job step.

A few years ago I wrote code to do this. It works fine for all z/OS DB2 Batch except that originated by IMS. Here’s how it works:

  1. Find all the Type 30 Step-End records for a given job name.
  2. Find the Type 101 Accounting Trace records for which the Correlation ID in the 101 record matches the job name. (First 8 bytes of Correlation ID is the job name for most DB2 Batch.)
  3. You match up the records from steps 1 and 2: The SMF ID must match and the start timestamp (QWACBSC) and stop timestamp (QWACESC) in the 101 slot into the time range the Type 30 is for. This gives you the running of the job and the step. The latter is nice because SMF 101 doesn’t know anything about job steps.

But there’s a small problem with this: QWACBSC and QWACESC use GMT and the Type 30 timings use local time. (In the experiment below the local time is 2 hours ahead of GMT: Mainland Europe on Summer Time.) So you have to adjust QWACBSC using the timezone offset. But SMF 101 doesn’t have the timezone offset in it. Except it does. :-) If you compare the SMF record timestamp to QWACESC you get the timezone offset (as QWACESC is GMT).
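If it helps to see the matching logic written down, here’s a rough sketch in Python. The dictionaries are stand-ins for records you’ve already decoded (I’m not showing real SMF record mapping), the Correlation ID filtering of step 2 isn’t shown, and rounding the offset to the nearest quarter hour – to absorb the small gap between the two timestamps – is my assumption.

Sketch Of The Matching Logic (Python)
from datetime import timedelta

def tz_offset(rec101):
    """The SMF header timestamp is local time and QWACESC is GMT, so their
    difference - rounded to the nearest quarter hour to absorb the record-cut
    delay - gives the timezone offset."""
    raw = rec101["smf_timestamp"] - rec101["qwacesc"]
    quarter = timedelta(minutes=15)
    return round(raw / quarter) * quarter

def matches(rec30, rec101):
    """True if the 101's start (QWACBSC) and stop (QWACESC) times, adjusted
    to local time, slot into the Type 30 step's time range on the same system."""
    if rec30["smfid"] != rec101["smfid"]:
        return False
    offset = tz_offset(rec101)
    return (rec30["step_start"] <= rec101["qwacbsc"] + offset and
            rec101["qwacesc"] + offset <= rec30["step_end"])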

All the above has worked fine. But now I have a customer with IMS Batch. And for IMS Batch the Correlation ID isn’t the job name. So what can you do?

It turns out that for long-running IMS/DB2 Batch you can still do the timestamp comparison trick – so long as you don’t attempt to match on Correlation ID. For short-running jobs it’s not likely to work as well – because there will be too many potential matches. Fortunately I care rather less about these cases.

Here’s an example match:

  • SMF 30: The job name is AFRF032B and it runs on SYSA for 4904 seconds, consuming 289 seconds of CPU.
  • SMF 101: There was an IMS batch job (PSB name "AFRFD2D") which ran on SYSA for 4899 seconds, consuming 288 seconds of CPU.

So far so good but they might be different jobs. Let’s examine their timestamps (adjusted for timezones). First start times:

  • SMF 30: 11 APR 06 00:31:12.39
  • SMF 101: 11 APR 06 00:31:15.93

So we took 3 seconds to get started. Sounds reasonable.

Now end times:

  • SMF 30: 11 APR 06 01:52:56.63
  • SMF 101: 11 APR 06 01:52:55.40

So we cut the 101 record about a second before we ended the step. Again reasonable.

The bracketing is nice here and I think we have a match.

So now we can proceed to figure out why the step took as long as it did by using the other numbers in SMF 101 to explain the time:

  • The Class 1 time not in Class 2 is 4899 – 4608 = 291 seconds, or about 5 minutes. So not much leverage in working on non-DB2 stuff. (For CPU it’s 288 – 273 = 15 seconds.)
  • Of the 4608 seconds 2699 are accounted for – 59%.
  • The majority of that is in Write To Log time (1450 seconds), but Lock/Latch Wait time (651 seconds) is also significant. So clearly some issues here to sort out.
  • There’s very little database I/O time, surprisingly (about 250 seconds).

The unaccounted for time (the other 41%) is actually not surprising to me: It turns out this job runs in a low-velocity WLM service class on a busy system so the usual theory (that unaccounted for time is CPU Queuing) seems reasonable. It’s not definitive as there are lots of other components of unaccounted for time, but it’s a reasonable theory.

The last piece – where the time is going – is meant to give you some idea of why you’d want to find the 101 Accounting Trace records for an IMS/DB2 job step you’re interested in tuning. Of course I haven’t delved down into which DB2 package is the one where the Lock/Latch Wait time is most prevalent – but that can be done from the Class 7 and 8 Accounting traces. Nor have I gone anywhere near the SQL.

But this shows you how you can make a reasonable start – even for IMS/DB2. A nice result – having thought for a number of years it couldn’t be done. :-)

XML, XSLT and DFSORT, Part Three – Multiple XML Input Files

(Originally posted 2011-05-22.)

While I was putting together the original three posts in this series a number of thoughts struck me, amongst which two really cried out for further investigation:

  1. I don't know how your XML data arrives on z/OS but quite a lot of scenarios don't have the data all as one document (file).
  2. XSLT looks complex – particularly if recursion does your head in. 🙂

Thought 2 I'll deal with in a different post. This post relates to thought 1.

 

In the example I've given there are three item elements, representing three transactions. I'm not sure that's entirely realistic:

Certainly there will be many times (for example configuration files) where everything is in the one file. But consider the following scenario:

XML documents arrive in a directory, each representing a single transaction, or maybe a batch of them. In the rest of this post it doesn't matter whether there's more than one transaction in a file, only that there will be multiple files overall. In the previous three posts I've talked about processing a single file with XSLT and passing the results to DFSORT (in a manner the latter can work with). The technique outlined won't (unaltered) process more than one file at a time.

I'd like one DFSORT processing run to handle multiple input XML files. Perhaps you run a batch job every hour, or perhaps daily, to process all the transactions that arrived as XML files. The rest of this post shows you one way of doing this. It's a relatively small change to the XSLT stylesheet.

Here are the three transactions, as if they arrived in separate files:

Transaction File 1
<?xml version="1.0"?>
<mydoc>
  <greeting level="h1">
    Hello World!
  </greeting>
  <stuff>
    <item a="1">
      <row>One</row>
    </item>
  </stuff>
</mydoc>

Transaction File 2
<?xml version="1.0"?>
<mydoc>
  <greeting level="h1">
    Hello World!
  </greeting>
  <stuff>
    <item
      a="12">
      <row>Two</row>
    </item>
  </stuff>
</mydoc>


Transaction File 3
<?xml version="1.0"?>
<mydoc>
  <greeting level="h1">                                                        
    Hello World!
  </greeting>
  <stuff>
    <item a="903">
      <row>
      Three
      </row>
    </item>
  </stuff>
</mydoc>

XSLT can't directly process a list of files concatenated together. But you can do it if you can create another file. Here's an example:

Transaction Reference File
<?xml version="1.0"?>
<transactions>
  <transaction filename="txn0001.xml"/>                                        
  <transaction filename="txn0002.xml"/>
  <transaction filename="txn0003.xml"/>
</transactions>

If you can create such a file – perhaps by scanning an "incoming transaction file" directory – you can easily coax XSLT into processing the set of files. Here's a stylesheet that can do it:

XSLT Template Using The document() Function
<?xml version="1.0"?>
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" encoding="IBM-1047" indent="no"
    omit-xml-declaration="yes"/>

  <xsl:template match="/">
    <xsl:for-each select="transactions/transaction">                    1  
      <xsl:apply-templates select="document(@filename)/mydoc/stuff"/>   2      
    </xsl:for-each>
  </xsl:template>

  <xsl:template match="item">
    <xsl:value-of select="normalize-space(row)"/>
    <xsl:text>,</xsl:text>
    <xsl:value-of select="format-number(@a,'0000')"/>
  </xsl:template>
</xsl:stylesheet>

There are two things to note in this stylesheet:

  1. When you run the transaction reference file through XSLT with this stylesheet, the line marked 1 (the xsl:for-each) causes each transaction element to be visited.
  2. The line marked 2 uses the filename attribute of each transaction element to pick up a transaction file (which might have one item element in it, or several) via the document() function.

This "indirection through a transaction file" technique is very powerful.

In practice you might have an "inbound transaction XML" directory that you scan with a program that creates the transaction reference file, invokes (eg) Saxon and then invokes DFSORT, finally deleting all the successfully-processed transaction files. I say "(eg)" because nothing in this revised stylesheet requires XSLT 2.0 and so Saxon isn't the only choice.

I think the challenge in this is knowing when the transactions have been successfully processed and so the inbound files can be deleted. You'd have the same problem if – instead of creating a transaction reference file – you created one large XML file from inbound files. (In fact this is easier.)

Anyone feel like – in any z/OS-supported language – writing something to scan a directory for XML files and create a transaction reference file like the one above from their names?
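For what it's worth, here's one sketch of such a scanner in Python – treat it as pseudocode for whatever language you have to hand. The directory and file names are examples only; substitute your own.

Hypothetical Transaction Reference File Generator (Python)
import glob
import os
from xml.sax.saxutils import quoteattr

incoming_dir = "/u/myuserid/incoming"            # example directory only
reference_file = "/u/myuserid/transactions.xml"  # example output file only

with open(reference_file, "w") as out:
    out.write('<?xml version="1.0"?>\n<transactions>\n')
    for path in sorted(glob.glob(os.path.join(incoming_dir, "*.xml"))):
        # quoteattr() supplies the surrounding quotes and escapes awkward characters;
        # writing the full path sidesteps questions about document()'s base URI
        out.write("  <transaction filename=%s/>\n" % quoteattr(path))
    out.write("</transactions>\n")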

Who Are You And What Have You Done With My Readers? :-)

(Originally posted 2011-05-20.)

I’ve done a little analysis of hits on recent blog posts. I wonder what you make of it:

Looking at this pie chart, slices start at the top and go anticlockwise. The legend reads from latest to oldest, left to right, wrapping appropriately.

While the blog is called "Mainframe Performance Topics" I have lots of other interests. It’s interesting to see which posts have gained the most hits. It’s also fun to speculate on how these hits came about. I recognise hits don’t translate into reads and certainly don’t reflect what the readers thought of the post. Nonetheless:

  • The most popular topics are about Android and HTML5. I don’t think this is my usual readership. 🙂 I think they flew in via web search. 🙂
  • The multi-part posts (Batch Architecture, Vienna Conference, and XSLT / DFSORT) seem to do reasonably well. I’m guessing that people who read one part read the others.

I’m not particularly worried about numbers of hits, actually. I write because I think I have something to say. Also because it encourages me to learn from doing the research.

Although I can readily recreate this chart with additional data I’m not planning to do so.

But still, I think this chart is interesting. I hope you do, too.

XML, XSLT and DFSORT, Part Two – DFSORT

(Originally posted 2011-05-20.)

Following on from this post and this one, this post discusses the DFSORT piece.

The DFSORT code in this post parses the Comma-Separated Values (CSV) file produced by XSLT processing. In this simple example it merely produces a flat file report, but the post has a few additional details you might find valuable.

First, here’s the SORTIN DD JCL statement. It’s not like a regular sequential file statement as it has to access the zFS file system we wrote the data to with Saxon:

JCL SORTIN DD Statement
//SORTIN    DD  PATHOPTS=(ORDONLY),RECFM=VB,LRECL=255,BLKSIZE=32760,  
//          PATH='/u/userzfs/myuserid/testXSL.txt',FILEDATA=TEXT 

Of particular note in this DD statement is the record format (VB), the logical record length (255) and the block size (32760). This is definitely VB data. I’ve found an LRECL greater than the maximum size Saxon has produced is fine. Similarly a sensible block size works. FILEDATA=TEXT is also needed.

Here’s the SYMNAMES file:

Contents of the SYMNAMES File
RDW,1,4,BI                                                            
Row,%01 
a,%02 

You’ll need an accompanying SYMNOUT DD – for the messages DFSORT (or ICETOOL) produces when the SYMNAMES file is processed.

I’m showing you this first so you can understand the main DFSORT control statements file: Everywhere you see the symbol "Row" in these statements you can interpret it as "%01", whatever that is. Similarly for "a" and "%02". The "RDW" symbol maps the Record Descriptor Word that we need for variable-length record processing. (DFSORT can convert from variable- to fixed-record format but we won’t do that here.)

Now for the control statements:

Contents of the SYSIN File
  OPTION COPY,VLSHRT                                                1
  INCLUDE COND=(1,2,BI,GE,+12)                                      1 
  INREC IFOUTLEN=70,                                                2
    IFTHEN=(WHEN=INIT,                                              3
      PARSE=(%01=(STARTAFT=C'"',ENDBEFR=C'",',FIXLEN=10),           4
             %02=(FIXLEN=8)),                                       5
      BUILD=(1,4,%01,%02)),                                         6
    IFTHEN=(WHEN=INIT,BUILD=(RDW,Row,X,a,SFF,EDIT=(I,IIT)))         7

This is a very simple case of using DFSORT. So, for example, there’s no SORT, no OUTFIL, nor any ICETOOL sophistication. It’s meant to show how you can get the data into a format DFSORT can use. Let me explain how it works:

  1. VLSHRT and the INCLUDE statement will, between them, remove the blank lines Saxon created.
  2. IFOUTLEN sets the output record length (from INREC) to 70 bytes.
  3. This WHEN=INIT parses the input (CSV) data.
  4. The %01 field is filled with what’s between the first " and the "," that follows it. It becomes a fixed character field of length 10 bytes.
  5. The %02 field is filled with the remainder of the data in the record – for a length of 8 bytes.
  6. We write out the RDW and both parsed fields.
  7. This WHEN=INIT is used to produce the report lines. We print the %01 field ("Row"), a space, and the %02 numeric field ("a"). For the numeric field ("a") we parse the characters to extract the numeric value (with SFF) and then immediately reformat it (with EDIT=(I,IIT) ) to insert commas.

And here’s the output:

The Resultant Output
One            1                                                      
Two           12
Three        903

Of course we needn’t have just printed the data, as I’ve indicated. With a more interesting data set you could do a lot more.

The use of symbols ("Row" and "a") was largely gratuitous here. It just shows you can use them. If you’re a regular DFSORT or ICETOOL user you’ll know their value.

If you were to strip this down to the bare essentials, the first WHEN=INIT does most of the work – parsing the data into fixed positions. (The one really useful thing the second WHEN=INIT does is to convert the numeric field into a packed decimal number.)
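If you’d like to prototype the same parse away from DFSORT, here’s a rough Python approximation of what the control statements above do. It’s purely illustrative – the DFSORT step is the real thing – and it assumes the XSLT output file produced in Part One of this series.

Rough Python Approximation Of The Parse And Report
# Skip the blank lines, pull out the quoted "Row" value and the numeric "a"
# value, then print the number right-aligned with thousands separators.
with open("testXSL.txt") as f:        # the XSLT output file from Part One
    for line in f:
        line = line.strip()
        if '",' not in line:          # plays the role of VLSHRT plus the INCLUDE test
            continue
        row, number = line.split('",', 1)
        row = row.lstrip('"')
        print("%-10s %7s" % (row, format(int(number), ",d")))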

So, over these three posts I’ve shown how you can use XSLT to half tame XML data and DFSORT to complete the taming. I have a couple of other things I want to talk about in relation to this. But those belong in a separate post.

XML, XSLT and DFSORT, Part One – Creating A Flat File With XSLT

(Originally posted 2011-05-14.)

This is the second part of a (currently) three-part series on processing XML data with DFSORT, given a little help from standard XML processing tools. The first part – which you should read before reading on – is here.

To recap, getting XML data into DFSORT is a two stage process:

  1. Flatten the XML data so that it consists of records with fields in sensible places.
  2. Process this flattened data with DFSORT / ICETOOL or something else, like REXX.

This post covers the first part of this. You’ll see how you can transform the XML file below into a Comma-Separated Values (CSV) file.

Here’s the source XML, complete with a few quirks:

XML File To Be Processed
<?xml version="1.0"?>                                                                           
<mydoc>
  <greeting level="h1">
    Hello World!
  </greeting>
  <stuff>
    <item a="1">        1
      <row>One</row>
    </item>
    <item               2
      a="12">
      <row>Two</row>
    </item>
    <item a="903">      3
      <row>
      Three
      </row>
    </item>
  </stuff>
</mydoc>

Here’s the resulting flat file:

Resulting Flat File For Processing With DFSORT / ICETOOL
               
    "One",1                                                                  
    "Two",12                                                                                    
    "Three",903                                                                           
               

I’m assuming you can read XML reasonably well. In this example we have three "item" elements as children of a "stuff" element. The "stuff" element is a child of the "mydoc" element. The "mydoc" element also contains a "greeting" element. Each "item" element has a single "row" child element and an "a" attribute.

To produce the output we need to find the "item" elements and pick up the "row" child element and the "a" attribute value. We write one record for each "item" element. (We ignore the "greeting" element entirely.)

You may notice some white space around the output: A leading blank line and a trailing one, as well as four spaces at the beginning of each output record. I’ve not found a way of getting rid of those and the DFSORT program (described in the next part of this series) will have to strip them off.

I’ve deliberately formatted each "item" element slightly differently:

  1. The "a" attribute is on the same line as the "item" tag, and the "row" element fits entirely on one line.
  2. The "a" attribute is on the next line, and the "row" element is on one line.
  3. The "a" attribute is as in 1 but the "row" element text is split across three lines.

The point is that XML is so flexible in its layout you’re better off relying on a supplied parser than writing your own. It’s true that there are good parsers that don’t do XSLT transformations. And obviously the z/OS System XML one is very nice, particularly with its ability to use specialty engines. As I said in my previous post, XML parsing is computationally expensive.

Why not write your own code that calls the z/OS System XML parser? That’s certainly an option – and indeed you might find the transformations you want to do can’t (or shouldn’t) be done with XSLT. Here the similarity to DFSORT is quite strong: Both provide ways to use built-in functions to transform data – neither of which requires a formal programming language (in XML’s case perhaps PHP, Java or C++, and in DFSORT’s case perhaps Assembler, COBOL or PL/I).

In this example you scarcely need to write your own program. (Handling item 3, as I’ll describe later, is the one case where a program might be better.)
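Before the stylesheet, and just to give a flavour of that roll-your-own route, here’s a minimal sketch using Python’s standard ElementTree parser – standing in for whichever parser you’d actually call, and nothing to do with the z/OS System XML API – that flattens the sample document into the same kind of CSV records. The file names are examples.

A Roll-Your-Own Flattening Sketch (Python, ElementTree)
import xml.etree.ElementTree as ET

tree = ET.parse("mydoc.xml")                  # the sample XML document above
with open("testXSL.txt", "w") as out:
    for item in tree.getroot().iter("item"):  # every "item" element, wherever it sits
        # Collapse the multi-line <row> text much as normalize-space() does
        row = " ".join(item.findtext("row", default="").split())
        out.write('"%s",%s\n' % (row, item.get("a")))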

Here’s the XSLT stylesheet that produces the required output:

XSLT Stylesheet
<?xml version="1.0"?>
<xsl:stylesheet version="2.0"                               1                                    
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 
  <xsl:output method="text" encoding="IBM-1047"/>           2
 
  <xsl:template match="/">
    <xsl:apply-templates select="mydoc/stuff"/>             3
  </xsl:template>
 
  <xsl:template match="item">                               4
    <xsl:text>"</xsl:text>                                  5
    <xsl:value-of select="normalize-space(row)"/>           6
    <xsl:text>",</xsl:text>                                 7
    <xsl:value-of select="@a"/>                             8
  </xsl:template>
 
</xsl:stylesheet>

This is a fairly simple stylesheet. Here’s how it works (the numbered lines above correspond to the numbering below):

  1. Here we declare the level of the XSLT language to be 2.0. In fact there’s nothing about this stylesheet that requires that language level.
  2. Here we say we’re creating a text file as output and that it will be EBCDIC (IBM-1047).
  3. Here we search for the "stuff" element within the "mydoc" element – using the XPath language. In fact the only "stuff" element we’ll match with is the one at the top of the XML node tree – because it’s preceded by a "/". For each matched "stuff" element we apply the template below.
  4. This template matches all "item" elements within the "stuff" element.
  5. Here text starts to be written out for the record. In this case the leading quote around the first piece of data.
  6. Here the first piece of data is written out – the text value of the "row" element. We’ll come back to the normalize-space() function in a minute.
  7. Here a trailing quote and a comma are written out.
  8. Here the value of the "a" attribute is written out. It needs no adjustment (in this example).

Because item 3’s "row" value was split across several lines the normalize-space() function is used to take out leading white space. It has the unfortunate side-effect of replacing multiple white space characters in the text with a single space so it’s not brilliant. You could write a fairly simple but recursive piece of XSLT to do the job properly – but it’s beyond the scope of this post. In fact this might be the thing that makes you abandon XSLT and call the XML parser from a program.

If you want to get into XSLT I can recommend Doug Tidwell’s XSLT, Second Edition: Mastering XML Transformations book. It’s what I’ve used – with some additional research on the web (which didn’t yield much additional insight).

I used the Saxon-B (free) processor as it’s the only one I can get my hands on that does XSLT 2.0. It’s a Java JAR. You could use others, of course.

Invoking it from OMVS, I found a 64MB heap specification was enough (running in a 128MB region). For more complex transformations I can see a larger heap might be needed. (In fact I didn’t check how much garbage collection, if any, the JVM did. It just ran.) :-)

(If you specify version="1.0" for the stylesheet Saxon will issue a message informing you you’re running a 1.0 stylesheet through a 2.0 processor. This has caused no problems whatsoever for me.)

Originally I downloaded Saxon to my Linux laptop and used it with an ASCII stylesheet and XML data. Transferring to z/OS was straightforward. This approach may work for you, if you’re setting out to learn XSLT.

Learning and working with XSLT continues to be a journey of discovery. If I’m missing some tricks that you spot feel free to let me know. The next post in this series will be about the DFSORT counterpart.

XML, XSLT and DFSORT, Part Zero – Overview

(Originally posted 2011-05-11.)

In the distant past I’ve written about using DFSORT to parse XML. This post (and two follow-on posts) will describe an experiment to make such processing much more robust.

In this post I’ll talk about what problem I’m trying to solve. And why. And give a brief outline of my solution.

About XML

This isn’t meant to be the most detailed description of XML, nor a complete list of where it’s used. I just want you to know (if you didn’t already) why I think XML processing is something to pay attention to.

Increasingly applications are producing and consuming XML. (They’re also producing and consuming other new data styles, such as JSON.) I divide this usage into two categories:

  • Configuration data (generally small files).
  • Business data (often very large files).

XML has many advantages as a data format, including robustness, standardisation and an increasing degree of inter-enterprise adoption. It also has useful attributes like the ability to validate a file against a strict grammar, and transformability.

XML is, however, expensive to parse. And when I talk of transformability, the tools to transform XML are still quite rudimentary – you often have to write your own program to do it.

(This being an IBM-hosted blog you might expect me to talk about WebSphere Transformation Extender (WTX). I shan’t, except to say it has very nice tooling. Similarly, you might expect me to talk about the Extensible Stylesheet Language for Transformations (XSLT) – as a standard for transformations. You’re in luck with XSLT – but that will have to wait. I’d like to talk about IBM’s z/OS XML Toolkit (which includes an XSLT processor) but that will have to wait. And as for DataPower, it’ll be a while before I talk about it, also.)

Those of you familiar with IBM mainframe technology will be aware of z/OS System XML and perhaps the z/OS XML Toolkit. You’re probably aware of the ability to offload XML parsing to a zAAP (or zIIP if zAAP-on-zIIP is in play). I think our story’s pretty good with these.

So IBM thinks XML’s important, and so do lots of installations. It’s important that mainframe people know what they can do, too.

The Problem I’m Trying To Solve

I don’t feel it necessary to describe what DFSORT can do in this post. Suffice it to say it can do lots of what I call "slice and dice" with data. So long as that data is record-oriented. (And it’s even better if you include ICETOOL.)

So why don’t we just process XML with DFSORT?

(Let’s disregard publishing XML with DFSORT as that’s very easy to do.)

Traditionally DFSORT has done really well when records are neatly divided into fixed-position (and length) fields. Over recent years it’s got better and better at handling cases where the layout of each record is variable. For example, it can parse Comma-Separated Values (CSV) files just fine – with PARSE.

But XML is so much more variable. For example, two partners could each send you a file, created by their own programming or tools. They’d be semantically equivalent but the data would be differently formatted (and still be valid according to the same XML Schema). And the differences wouldn’t just be the fields being at different offsets, or in a different order in the same record: One format might have an element all on one line whereas the other might spread it across three lines.

So any DFSORT application attempting to process XML would be vulnerable to this variability. In the past, when I’ve written of DFSORT processing XML I think I’ve said that you need stable XML to work with. I think that’s still right.

So is that it? Well, no it isn’t: I still think it’s possible to take advantage of DFSORT’s power, even with XML data to process. Read on…

XSLT

XSLT (standing for Extensible Stylesheet Language for Transformations) is a standards-based way of transforming XML – to (different) XML, HTML or even plain text. And by "(different) XML" I also mean things like SVG vector graphics.

With XSLT you define a transformation using another piece of XML – a stylesheet (or XSL file). Whether you author this by hand (my current state) or use tooling to generate one is up to you. A program then applies the XSL file to transform your XML into whatever you want.

There are lots of XSLT programs. I’ve used Apache Xalan (which is tightly-coupled to the IBM ones on z/OS), Saxon, the capabilities built in to Firefox (and other browsers), PHP’s one – to name just a few. Of these only Saxon can do XSLT 2.0 at present. (The others all do XSLT 1.0, often with extension capabilities.)

For my work, written up in these posts, I used the free variant of Saxon – because it does 2.0. Nothing in these posts, however, requires 2.0. I want 2.0 just so I can learn 2.0. One day maybe it’ll catch on and then I’ll be in good shape. Learning 2.0 isn’t incompatible with learning 1.0 but it might leave you frustrated. 🙂

The important piece in all this is that XSLT can be used to take arbitrary XML and flatten it – into records with fields in vaguely sensible places. In EBCDIC.

Putting It Together

So far I’ve talked about two distinct components: DFSORT / ICETOOL and XSLT. I’ve said it’d be nice to be able to process XML-originated data using DFSORT, robustly. So here’s how it can be done:

  1. Use XSLT to create a flat file (in HFS or zFS) with the data flattened into sensible records with well-delimited fields. (In the example, in the next post in this series, I’ll use CSV as the intermediate file layout.)
  2. Use DFSORT’s parsing capabilities to read the intermediate file and then do DFSORT’s normal things with it. (This will be the third post in the series.)

Conceptually simple but a little fiddly in the details. In the next two posts I’ll clothe the idea with some of those details.

Over the past few days, while preparing to write this post, I’ve done some experimenting – including creating a full working example. There are lots of "wrinkles" on this idea, including other ways of doing pieces of it. Perhaps you’ve thought of a few. If so do let us know.

Vienna Conference – A Trip Report

(Originally posted 2011-05-09.)

I think people know better than to ask me for a trip report to a conference I’ve attended. They’ll get what I think is important – and their priorities are probably different. So here is that trip report anyway… 🙂

You’ll probably have gathered by now I’m more a “for the journey” person than a “for the destination” one. But I won’t bore you with the minor inconveniences on both ends of the trip – because I personally try to forget the (often long and tedious) journey when I get to the destination (or home again). I’d rather focus on where my travels took me.

I will admit to sampling hostelries – with good friends. I also was very pleased to be in the company of friends – both old and new. Personally, I think the social aspect of a conference is almost as important as the sessions. And, of course, some really useful conversations were had – with IBMers, business partners, vendors and customers. I can’t really summarise these – for the usual obvious reasons.

My four sessions mainly went well. The topics are summarised here. But here are my perceptions:

  • “Parallel Sysplex Performance Topics” went well, I think. Mainly because I talked about the subset of items I really wanted to talk about. Most notably “Structure Execution Time” and “Structure Duplexing Performance”. (And I had a very good question on how the non-CPU element of request time relates to distance and technology.)

  • I think “Much Ado About CPU” has become disorganised. It needs refocusing. Particularly as I expect the CPU picture to continue to evolve over time. And so this one has to survive in some form.

  • “Memory Matters” was done while I was too tired. I also think it contains too much baggage from DB2 Version 8 (even though many customers are still on 8). I also think the “Coupling Facility Memory” section doesn’t really add much.

  • “DB2 Data Sharing Performance For Beginners” turns out not to be a “for beginners” presentation, really. If I’m introspective about it, when I wrote it I thought it would help explain the major themes, but I couldn’t pretend to be as knowledgeable as the true greats of Data Sharing. For example, those that write “DB2 Performance Topics” Redbooks. So I should skip the “for beginners” part of the title and rework it to make it as good a presentation as I can for those who already have some knowledge. The stuff needs saying but I need to say it better.

But, I think in the above I’m being harsh on myself. I got good evaluations on all four. Maybe the audience is very kind. 🙂

I took notes using the Writepad handwriting application on iPad (into Evernote so I can read them and edit them everywhere). Writepad does a very good job but I still wish I’d brought the keyboard along: I found the mechanics of taking notes diminished my ability to listen. I’d pull out the following presentations as ones I got a lot out of. (Others will have their own favourites.)

  • Susann Thomas (a team-mate from the 2009 Batch Modernisation residency) did a very nice job on introducing XML for System z. So much so I’m convinced I need to understand the XML story better. (You may have seen on Twitter my attempts to do stuff.)
  • Harald Bender’s XML and RMF presentation makes me think a practical example of XML to play with is that produced by RMF.
  • Marna Walle did a nice job of her z/OS R.13 Preview presentation. (Which reminds me I must write on in-stream SYSIN in a PROC soon.)
  • George Ng (who apparently reads this blog! 🙂 ) presented on Infiniband Coupling Facility links. I note all RMF knows about Infiniband links is the channel path acronym “CIB”. It can’t distinguish between, for example, 1x SDR and 12x DDR. You can imagine I’d “have views” on that sort of thing. :-)
  • Christian Daser explained rather well, I thought, the tricky DB2 V10 Bitemporal support, as well as a few other pieces of DB2 Application componentry in 10.
  • Peter Enrico will certainly have opened some eyes to the value of SMF 113 CPU Measurement Facility instrumentation. I’ve been familiar with this for a long time – certainly from before we announced it. I would write about it if I didn’t feel Peter (and John Burg) had already done so as well as I could have – if not better.
  • Mike Buzzetti gave a very good introduction to Cloud on System z, particularly about TSAM for provisioning.
  • I’ve tried to run with Jeff Berger’s foils before now. I’m so glad I don’t have to anymore: He does so much better a job of it than I do. 🙂 The topic I saw him present this time was on DB2 V10 Performance. I’m eagerly awaiting the Redbook, of course.
  • And last but not least Bob Rogers’ “What You Do When You’re a z196 CPU”. I’m very glad he keeps updating it for each generation of processors. It’s one where you really do need to know what happened before so I’m pleased he’s kept in z9 and z10 stuff.

Of course I don’t know whether you have access to the proceedings. If you do I recommend you pull down some of the above sets of slides. If not maybe you’ll see them at some other conference or user group.

After a week of this I’ll admit to coming home very tired. (In fact I think everyone felt that way by Thursday morning.) But it was a great week for me. And thanks to everyone who made it so good for me.

And if you didn’t get to Vienna I hope you do get to some System z conferences: They’re a very good use of money and time.