XML, XSLT and DFSORT, Part Zero – Overview

(Originally posted 2011-05-11.)

In the distant past I’ve written about using DFSORT to parse XML. This post (and two follow-on posts) will describe an experiment to make such processing much more robust.

In this post I’ll talk about what the problem I’m trying to solve is. And why. And a brief outline of my solution.

About XML

This isn’t meant to be the most detailed description of XML, nor a complete list of where it’s used. I just want you to know (if you didn’t already) why I think XML processing is something to pay attention to.

Increasingly applications are producing and consuming XML. (They’re also producing and consuming other new data styles, such as JSON.) I divide this usage into two categories:

  • Configuration data (generally small files).
  • Business data (often very large files).

XML has many advantages as a data format, including robustness, standardisation and an increasing degree of inter-enterprise adoption. It also has useful attributes like the ability to validate a file against a strict grammar, and transformability.

XML is, however, expensive to parse. And when I talk of transformability the tools to transform XML are still quite rudimentary – you often have to write your own program to do it.

(This being an IBM-hosted blog you might expect me to talk about WebSphere Transformation Extender (WTX). I shan’t, except to say it has very nice tooling. Similarly, you might expect me to talk about the Extensible Stylesheet Language for Transformations (XSLT) – as a standard for transformations. You’re in luck with XSLT – but that will have to wait. I’d also like to talk about IBM’s z/OS XML Toolkit (which includes an XSLT processor); that, too, will have to wait. And as for DataPower, it’ll be a while before I talk about that.)

Those of you familiar with IBM mainframe technology will be aware of z/OS System XML and perhaps the z/OS XML Toolkit. You’re probably aware of the ability to offload XML parsing to a zAAP (or zIIP if zAAP-on-zIIP is in play). I think our story’s pretty good with these.

So IBM thinks XML’s important, and so do lots of installations. It’s important that mainframe people know what they can do, too.

The Problem I’m Trying To Solve

I don’t feel it necessary to describe what DFSORT can do in this post. Suffice it to say it can do lots of what I call "slice and dice" with data. So long as that data is record-oriented. (And it’s even better if you include ICETOOL.)

So why don’t we just process XML with DFSORT?

(Let’s disregard publishing XML with DFSORT as that’s very easy to do.)

Traditionally DFSORT has done really well when records are neatly divided into fixed-position (and length) fields. Over recent years it’s got better and better at handling cases where the layout of each record is variable. For example, it can parse Comma Separated Value (CSV) files just fine – with PARSE.
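
Just to give a flavour of PARSE – a minimal sketch, with an invented three-field CSV layout and made-up field lengths – you could copy a CSV file into fixed-position fields like this:

  OPTION COPY
* Extract three comma-separated fields, pad each to a fixed length,
* then rebuild the record with them at fixed positions
  INREC PARSE=(%01=(ENDBEFR=C',',FIXLEN=8),
               %02=(ENDBEFR=C',',FIXLEN=12),
               %03=(FIXLEN=10)),
        BUILD=(%01,%02,%03)

Each %nn parsed field is extracted up to the next comma and padded to its FIXLEN, after which it can be treated like any other fixed-position field.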

But XML is so much more variable. For example, two partners could each send you a file, created by their own programming or tools. They’d be semantically equivalent but the data would be differently formatted (and still be valid according to the same XML Schema). And the differences wouldn’t just be the fields being at different offsets, or in a different order in the same record: One format might have an element all on one line whereas the other might spread it across three lines.

So any DFSORT application attempting to process XML would be vulnerable to this variability. In the past, when I’ve written of DFSORT processing XML I think I’ve said that you need stable XML to work with. I think that’s still right.

So is that it? Well, no it isn’t: I still think it’s possible to take advantage of DFSORT’s power, even with XML data to process. Read on…

XSLT

XSLT (standing for Extensible Stylesheet Language for Transformations) is a standards-based way of transforming XML – to (different) XML, HTML or even plain text. And by "(different) XML" I also mean things like SVG vector graphics.

With XSLT you define a transformation using another piece of XML – a stylesheet (or XSL file). Whether you author this by hand (my current state) or use tooling to generate one is up to you. Then a program applies the XSL file to transform your XML into whatever you want.

There are lots of XSLT programs. I’ve used Apache Xalan (which is tightly-coupled to the IBM ones on z/OS), Saxon, the capabilities built in to Firefox (and other browsers), PHP’s one – to name just a few. Of these only Saxon can do XSLT 2.0 at present. (The others all do XSLT 1.0, often with extension capabilities.)

For my work, written up in these posts, I used the free variant of Saxon – because it does 2.0. Nothing in these posts, however, requires 2.0. I want 2.0 just so I can learn 2.0. One day maybe it’ll catch on and then I’ll be in good shape. Learning 2.0 isn’t incompatible with learning 1.0 but it might leave you frustrated. 🙂

The important piece in all this is that XSLT can be used to take arbitrary XML and flatten it – into records with fields in vaguely sensible places. In EBCDIC.

Putting It Together

So far I’ve talked about two distinct components: DFSORT / ICETOOL and XSLT. I’ve said it’d be nice to be able to process XML-originated data using DFSORT, robustly. So here’s how it can be done:

  1. Use XSLT to create a flat file (in HFS or zFS) with the data flattened into sensible records with well-delimited fields. (In the example, in the next post in this series, I’ll use CSV as the intermediate file layout.)
  2. Use DFSORT’s parsing capabilities to read the intermediate file and then do DFSORT’s normal things with it. (This will be the third post in the series.)

Conceptually simple but a little fiddly in the details. In the next two posts I’ll clothe the idea with some of those details.

Over the past few days, while preparing to write this post, I’ve done some experimenting – including creating a full working example. There are lots of "wrinkles" on this idea, including other ways of doing pieces of it. Perhaps you’ve thought of a few. If so do let us know.

Vienna Conference – A Trip Report

(Originally posted 2011-05-09.)

I think people know better than to ask me for a trip report to a conference I’ve attended. They’ll get what I think is important – and their priorities are probably different. So here is that trip report anyway… 🙂

You’ll probably have gathered by now I’m more a “for the journey” person than a “for the destination” one. But I won’t bore you with the minor inconveniences on both ends of the trip – because I personally try to forget the (often long and tedious) journey when I get to the destination (or home again). I’d rather focus on where my travels took me.

I will admit to sampling hostelries – with good friends. I also was very pleased to be in the company of friends – both old and new. Personally, I think the social aspect of a conference is almost as important as the sessions. And, of course, some really useful conversations were had – with IBMers, business partners, vendors and customers. I can’t really summarise these – for the usual obvious reasons.

My four sessions mainly went well. The topics are summarised here. But here are my perceptions:

  • “Parallel Sysplex Performance Topics” went well, I think. Mainly because I talked about the subset of items I really wanted to talk about. Most notably “Structure Execution Time” and “Structure Duplexing Performance”. (And I had a very good question on how the non-CPU element of request time relates to distance and technology.)

  • I think “Much Ado About CPU” has become disorganised. It needs refocusing. Particularly as I expect the CPU picture to continue to evolve over time. And so this one has to survive in some form.

  • “Memory Matters” was delivered while I was too tired. I also think it contains too much baggage from DB2 Version 8 (even though many customers are still on 8). And I think the “Coupling Facility Memory” section doesn’t really add much.

  • “DB2 Data Sharing Performance For Beginners” turns out not to be a “for beginners” presentation, really. If I’m introspective about it, I thought when I wrote it that it would help explain the major themes, but I couldn’t pretend to be as knowledgeable as the true greats of Data Sharing – for example, those who write the “DB2 Performance Topics” Redbooks. So I should skip the “for beginners” part of the title and rework it to make it as good a presentation as I can for those who already have some knowledge. The stuff needs saying but I need to say it better.

But, I think in the above I’m being harsh on myself. I got good evaluations on all four. Maybe the audience is very kind. 🙂

I took notes using the WritePad handwriting application on the iPad (into Evernote so I can read them and edit them everywhere). WritePad does a very good job but I still wish I’d brought the keyboard along: I found the mechanics of taking notes diminished my ability to listen. I’d pull out the following presentations as ones I got a lot out of. (Others will have their own favourites.)

  • Susann Thomas (a team-mate from the 2009 Batch Modernisation residency) did a very nice job of introducing XML for System z. So much so that I’m convinced I need to understand the XML story better. (You may have seen on Twitter my attempts to do stuff.)
  • Harald Bender’s XML and RMF presentation makes me think a practical example of XML to play with is that produced by RMF.
  • Marna Walle did a nice job of her z/OS R.13 Preview presentation. (Which reminds me I must write on in-stream SYSIN in a PROC soon.)
  • George Ng (who apparently reads this blog! 🙂) presented on InfiniBand Coupling Facility links. I note all RMF knows about InfiniBand links is the channel path acronym “CIB”. It can’t distinguish between, for example, 1x SDR and 12x DDR. You can imagine I’d “have views” on that sort of thing. :-)
  • Christian Daser explained rather well, I thought, the tricky DB2 V10 Bitemporal support, as well as a few other pieces of DB2 Application componentry in 10.
  • Peter Enrico will certainly have opened some eyes to the value of SMF 113 CPU Measurement Facility instrumentation. I’ve been familiar with this for a long time – certainly from before we announced it. I would write about it if I didn’t feel Peter (and John Burg) had already done so as well as I could have – if not better.
  • Mike Buzzetti gave a very good introduction to Cloud on System z, particularly about TSAM for provisioning.
  • I’ve tried to run with Jeff Berger’s foils before now. I’m so glad I don’t have to anymore: He does so much better a job of it than I do. 🙂 The topic I saw him present this time was on DB2 V10 Performance. I’m eagerly awaiting the Redbook, of course.
  • And last but not least Bob Rogers’ “What You Do When You’re a z196 CPU”. I’m very glad he keeps updating it for each generation of processors. It’s one where you really do need to know what happened before so I’m pleased he’s kept in z9 and z10 stuff.

Of course I don’t know whether you have access to the proceedings. If you do I recommend you pull down some of the above sets of slides. If not maybe you’ll see them at some other conference or user group.

After a week of this I’ll admit to coming home very tired. (In fact I think everyone felt that way by Thursday morning.) But it was a great week for me. And thanks to everyone who made it so good for me.

And if you didn’t get to Vienna I hope you do get to some System z conferences: They’re a very good use of money and time.

Batch Architecture, Part Three

(Originally posted 2011-05-04.)

Up until now I haven’t talked much about DB2, except perhaps to note it’s a little different. But what is a DB2 Batch job anyway? It’s important to note a DB2 job ISN’T necessarily exclusively DB2 – although some are. It’s just a job that has some DB2 in it.

The reason for writing a separate post, apart from breaking things up a little, is because batch jobs with DB2 in them present particular challenges. But also additional opportunities. In general these jobs can be treated like others but with extra considerations.

The main challenge is determining which data the job accesses – and how it accesses it. Let’s break this up into two stages:

  1. Identifying which DB2 plans and packages are accessed by which job / step.
  2. Identifying which DB2 tables and other objects are used by these plans and packages. And perhaps how.

Identifying DB2 Plans and Packages

This piece is relatively straightforward: DB2 Accounting Trace – with trace classes 7 and 8 enabled – will give you the packages used. You need to associate the Accounting Trace (SMF 101) record with its job / step.

For most DB2 attachment types the Correlation ID is the same as the job name. (Identifying the step name and number is a matter of timestamp comparison with the SMF30 records – which my code learned to do long ago.)

For IMS it’s more complicated, with the Correlation ID being the PSB name.

(A byproduct of this step might be discovering which jobs use a particular DB2 Collection or Plan name. Sometimes these are closely related to the application itself.)

Identifying Used Objects

This piece is much harder, particularly for Dynamic SQL. Fortunately most DB2 batch uses Static SQL. Even so it’s still tough: If you have the package names you can use the DB2 Package Dependency table in the Catalog to figure out which tables and views the package uses. At least in principle: There’s no guarantee these dependencies will get exercised – as there’s no guarantee the statements using them will ever get executed.

Another problem with this is figuring out whether the access is read-only or for-update.

To totally figure out which statements are executed (and which objects they update and read) would require much deeper analysis – probably involving Performance Trace and extracting SQL statement text from the Catalog.

Conclusion

So this is very different from the non-DB2 case. But at least we can glean what data a DB2 batch job OUGHT to be interested in. And, by aggregation, it’s not hard to work out what data an entire batch application uses.

In this post I wanted to show how DB2 complicates things but that it’s not hopeless. In fact there’s a substantial silver lining to the cloud: Without examining the (possibly missing) source code you can look inside the job at the embedded SQL statements, if you’re prepared to extract them from the DB2 Catalog.

You’ll notice I’ve said very little in this set of posts about Performance. This is deliberate: Although much of the instrumentation I’ve described is primarily used for Performance these posts have been about Architecture. Which is, I think, a different perspective.

I expect I’ll return to this theme at some point. For now I’ll just note it’s been fun thinking about familiar stuff in a slightly different way.

By the way this post was written using the remarkably accurate WritePad app on the iPad. It’s grown better at recognising my scrawl in the few hours I’ve used it – or perhaps it’s me that’s getting trained. 🙂

I Know What You Did Last Summer

(Originally posted 2011-04-26.)

This is literally a sketchy outline for a new presentation I want to build. The working title is indeed "I Know What You Did Last Summer".
There’s clearly not much structure to this. But the basic outline idea is there: What can an installation glean without too much effort?

Let the graphology begin. 🙂

Batch Architecture, Part Two

(Originally posted 2011-04-25.)

I concluded Batch Architecture – Part One with a brief mention of inter-relationships and data. I’d like to expand on that in this part.

Often the inter-relationships between applications are data driven – which is why I’m linking the two in this post (and in my thinking). But let’s think about the inter-relationships that matter. There are four levels:

  1. Between applications.
  2. Between jobs.
  3. Between steps in a job.
  4. Between phases in a step.

The first three are well understood, I think. The fourth is something I explored last year. Before I talk about it let me talk about “LOADS” – which I mentioned in Memories of Hiperbatch.

(And a minor note on terminology: Yes I KNOW that OPEN and CLOSE are macros. I don’t intend to use the capitalisation here – because the act of opening and closing a data set is meaningful, too (and less grating to read). Forgive me if this “sloppiness” offends.) :-)


Life Of A Data Set (LOADS)

I won’t claim to have invented this technique. (As I said in “Memories of Hiperbatch” I declined an offer to write up a patent application because I knew I hadn’t originated it.) But I do advocate its use quite a bit. Here’s an (oft-used) example:

If you have a single-step job that writes a sequential data set and another that reads it (both from start to finish) there’s a characteristic data set “signature”: Two opens, one after the other, one for update, one for read. If you discern this pattern you might think “BatchPipes/MVS”. (Depending on other factors you might think other things – such as VIO.)

So this is a powerful technique.


LOADS Of Dependencies :-)

In 1993 we wrote code to list the life of each data set a job opened and closed. Not long after that we got tired of figuring out dependencies by hand from LOADS. :-) So we fixed it:

At its simplest a writer followed by a writer indicates a (“WW”) dependency. A writer followed by a reader indicates a (“WR”) dependency, also. And so on.

Pragmatically some of these dependencies aren’t real, or at least it isn’t as simple as this sounds. For example:

  • This says nothing about PDS members.

  • GDGs are a little different.

  • A writer one morning and a reader the same evening might not be marked as a dependency in the batch scheduler (though it probably ought to be). To at least alert the analyst (mainly me these days) to this sort of thing the code pumps out the time lag between the upstream close and downstream opens. (This is an enhancement I made, together with some more “eyecatcher” things with timestamps last year.)

  • What’s the key here? Do we include volser?

But you can see there’s lots of merit in the technique, even with these wrinkles.


Step Phases

As I said before application-level, job-level (in some ways the same thing) and step-level dependencies are things we’ve known about for a long time. We’ve also known about DFSORT (and other sort products’) phases for a long time: Input, Intermediate-Merge and Output phases. These should be familiar, although people tend to forget about the possibility of an intermediate merge phase – because it should only apply to large sorts.

So, if sorts have phases, what about other steps? Last year I enhanced the code to create Gantt charts for data set opens and closes within a step. In many cases jobs became no more interesting because of it. But in a number of cases fine structure appeared: Non-sort steps demonstrably had phases. In one example a step that read a pair of data sets in parallel wrote to a succession of output data sets. I could see this from the open and close timestamps of the output data sets. (Without looking at the source code I couldn’t be sure but maybe there’s some mileage in dissecting this step.)

It’s in my code: If it applies to your jobs I’ll be sure to tell you about it.


An Application And Its Data

Apart from the small matter of scale figuring out which data an application uses is the same problem as figuring out which data a job uses.

I think I’ll talk about DB2 in a later post, as this one has already become lengthy.

As you probably know there is lots of instrumentation on data sets in SMF. Without going into a lot of repetitive description:

  • You can get information about disk data sets from SMF 42 Subtype 6.
  • You can get information about VSAM data sets from SMF 62 (open) and SMF 64 (close).
  • For non-VSAM it’s SMF 14 (for read) and 15 (for update).

There are a number of lines of enquiry you might like to pursue, including:

  • Working out which data sets contribute most to the application’s processing time.

    Here you’d use SMF 42 and something like I/O number or (more usefully) I/O number times Response Time.

  • Figuring out which data sets are strongly related to this application and no other.

    In this case SMF 14, 15, 62 and 64 are needed. (You don’t need both 62 and 64 for the same data set.)

None of the above applies to DB2: You don’t get 14, 15, 62 or 64 for DB2 data (despite DB2 using Linear Data Sets, a form of VSAM). But there is useful work you can do on DB2 data classification. And that is the subject of the next post in this series.

DFSORT – Now With Extra Magical Ingredients

(Originally posted 2011-04-21.)

Thanks to Scott Drummond for reminding me of last Autumn’s DFSORT Function PTFs – UK90025 and UK90026. They’re mentioned in the preview for z/OS Release 13 so now is not such a bad time to be talking about them. So let me pick out a few highlights:



Translation Between ASCII And EBCDIC, And To And From Hex and Binary

For a long time DFSORT has been able to translate to upper case (TRAN=LTOU), to lower case (TRAN=UTOL) and using a table (TRAN=ALTSEQ).

Now you can convert from ASCII to EBCDIC (TRAN=ATOE) and back (TRAN=ETOA). Translation is performed using TCP/IP’s hardcoded translation table.

Other “utility” translations are added: BIT, UNBIT, HEX and UNHEX. For example TRAN=HEX would translate X'C1F1' to C'C1F1' and TRAN=UNBIT would translate C'1100000111110001' to X'C1F1'.
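
For instance – a minimal sketch, with an arbitrary record length – this would convert the first 80 bytes of each record from ASCII to EBCDIC while copying:

  OPTION COPY
* Translate bytes 1-80 of each record from ASCII to EBCDIC
  INREC BUILD=(1,80,TRAN=ATOE)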



Date Field Arithmetic

DFSORT already had some nice functions for handling dates and times. But here are some new things – this isn’t an exhaustive list, and a small sketch of the syntax follows it:

  • You can add years to a date field – with ADDYEARS.
  • You can subtract months from a date field – with SUBMONS.
  • You can calculate the difference between two dates – with DATEDIFF.
  • You can calculate the next Tuesday for a date field – with NEXTDTUE.
  • You can calculate the previous Wednesday for a date field – with PREVDWED.
  • You can calculate the last day of the quarter – with LASTDAYQ.
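
Here’s a sketch of adding a year to a date – the column positions are made up, the date is assumed to be C'yyyymmdd' (Y4T), and the precise rules are in the User Guide mentioned at the end of this post:

  OPTION COPY
* Add one year to the C'yyyymmdd' date in columns 1-8 and place the
* result, also as C'yyyymmdd', at column 21
  INREC OVERLAY=(21:1,8,Y4T,ADDYEARS,+1,TOGREG=Y4T)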


JCL Symbols In Control Statements

You can now construct Symbols incorporating JCL PROC or SET symbols. These can be used in DFSORT and ICETOOL control statements, just like other symbols. You specify this by coding JPn"&MYSYM" in the PARM parameter of the EXEC statement. (In fact there need be no JCL symbol in this, so you could pass in other strings this way; an expected use is for JPn to contain a mixture of JCL symbols and fixed text.) n can be any one of 0 through 9.
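
A hedged sketch of how this might look (the DEPT symbol, its value, the data set name and the field positions are all invented for illustration; the exact rules are in the User Guide mentioned at the end of this post):

//         SET   DEPT='D17'
//S1       EXEC  PGM=ICEMAN,PARM='JP0"&DEPT"'
//SYSOUT   DD  SYSOUT=*
//SORTIN   DD  DISP=SHR,DSN=MY.INPUT.DATA
//SORTOUT  DD  SYSOUT=*
//SYSIN    DD  *
  OPTION COPY
* JP0 now behaves like a DFSORT symbol whose value is D17
  INCLUDE COND=(1,3,CH,EQ,JP0)
/*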

This support is in addition to the ability to use System Symbols (introduced with UK90013).


Microseconds In Timestamps

You can use the new DATE5 keyword to create a timestamp constant at run-time in the form ‘yyyy-mm-dd-hh.mm.ss.nnnnnn’. DB2 folks might recognise this as the timestamp format for DB2 Unload and DSNTIAUL. You can use this for things like comparisons.
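
For example – a sketch, with an arbitrary target column – you could stamp each copied record with the run timestamp:

  OPTION COPY
* Place the run timestamp (yyyy-mm-dd-hh.mm.ss.nnnnnn) at column 81
  INREC OVERLAY=(81:DATE5)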


Chunking And Stitching Together Records

You can use the new ICETOOL RESIZE operator to:

  • Split records into fixed-sized output records. For example, take a RECFM=FB, LRECL=500 file and create a RECFM=FB, LRECL=100 file – creating 5 new output records from each input record.
  • Join together fixed-sized input records. For example, take a RECFM=FB, LRECL=100 file and create a RECFM=FB, LRECL=500 file – in effect reversing the above by joining 5 input records together to make an output record.

In each case you can see there could be problems with partial output records. DFSORT “does the right thing” using blanks.
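
Here’s a sketch of the first case as an ICETOOL step (the DD names LONGIN and SHORTOUT, and the input data set name, are invented for illustration):

//RESIZE1  EXEC PGM=ICETOOL
//TOOLMSG  DD  SYSOUT=*
//DFSMSG   DD  SYSOUT=*
//LONGIN   DD  DISP=SHR,DSN=MY.LRECL500.DATA
//SHORTOUT DD  SYSOUT=*
//TOOLIN   DD  *
* Split each 500-byte LONGIN record into five 100-byte SHORTOUT records
  RESIZE FROM(LONGIN) TO(SHORTOUT) TOLEN(100)
/*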


Begin Group When Key Changes

I mentioned WHEN=GROUP here, in particular BEGIN=. With BEGIN= you get a new group when the condition you specify is satisfied. Now, with KEYBEGIN= you get a new group when the value in a particular field changes. For example:

SORT FIELDS=(1,12,CH,A,13,8,CH,D)
OUTREC IFTHEN=(WHEN=GROUP,KEYBEGIN=(1,12),
PUSH=(13:13,8,31:ID=3))

sticks a group number (3 characters wide) on the end of each record. The group number is incremented when there’s a new value in the 12-byte field that begins in position 1.


There are lots of other (in my opinion) smaller enhancements in these PTFs.

If you want to know whether the appropriate PTF is on, look for the following message in a DFSORT run:

ICE201I H RECORD TYPE …

If you see an “H” then you’re all set.

And you can read about these enhancements in more detail in User Guide for DFSORT PTFs UK90025 and UK90026 (SORTUGPH).

HTML5 Up and Running – A Review Of Sorts

(Originally posted 2011-04-20.)

William Gibson’s “The future is already here – it’s just not very evenly distributed” applies very well to HTML5. It’s even more true of CSS3. Despite that (or maybe because of it) it’s a good time to dive into HTML5 – before everyone else does. 🙂

So, a few months ago I bought, read and inwardly digested 🙂 Mark Pilgrim’s HTML5 Up and Running, published by O’Reilly in August 2010. I have a rule of thumb: If a topic is covered by an O’Reilly book it’s probably ready for prime time. If it’s in “For Dummies” it’s probably too late. (With apologies to other similarly fine publishers and, of course, to the publishers of “For Dummies”.) It’s a glib rule but it’s mine. 🙂

So how does this rule of thumb work out for HTML5? Well, if you make a “poor choice” of browser then not very well. “Poor choice” is in quotes because:

  • No one browser fully implements HTML5.

  • It’s not just about which browser but about which level.

  • You might be happy with one browser’s implementation but not another’s.

  • There’s a degree of ambiguity, development of the state of the art, etc about HTML5 itself. I’d characterise it as a (slowly) moving target.

And note that this book is from last year. So things will have changed. But:

  • It goes to great pains to describe the support by each browser.
  • It makes the point in doing so that browser support is variable and you soon get the drift as to what each browser’s maker’s attitude is to HTML5.

Having been around a bit I know something about technology adoption: 25 years ago it would take at least 18 months from a product’s announcement to it being implemented in most installations. I really don’t think anything much has changed. So you wouldn’t write code that depended on a feature your customers (or users) don’t have yet. Well actually you would: You’d just expect it to take a while for them to catch up. And you certainly wouldn’t make it immediately mandatory.

And so it is with HTML5. Mark helps out considerably with the “support of unsupportive browsers” issue by recommending the Modernizr HTML and CSS Feature Detection package. (You can follow them on Twitter: here) I’m sure there are other techniques for handling this but Mark’s right in pointing out this one hunts for capability rather than a named browser. Given some of the features of HTML5 are awfully similar to those provided by javascript frameworks such as Dojo, I’d not be surprised if these frameworks could be relied on in the future to do feature detection (simulating it if not present in the browser). Good examples of this are input elements in forms, much enhanced in HTML5.

At this point I’m reminded I haven’t outlined what’s in HTML5. So here’s a high-level list:

  • The Canvas drawing surface (which I have a “Production” application built around).

  • Offline applications (which I’ve experimented with).

  • Video (which looks a mess and is set to remain one).

  • Geolocation (which I’ve not used in the HTML5 context but have, of course, in Social Networking and other applications).

  • Local Storage (a better replacement for cookies).

  • Forms enhancements (which look really nice). Rather than my attempting to create graphics of them see here for some input type examples.

  • Microdata (a replacement for the non-standard Microformats and the highly-incomprehensible RDFa – as a way of annotating parts of web pages with structured data).

So lots of really valuable things. But how does the book do?

Given the earliness of its publication I think it does very well. From the above I think you can see it pragmatically handles the issue of support – which is going to be key. It also describes each feature very well, with good clear examples. It also adds a historical backdrop – particularly when talking about how unknown elements are handled – so it gives you a good idea how we got here.

So, I think HTML5 is more than ready to be played with and this book is a very good one to get you started. (I’m assuming you’re not starting from a “zero knowledge of HTML” position.) It doesn’t tackle CSS3 and I’ve yet to find anything that does. When I find such a book I’ll probably buy it and review it here.

My Slides Are Ready For Vienna

(Originally posted 2011-04-19.)

I wouldn’t want you to decide not to come to Vienna, just because I’ve made my slides available on Slideshare:

  • I’d hope you’d come to Vienna anyway. It’s a great place and it’s going to be a great conference.
  • Lots of people can’t make Vienna and I don’t suppose having the slides to hand is going to tilt the playing field significantly away from coming if you weren’t going to anyway.

I don’t think there’s any prohibition on uploading slides: It’s more important that the messages get out. So here are the four presentations I’m giving. If you DO spot errors please let me know. Thanks!

And, I know many of you will have seen these before. If you’ve not seen them recently I hope they’re sufficiently evolved since you last saw them. I do have plans to do brand new presentations this year. More on that, in due course, in this blog.

I’m A Sucker For Ingenuity

(Originally posted 2011-04-15.)

Every once in a while I come across a particularly good idea: Where someone has done something particularly clever to solve a problem. Here’s one very recent example:

On the iPhone (and iPod touches) you can only display icons. They can be augmented with a numeric counter, but that’s all. (The idea is the app shows you the number of unread emails, for example.)

But here’s the idea, in a nutshell: If you tried hard enough you could display any number you wanted (actually any integer).

Those clever people at International Travel Weather Calculator have produced a pair of iPhone apps – Celsius and Fahrenheit. (I guess they needed to create two for the hard-of-converting.) 🙂

These two apps display the local (or remote) temperature permanently – without you having to open the app.

So they’ve subverted the counter – and that’s the clever part (weather apps being ten-a-penny).

(And this post is the first one I’ve created using Ecto. I mention it because it creates better HTML than the one built in to developerWorks and I hope it formats better. You can be the judge of that.)

Batch Architecture, Part One

(Originally posted 2011-04-12.)

First a word of thanks to Ferdy for his insightful comment on Batch Architecture, Part Zero. And also to my IBM colleague Torsten Michelmann for his offline note on the subject.

As I indicated in Part Zero I hoped to talk about jobs in a subsequent post. And this is that post. In particular I want to discuss

  1. Viewing jobs as part of distinct applications, and
  2. Generating a high-level understanding of individual jobs

Mostly I’m talking about using SMF 30 job-end records, but consider also:

  • SMF 30 step-end records.
  • SMF 16 DFSORT-invocation records (and, for balance, those for Syncsort).
  • SMF 101 DB2 Accounting Trace.
  • Scheduler Information.
  • Locally-held information about jobs.

(When I talk about jobs I’m aware there are other activities, not running as z/OS-based batch jobs. These include other actions on z/OS, such as automated operator actions and recovery actions, as well as jobs running on other platforms. In this post I’m focusing more on z/OS-based batch jobs.)


Grouping Jobs Into Applications
There are lots of ways of grouping jobs into applications…

Most installations claim a job naming convention. For example:

  • First character is “P” for “Production”, “D” for “Development” and “M” for “Miscellaneous”.
  • Second through fourth characters denote the application. (Maybe there’s a business area as the second character and the other two are applications within that.)
  • Last character denotes the frequency, e.g. “D” for “Daily”, “W” for “Weekly”, “M” for “Monthly”.
  • The remaining characters (often numeric) are an identifier within the application.

Sometimes I see naming conventions that are the other way round. I would recommend – if you have the choice – having it this way round, so status and application are at the front. The reason I recommend this is that it makes it much easier to code queries against any instrumentation – whether you’re using SAS, Tivoli Decision Support or the home-grown code I use. (If you’re merging batch portfolios and have to pick a naming convention this is the one I’d definitely go for.)

Your workload scheduler may well have different names for operations (in Tivoli Workload Scheduler parlance) so some care is required with those.

An interesting question is how well an installation observes its naming convention. As the old joke goes, “we love naming conventions: We’ve got lots of them”. 🙂 Analysis of SMF 30 should give you a view of whether the naming convention is being observed.

As well as job names it’s sometimes interesting to see which userid(s) jobs are submitted under. Often Production batch is submitted from a single userid, according to Type 30. Similarly you can see which job class, WLM workload, service class and report class a job runs in.

Sometimes the programmer name field in Type 30 reveals application information.

Within a window it is occasionally the case that when a job runs is closely related to which application it’s in, though usually applications are intermingled in time – to some degree.

… And the above are just examples of characterisation information.


Understanding Individual Jobs – At A High Level

Whether you’ve grouped jobs into applications or are just looking at individual jobs it’s useful to characterise them. Typical characterisations include:

  • Whether jobs are long-running or short. Likewise CPU- or I/O-intensive.

  • Whether jobs are in-essence single step. (“In essence” alludes to the fact many jobs have small first and last steps, for management purposes.)

  • Whether jobs have in-line backups (the presence of e.g. IDCAMS steps being a good indicator).

  • How data sets are created and deleted for steps (e.g. IEFBR14 steps between processing steps).

  • Whether jobs use tape or do non-database I/O (visible in tape and disk EXCP counts).

  • Reliability statistics.

  • Use of DB2. (Slightly tricky for IMS DB2 jobs but still can be done.)

  • Clonedness.

  • Sort product usage.

The above are all discernible from job- and step-level information. At a slightly lower level (because it requires the use of data set OPEN information) is characterising the data access as being to VSAM, or BDAM, or QSAM/BSAM (or some combination thereof).

A lot of the characterisation of jobs is centred around standards. For example, how jobs are set up by the installation features heavily in the above list. Other sorts of standards can only be seen in things like JCL.

While the above obviously applies to individual jobs it can equally be applied to applications (as identified above) – but it’s a bit more work.


This post has talked about how to use instrumentation to group jobs into applications and the like. It’s also included some thoughts on how to characterise individual jobs and applications.

I hope in the next part to talk about relationships between applications. And to dive deeper into the application’s data.