Batch Architecture, Part Three

(Originally posted 2011-05-04.)

Up until now I haven’t talked much about DB2, except perhaps to note it’s a little different. But what is a DB2 Batch job anyway? It’s important to note a DB2 job ISN’T necessarily exclusively DB2 – although some are. It’s just a job that has some DB2 in it.

The reason for writing a separate post, apart from breaking things up a little, is that batch jobs with DB2 in them present particular challenges – but also additional opportunities. In general these jobs can be treated like others, but with extra considerations.

The main challenge is determining which data the job accesses – and how it accesses it. Let’s break this up into two stages:

  1. Identifying which DB2 plans and packages are accessed by which job / step.
  2. Identifying which DB2 tables and other objects are used by these plans and packages. And perhaps how.

Identifying DB2 Plans and Packages

This piece is relatively straightforward: DB2 Accounting Trace – with trace classes 7 and 8 enabled – will give you the packages used. You need to associate the Accounting Trace (SMF 101) record with its job / step.

For most DB2 attachment types the Correlation ID is the same as the job name. (Identifying the step name and number is a matter of timestamp comparison with the SMF30 records – which my code learned to do long ago.)

For IMS it’s more complicated, with the Correlation ID being the PSB name.

(A byproduct of this step might be discovering which jobs use a particular DB2 Collection or Plan name. Sometimes these are closely related to the application itself.)

Identifying Used Objects

This piece is much harder, particularly for Dynamic SQL. Fortunately most DB2 batch uses Static SQL. Even so it’s still tough: If you have the package names you can use the DB2 Package Dependency table in the Catalog to figure out which tables and views the package uses. At least in principle: There’s no guarantee these dependencies will get exercised – as there’s no guarantee the statements using them will ever get executed.
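
For the Static SQL case, something like this Catalog query is the sort of thing I mean – SYSIBM.SYSPACKDEP is the Package Dependency table, and the collection ID here is made up:

-- Sketch: which tables ('T') and views ('V') does each package
-- in a given collection depend on?
SELECT DNAME, BQUALIFIER, BNAME, BTYPE
  FROM SYSIBM.SYSPACKDEP
 WHERE DCOLLID = 'MYCOLLID'
   AND BTYPE IN ('T', 'V')
 ORDER BY DNAME, BQUALIFIER, BNAME;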

Another problem with this is figuring out whether the access is read-only or for-update.

To totally figure out which statements are executed (and which objects they update and read) would require much deeper analysis – probably involving Performance Trace and extracting SQL statement text from the Catalog.

Conclusion

So this is very different from the non-DB2 case. But at least we can glean what data a DB2 batch job OUGHT to be interested in. And, by aggregation, it’s not hard to work out what data an entire batch application uses.

In this post I wanted to show how DB2 complicates things but that it’s not hopeless. In fact there’s a substantial silver lining to the cloud: Without examining the (possibly missing) source code you can look inside the job at the embedded SQL statements, if you’re prepared to extract them from the DB2 Catalog.

You’ll notice I’ve said very little in this set of posts about Performance. This is deliberate: Although much of the instrumentation I’ve described is primarily used for Performance these posts have been about Architecture. Which is, I think, a different perspective.

I expect I’ll return to this theme at some point. For now I’ll just note it’s been fun thinking about familiar stuff in a slightly different way.

By the way this post was written using the remarkably accurate WritePad app on the iPad. It’s grown better at recognising my scrawl in the few hours I’ve used it – or perhaps it’s me that’s getting trained. 🙂

I Know What You Did Last Summer

(Originally posted 2011-04-26.)

This is literally a sketchy outline for a new presentation I want to build. The working title is indeed "I Know What You Did Last Summer".
There’s clearly not much structure to this. But the basic outline idea is there: What can an installation glean without too much effort?

Let the graphology begin. 🙂

Batch Architecture, Part Two

(Originally posted 2011-04-25.)

I concluded Batch Architecture – Part One with a brief mention of inter-relationships and data. I’d like to expand on that in this part.

Often the inter-relationships between applications are data driven – which is why I’m linking the two in this post (and in my thinking). But let’s think about the inter-relationships that matter. There are four levels:

  1. Between applications.
  2. Between jobs.
  3. Between steps in a job.
  4. Between phases in a step.

The first three are well understood, I think. The fourth is something I explored last year. Before I talk about it let me talk about “LOADS” – which I mentioned in Memories of Hiperbatch.

(And a minor note on terminology: Yes I KNOW that OPEN and CLOSE are macros. I don’t intend to use the capitalisation here – because the act of opening and closing a data set is meaningful, too (and less grating to read). Forgive me if this “sloppiness” offends.) :-)


Life Of A Data Set (LOADS)

I won’t claim to have invented this technique. (As I said in “Memories of Hiperbatch” I declined an offer to write up a patent application because I knew I hadn’t originated it.) But I do advocate its use quite a bit. Here’s an (oft-used) example:

If you have a single-step job that writes a sequential data set and another that reads it (both from start to finish) there’s a characteristic data set “signature”: Two opens, one after the other, one for update, one for read. If you discern this pattern you might think “BatchPipes/MVS”. (Depending on other factors you might think other things – such as VIO.)

So this is a powerful technique.


LOADS Of Dependencies :-)

In 1993 we wrote code to list the life of each data set a job opened and closed. Not long after that we got tired of figuring out dependencies by hand from LOADS. :-) So we fixed it:

At its simplest a writer followed by a writer indicates a “WW” dependency, a writer followed by a reader a “WR” dependency, and so on.

Pragmatically some of these dependencies aren’t real, or at least it isn’t as simple as this sounds. For example:

  • This says nothing about PDS members.

  • GDGs are a little different.

  • A writer one morning and a reader the same evening might not be marked as a dependency in the batch scheduler (though it probably ought to be). To at least alert the analyst (mainly me these days) to this sort of thing the code pumps out the time lag between the upstream close and downstream opens. (This is an enhancement I made, together with some more “eyecatcher” things with timestamps last year.)

  • What’s the key here? Do we include volser?

But you can see there’s lots of merit in the technique, even with these wrinkles.


Step Phases

As I said before, application-level, job-level (in some ways the same thing) and step-level dependencies are things we’ve known about for a long time. We’ve also known about DFSORT (and other sort products’) phases for a long time: Input, Intermediate-Merge and Output. These should be familiar, although people tend to forget about the possibility of an intermediate merge phase – because it should only apply to large sorts.

So, if sorts have phases, what about other steps? Last year I enhanced the code to create Gantt charts of data set opens and closes within a step. In many cases this made jobs no more interesting. But in a number of cases fine structure appeared: Non-sort steps demonstrably had phases. In one example a step that read a pair of data sets in parallel wrote to a succession of output data sets. I could see this from the open and close timestamps of the output data sets. (Without looking at the source code I couldn’t be sure but maybe there’s some mileage in dissecting this step.)

It’s in my code: If it applies to your jobs I’ll be sure to tell you about it.


An Application And Its Data

Apart from the small matter of scale, figuring out which data an application uses is the same problem as figuring out which data a job uses.

I think I’ll talk about DB2 in a later post, as this one has already become lengthy.

As you probably know there is lots of instrumentation on data sets in SMF. Without going into a lot of repetitive description:

  • You can get information about disk data sets from SMF 42 Subtype 6.
  • You can get information about VSAM data sets from SMF 62 (open) and SMF 64 (close).
  • For non-VSAM it’s SMF 14 (for read) and 15 (for update).

There are a number of lines of enquiry you might like to pursue, including:

  • Working out which data sets contribute most to the application’s processing time.

    Here you’d use SMF 42 and something like I/O count or (more usefully) I/O count times Response Time.

  • Figuring out which data sets are strongly related to this application and no other.

    In this case SMF 14, 15, 62 and 64 are needed. (You don’t need both 62 and 64 for the same data set.)

None of the above applies to DB2: You don’t get 14, 15, 62 or 64 for DB2 data (despite DB2 using Linear Data Sets, a form of VSAM). But there is useful work you can do on DB2 data classification. And that is the subject of the next post in this series.

DFSORT – Now With Extra Magical Ingredients

(Originally posted 2011-04-21.)

Thanks to Scott Drummond for reminding me of last Autumn’s DFSORT Function PTFs – UK90025 and UK90026. They’re mentioned in the preview for z/OS Release 13 so now is not such a bad time to be talking about them. So let me pick out a few highlights:



Translation Between ASCII And EBCDIC, And To And From Hex and Binary

For a long time DFSORT has been able to translate to upper case (TRAN=LTOU), to lower case (TRAN=UTOL) and using a table (TRAN=ALTSEQ).

Now you can convert from ASCII to EBCDIC (TRAN=ATOE) and back (TRAN=ETOA). Translation is performed using TCP/IP’s hardcoded translation table.

Other “utility” translations are added: BIT, UNBIT, HEX and UNHEX. For example TRAN=HEX would translate X'C1F1' to C'C1F1' and TRAN=UNBIT would translate C'1100000111110001' to X'C1F1'.
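
For example, something like this (a sketch – the 80-byte record length is just for illustration) would copy a file, translating ASCII records to EBCDIC on the way through:

* Sketch: copy each record, translating bytes 1-80 from ASCII to EBCDIC
  OPTION COPY
  OUTREC BUILD=(1,80,TRAN=ATOE)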



Date Field Arithmetic

DFSORT already had some nice functions for handling dates and times. But here are some new things. This isn’t an exhaustive list:

  • You can add years to a date field – with ADDYEARS.
  • You can subtract months from a date field – with SUBMONS.
  • You can calculate the difference between two dates – with DATEDIFF.
  • You can calculate the next Tuesday for a date field – with NEXTDTUE.
  • You can calculate the previous Wednesday for a date field – with PREVDWED.
  • You can calculate the last day of the quarter – with LASTDAYQ.
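
To give a flavour of the syntax (the precise operand forms are in the User Guide mentioned at the end of this post, and the field positions here are invented), adding two years to a C'ccyymmdd' date in columns 1 to 8 looks something like this:

* Sketch: put the input date plus 2 years, as C'ccyy-mm-dd', at column 21
  OPTION COPY
  INREC OVERLAY=(21:1,8,Y4T,ADDYEARS,+2,TOGREG=Y4T(-))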


JCL Symbols In Control Statements

You can now construct Symbols incorporating JCL PROC or SET symbols. These can be used in DFSORT and ICETOOL control statements, just like other symbols. You specify this by coding JPn"&MYSYM" in the PARM parameter of the EXEC statement. (In fact there need be no JCL symbol in the string at all, so you could pass in other fixed text this way; an expected use is for JPn to contain a mixture of JCL Symbols and fixed text.) n can be any one of 0 through to 9.
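
Here’s a sketch of the sort of thing I mean (the symbol name and value are invented):

//         SET  SUFFIX=APR2011
//SORTSTEP EXEC PGM=ICEMAN,PARM='JP1"RPT.&SUFFIX"'

JP1 then stands for RPT.APR2011 and can be referenced in that step’s DFSORT or ICETOOL control statements like any other symbol.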

This support is in addition to the ability to use System Symbols (introduced with UK90013).


Microseconds In Timestamps

You can use the new DATE5 keyword to create a timestamp constant at run-time in the form 'yyyy-mm-dd-hh.mm.ss.nnnnnn'. DB2 folks might recognise this as the timestamp format for DB2 Unload and DSNTIAUL. You can use this for things like comparisons.
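
For example, something like this (a sketch – column 81 simply puts the timestamp just past the end of an 80-byte record) stamps each record with the run’s timestamp:

  OPTION COPY
  OUTREC OVERLAY=(81:DATE5)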


Chunking And Stitching Together Records

You can use the new ICETOOL RESIZE operator to:

  • Split records into fixed-sized output records. For example, take a RECFM=FB, LRECL=500 file and create a RECFM=FB, LRECL=100 file – creating 5 new output records from each input record.
  • Join together fixed-sized input records. For example, take a RECFM=FB, LRECL=100 file and create a RECFM=FB, LRECL=500 file – in effect reversing the above by joining 5 input records together to make an output record.

In each case you can see there could be problems with partial output records. DFSORT “does the right thing” using blanks.
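
For the splitting case, something like this ICETOOL statement (with invented DD names) would do it:

* Sketch: split each 500-byte record from IN500 into five 100-byte records
  RESIZE FROM(IN500) TO(OUT100) TOLEN(100)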


Begin Group When Key Changes

I mentioned WHEN=GROUP here, in particular BEGIN=. With BEGIN= you get a new group when the condition you specify is satisfied. Now, with KEYBEGIN= you get a new group when the value in a particular field changes. For example:

  SORT FIELDS=(1,12,CH,A,13,8,CH,D)
  OUTREC IFTHEN=(WHEN=GROUP,KEYBEGIN=(1,12),
         PUSH=(13:13,8,31:ID=3))

sticks a group number (3 characters wide) on the end of each record. The group number is incremented when there’s a new value in the 12-byte field that begins in position 1.


There are lots of other (in my opinion) smaller enhancements in these PTFs.

If you want to know whether the appropriate PTF is on, look for the following message in a DFSORT run:

ICE201I H RECORD TYPE …

If you see an “H” then you’re all set.

And you can read about these enhancements in more detail in User Guide for DFSORT PTFs UK90025 and UK90026 (SORTUGPH).

HTML5 Up and Running – A Review Of Sorts

(Originally posted 2011-04-20.)

William Gibson’s “The future is already here – it’s just not very evenly distributed” applies very well to HTML5. It’s even more true of CSS3. Despite that (or maybe because of it) it’s a good time to dive into HTML5 – before everyone else does. 🙂

So, a few months ago I bought, read and inwardly digested 🙂 Mark Pilgrim’s HTML5 Up and Running, published by O’Reilly in August 2010. I have a rule of thumb: If a topic is covered by an O’Reilly book it’s probably ready for prime time. If it’s in “For Dummies” it’s probably too late. (With apologies to other similarly fine publishers and, of course, to the publishers of “For Dummies”.) It’s a glib rule but it’s mine. 🙂

So how does this rule of thumb work out for HTML5? Well, if you make a “poor choice” of browser then not very well. “Poor choice” is in quotes because:

  • No one browser fully implements HTML5.

  • It’s not just about which browser but about which level.

  • You might be happy with one browser’s implementation but not another’s.

  • There’s a degree of ambiguity, development of the state of the art, etc about HTML5 itself. I’d characterise it as a (slowly) moving target.

And note that this book is from last year. So things will have changed. But:

  • It goes to great pains to describe the support by each browser.
  • It makes the point in doing so that browser support is variable – and you soon get the drift of each browser maker’s attitude to HTML5.

Having been around a bit I know something about technology adoption: 25 years ago it would take at least 18 months from a product’s announcement to it being implemented in most installations. I really don’t think anything much has changed. So you wouldn’t write code that depended on a feature your customers (or users) don’t have yet. Well actually you would: You’d just expect it to take a while for them to catch up. And you certainly wouldn’t make it immediately mandatory.

And so it is with HTML5. Mark helps out considerably with the “support of unsupportive browsers” issue by recommending the Modernizr HTML and CSS Feature Detection package. (You can follow them on Twitter: here) I’m sure there are other techniques for handling this but Mark’s right in pointing out this one hunts for capability rather than a named browser. Given some of the features of HTML5 are awfully similar to those provided by javascript frameworks such as Dojo, I’d not be surprised if these frameworks could be relied on in the future to do feature detection (simulating it if not present in the browser). Good examples of this are input elements in forms, much enhanced in HTML5.

At this point I’m reminded I haven’t outlined what’s in HTML5. So here’s a high-level list:

  • The Canvas drawing surface (which I have a “Production” application built around).

  • Offline applications (which I’ve experimented with).

  • Video (which looks a mess and seems set to remain one).

  • Geolocation (which I’ve not used in the HTML5 context but have, of course, in Social Networking and other applications).

  • Local Storage (a better replacement for cookies).

  • Forms enhancements (which look really nice). Rather than my attempting to create graphics of them see here for some input type examples.

  • Microdata (a replacement for the non-standard Microformats and the highly-incomprehensible RDFa – as a way of annotating parts of web pages with structured data).

So lots of really valuable things. But how does the book do?

Given the earliness of its publication I think it does very well. From the above I think you can see it pragmatically handles the issue of support – which is going to be key. It describes each feature very well, with good clear examples, and adds a historical backdrop – particularly when talking about how unknown elements are handled – so it gives you a good idea of how we got here.

So, I think HTML5 is more than ready to be played with and this book is a very good one to get you started. (I’m assuming you’re not starting from a “zero knowledge of HTML” position.) It doesn’t tackle CSS3 and I’ve yet to find anything that does. When I find such a book I’ll probably buy it and review it here.

My Slides Are Ready For Vienna

(Originally posted 2011-04-19.)

I wouldn’t want you to decide not to come to Vienna, just because I’ve made my slides available on Slideshare:

  • I’d hope you’d come to Vienna anyway. It’s a great place and it’s going to be a great conference.
  • Lots of people can’t make Vienna, and I don’t suppose having the slides to hand is going to tilt the decision significantly against coming if you were otherwise planning to.

I don’t think there’s any prohibition on uploading slides: It’s more important that the messages get out. So here are the four presentations I’m giving. If you DO spot errors please let me know. Thanks!

And, I know many of you will have seen these before. If you’ve not seen them recently I hope they’re sufficiently evolved since you last saw them. I do have plans to do brand new presentations this year. More on that, in due course, in this blog.

I’m A Sucker For Ingenuity

(Originally posted 2011-04-15.)

Every once in a while I come across a particularly good idea: Where someone has done something particularly clever to solve a problem. Here’s one very recent example:

On the iPhone (and iPod touch) an app can only display its icon. The icon can be augmented with a numeric counter, but that’s all. (The idea is the app shows you the number of unread emails, for example.)

But here’s the idea, in a nutshell: If you tried hard enough you could display any number you wanted (actually any integer).

Those clever people at International Travel Weather Calculator have produced a pair of iPhone apps – Celsius and Fahrenheit. (I guess they needed to create two for the hard-of-converting.) 🙂

These two apps display the local (or remote) temperature permanently – without you having to open the app.

So they’ve subverted the counter – and that’s the clever part (weather apps being ten-a-penny).

(And this post is the first one I’ve created using Ecto. I mention it because it creates better HTML than the editor built into developerWorks and I hope it formats better. You can be the judge of that.)

Batch Architecture, Part One

(Originally posted 2011-04-12.)

First a word of thanks to Ferdy for his insightful comment on Batch Architecture, Part Zero. And also to my IBM colleague Torsten Michelmann for his offline note on the subject.

As I indicated in Part Zero I hoped to talk about jobs in a subsequent post. And this is that post. In particular I want to discuss:

  1. Viewing jobs as part of distinct applications, and
  2. Generating a high-level understanding of individual jobs

Mostly I’m talking about using SMF 30 job-end records, but consider also:

  • SMF 30 step-end records.
  • SMF 16 DFSORT-invocation records (and, for balance, those for Syncsort).
  • SMF 101 DB2 Accounting Trace.
  • Scheduler Information.
  • Locally-held information about jobs.

(When I talk about jobs I’m aware there are other activities, not running as z/OS-based jobs. These include other actions on z/OS, such as automated operator actions and recovery actions, as well as jobs running on other platforms. In this post I’m focusing mainly on z/OS-based batch jobs.)


Grouping Jobs Into Applications

There are lots of ways of grouping jobs into applications…

Most installations claim a job naming convention. For example:

  • First character is “P” for “Production”, “D” for “Development” and “M” for “Miscellaneous”.
  • Second through fourth characters denote the application. (Maybe there’s a business area as the second character and the other two are applications within that.)
  • Last character denotes the frequency, e.g. “D” for “Daily”, “W” for “Weekly”, “M” for “Monthly”.
  • The remaining characters (often numeric) are an identifier within the application.

Sometimes I see naming conventions that are the other way round. I would recommend – if you have the choice – having it this way round, so status and application are at the front. The reason I recommend this is that it makes it much easier to code queries against any instrumentation – whether you’re using SAS, Tivoli Decision Support or the home-grown code I use. (If you’re merging batch portfolios and have to pick a naming convention this is the one I’d definitely go for.)

Your workload scheduler may well have different names for operations (in Tivoli Workload Scheduler parlance) so some care is required with those.

An interesting question is how well an installation observes its naming convention. As the old joke goes, “we love naming conventions: We’ve got lots of them”. 🙂 Analysis of SMF 30 should give you a view of whether the naming convention is being observed.

As well as job names it’s sometimes interesting to see which userid(s) jobs are submitted under. Often Production batch is submitted from a single userid, according to Type 30. Similarly you can see which job class, WLM workload, service class and report class a job runs in.

Sometimes the programmer name field in Type 30 reveals application information.

Within a window it is occasionally the case that when a job runs is closely related to which application it’s in – though usually applications are intermingled in time, to some degree.

… And the above are just examples of characterisation information.


Understanding Individual Jobs – At A High Level

Whether you’ve grouped jobs into applications or are just looking at individual jobs it’s useful to characterise them. Typical characterisations include:

  • Whether jobs are long-running or short. Likewise CPU- or I/O-intensive.

  • Whether jobs are in-essence single step. (“In essence” alludes to the fact many jobs have small first and last steps, for management purposes.)

  • Whether jobs have in-line backups (the presence of e.g. IDCAMS steps being a good indicator).

  • How data sets are created and deleted for steps (e.g. IEFBR14 steps between processing steps).

  • Whether jobs use tape or do non-database I/O (visible in tape and disk EXCP counts).

  • Reliability statistics.

  • Use of DB2. (Slightly tricky for IMS DB2 jobs but still can be done.)

  • Clonedness.

  • Sort product usage.

The above are all discernable from job- and step-level information. At a slightly lower level (because it requires the use of data set OPEN information) is characterising the data access as being to VSAM, or BDAM, or QSAM/BSAM (or some combination thereof).

A lot of the characterisation of jobs is centred around standards. For example, how jobs are set up by the installation features heavily in the above list. Other sorts of standards can only be seen in things like JCL.

While the above obviously applies to individual jobs it can equally be applied to applications (as identified above) – though that’s a bit more work.


This post has talked about how to use instrumentation to group jobs into applications and the like. It’s also included some thoughts on how to characterise individual jobs and applications.

I hope in the next part to talk about relationships between applications. And to dive deeper into the application’s data.

IBM System z Technical University – Vienna, May 2-6

(Originally posted 2011-04-11.)

I’m working on my presentations for System z Technical University – Vienna, May 2–6 and I’m reviewing the agenda. As well as my four presentations there are lots of other goodies. These range from the Management level down to the purely technical. (I guess mine are towards the latter end of the scale – but I’d say there’s lots of pressure on us all to work on cost so detailed information on e.g. CPU has real impact.) In the other dimension there’s a very wide range of topics.

For the record I’m speaking on the following topics:

  • Memory Matters in 2011
  • Much Ado About CPU
  • Parallel Sysplex Performance
  • DB2 Data Sharing Performance For Beginners

These are all what I call “rolling” presentations: They evolve with time. If you haven’t seen them for a couple of years they’re substantially different. (Actually that’s probably true if it’s only been a year – as it will be for some of the luckier attendees.)

I’ll be a day late to the conference as I’m seeing Brian May and Kerry Ellis in concert at the Albert Hall the day before so won’t travel until the Monday. (This concert is for a great cause: Leukemia and Lymphoma Research.)

I always enjoy these conferences: They’re generally in nice places but, more to the point, it’s great to run into old friends (customers, vendors and IBMers) and make new ones. And it’s always nice to hear things like “I saw you last year in Berlin and I’ll be in Vienna this year” (said by an Austrian customer back in February).

So, I think this conference is a great investment of time and money. And I feel very lucky to be attending yet again. See you there!

(Meanwhile I hope to be publishing my “Batch Architecture, Part One” post some time this week. I’m working on two batch situations that hopefully will inform the post, even if they delay it.)

Experimenting With QR Codes

(Originally posted 2011-04-04.)

Inspired by two of Bob Leah’s posts on QR Codes (here and here) I started experimenting with creating and consuming QR codes.

But what is a QR code? In short it’s a two-dimensional barcode that can contain e.g. plain text or a URL. In the latter case a QR code reader can pick up the URL – maybe from a real-world object – and open it in a browser.

Creating QR Codes

In my experiment I created the barcode differently from how Bob did: As my laptop is running Ubuntu Linux I looked for a command-line tool. In my case I used the qrencode package, which takes a string and encodes it as a PNG graphic.
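
The invocation is along these lines (the output file name and the URL are made-up examples):

# -o names the output PNG; the string to encode follows
qrencode -o object-0001.png "http://example.com/objects?id=0001"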

The resulting image is rather small – which might be handy from the perspective of printing labels.

Command line is important to me because it means I could automate generating QR codes – maybe a page of labels at a time.

Reading QR Codes

On my iPhone I installed a nice QR Code reader app: qrafter (in fact the free version). Although the QR code above is rather small it could read it perfectly well. I’m sure there are QR code readers for all kinds of mobile devices. Nowadays anything with a camera can do all sorts of things like barcode reading, QR code reading, document scanning (with or without OCR).

Possibilities

The ultimate aim of the experiment is to be able to tag objects: If you can tolerate sticking a small QR code label on an object you can annotate it – you could stick a URL on the object and then your device of choice could read the URL and open the page in a browser.

But what could the URL be? In my imagination it could be in two parts:

  1. The URL points to a web server that maintains a database of information about objects. (In fact the URL points you to a page where you can view the information about the object – and optionally edit it.)
  2. The search string is the object number. Each QR code has a different number. Actually it need not be a number, strictly speaking.

Of course you COULD do this with RFID tags. But this seems to me a lighter-weight way to get started. There are, of course, many objects you wouldn’t or couldn’t stick paper labels on – such as clothing. But there are lots of things you could annotate this way.

There are lots of possibilities here. I was just experimenting – admittedly in my hotel room on a Sunday night. I’d be interested in ideas and thoughts on this.