Java’s Not The Only JVM-Based Language

(Originally posted 2011-12-18.)

JVM-based languages have an interesting property for z/OS programmers: They are zAAP-eligible.

As we all know, zAAP eligibility brings a number of benefits – including licence charge benefits and the ability to run on a full-speed processor even when your general-purpose processors are subcapacity ones. (I’ll briefly mention zAAP-on-zIIP here for completeness, and then move on.)

You probably also know that recent zSeries and System z processor generations have very significantly boosted JVM performance. That’s a combination of JVM design improvements, processor speedups and JVM-friendly processor instructions. These are properties of the JVM and processor rather than the language or the javac compiler.

I’ve carefully avoided saying "Java" so far in this post, apart from obliquely in the previous sentence. That’s because this post explores the notion that anything that runs in the JVM can take advantage of all the above. Equally the usual considerations come into play – most notably native code (JNI) affecting eligibility and the startup cost for the JVM.

So what is a JVM? The acronym stands for "Java Virtual Machine". In reality it’s a bytecode execution engine – pure and simple. There’s nothing that says those bytecodes have to be created by the javac java compiler. Indeed there are a number of languages that create bytecode for the JVM.

Further, there are languages that are interpreted by java code – and hence also run in the JVM. My expectation is this would be slower than those that create bytecode. These languages include Javascript (via Rhino, as I mentioned here), Python via Jython, and Ruby via JRuby (which I personally haven’t explored yet).

And then there’s NetRexx. Which we’ll come to in a minute.

So, why the fascination with other JVM-friendly languages? First, when people talk about Modernisation on the mainframe there’s often a strong component of java in it. My take on Modernisation contains two elements I want to get across:

  • Java isn’t the only modern language. Indeed I’d hazard it isn’t particularly modern. For fans of programming languages, take a look at the languages I’ve already listed in this post. And this matters because people with enthusiasm and programming skill will often be conversant with these languages. Furthermore, lots of stuff I’d like to see run on the mainframe under z/OS is already available, written in these languages.
  • As I, perhaps grumpily, state in discussions on e.g. Batch Modernisation, the point is to "kick the ball forward", whether that means java or not.

So, back to NetRexx. It’s not the only flavour of REXX available under z/OS Unix System Services. That much is well known. But it does run in the JVM – by compiling NetRexx programs to java source. This is different from the "bytecode" and "interpreted by a java program" approaches. The result is a java class or jar file, just as if you’d written it in java in the first place.

I uploaded the two necessary NetRexx jar files – from the distribution downloadable from here. These are NetRexxC.jar – the compiler – and NetRexxR.jar – the runtime. (I suspect you only really need NetRexxC.jar.) When you compile a NetRexx program you place NetRexxC.jar in your classpath and invoke the java class org.netrexx.process.NetRexxC.

I wrote a simple NetRexx program – which uses (automatically imported) java classes: java.util.regex.Pattern and java.util.regex.Matcher. This program takes from the command line a search string, a replacement string, and a string to search-and-replace in. When I say "simple" the NetRexx program turns out to be much simpler, shorter and more understandable than the java equivalent. Here it is:

parse arg lookup replacement s
say Pattern.compile(lookup).matcher(s).replaceAll(replacement)

And that really is all there is to it. The "parse arg" and "say" instructions should look familiar to anyone who knows REXX. The rest is just stacked invocations of java classes.
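For anyone who doesn’t read REXX or java, here’s the same logic sketched in Python – a hypothetical equivalent for illustration, not part of NetRexx:

```python
import re

def replace_all(lookup, replacement, s):
    """Compile the search pattern and replace every match - mirroring
    Pattern.compile(lookup).matcher(s).replaceAll(replacement)."""
    return re.sub(lookup, replacement, s)

# Equivalent of: parse arg lookup replacement s / say ...
print(replace_all("cat", "dog", "the cat sat on the cat mat"))
```

The stacked-invocation style of the NetRexx version collapses to a single library call here too – the brevity is a property of having a regex library to hand, not of any one language.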

As I read through the NetRexx language definition I could see a lot of advantages over traditional REXX. Two you’ve already seen – interoperability with java classes and running in the JVM (though the latter isn’t really a language definition benefit). Others included object orientation, more sophisticated switch statements, and "--" to start a comment on a line. So I think this is a better REXX. I noted only one incompatibility (and it might not even be incompatible): "loop" instead of "do" to start a loop.

Because Classic REXX can’t do Regular Expressions (though "parse" is nice) I experimented with invoking my NetRexx program (above) from classic REXX. I used BPXWUNIX (mentioned before here – where I’ve since corrected the post, which incorrectly mentioned BPXWDYN – and in Hackday 9 – REXX, Java and XML). Because the program uses the JVM I made sure to pass environment statements to BPXWUNIX setting up PATH for the JVM. This worked very well.

I could’ve used BPXWUNIX to call sed instead – for this use case. That probably would’ve been cheaper but I was proving I could call a NetRexx program from TSO REXX (in batch), passing in parameters and retrieving the result. Talking of "cheaper" I think it’s important to try and avoid transitions across the BPXWUNIX boundary: It’ll have a (non-zAAP) CPU cost and, if you’re using the JVM as NetRexx does, it’ll cost to set up the JVM and tear it down again afterwards. A pair of transitions with meaty application processing in between is going to be the most efficient.

(The previous paragraph was conjectural: It would be nice to run some benchmarks on this one day. Anyone?)

So, I’m impressed (as you can tell) with NetRexx. I think it’s worth taking a look at – as indeed are the other JVM-based language implementations I mentioned. The point of this post is to demonstrate (yet again) there are choices – and considerations to go with them.

What’s In A Name?

(Originally posted 2011-12-16.)

This is the post I was going to write before the discussion arose that led to CICS VSAM Buffering. It’s about getting more insight into how WLM is set up and performing than RMF Workload Activity Report data alone allows.

I recognise some of this can be done with the WLM policy in hand. But this is about an SMF-based approach. (The piece you can’t do with SMF is discerning the WLM classification rules.) And the policy can’t answer questions about how systems actually behave.

There are two distinct problems I’ve worked on solving (relatively) recently. I share the outline of my solution to each of these with you here.

  • In RMF you can’t tell how Report Classes and Service Classes relate to each other: In some cases Report Classes break down Service Class data – often to the address space level. In some cases Report Classes coalesce information from multiple Service Classes. But you can’t see this linkage in RMF.
  • In RMF you can’t necessarily tell what runs in each Service Class. I say "necessarily" because you can tell some things about the nature of the work in a Service Class.

The "What’s In A Name?" in the title refers to the fact a Workload, Service Class or Report Class name is just a string of characters: Rhetorically it might be a "promise" but it’s not a mechanistic guarantee. So – to me at least – it’s worth knowing rather more.

Report And Service Class Relationships

SMF 72 Subtype 3 RMF Workload Activity Report data describes how Service Class Periods and Report Classes perform.

Type 30 Interval records (Subtypes 2 and 3) describe how address spaces perform. (Actually so do Subtypes 4 and 5, which are step-end and job-end records.) These records contain, amongst other things, WLM Workload, Service Class and Report Class names – for the address space. You can therefore use Type 30 to relate Workload and Service Class to Report Class. My code’s done this for some time.
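As a sketch of that correlation – with invented field names and jobs rather than the real SMF 30 record layout – the linkage can be built in both directions:

```python
from collections import defaultdict

# Illustrative Type 30 interval records, one per address space
# (field names are invented for clarity - the real mapping differs).
type30 = [
    {"jobname": "CICSPRD1", "service_class": "CICSRGN", "report_class": "RCICSP1"},
    {"jobname": "CICSPRD2", "service_class": "CICSRGN", "report_class": "RCICSP2"},
    {"jobname": "BATCHJB1", "service_class": "PRDBATHI", "report_class": "RBATCH"},
]

# Build the linkage both ways - the thing RMF data alone can't show:
# which Report Classes break down a Service Class, and which coalesce several.
service_to_report = defaultdict(set)
report_to_service = defaultdict(set)
for rec in type30:
    service_to_report[rec["service_class"]].add(rec["report_class"])
    report_to_service[rec["report_class"]].add(rec["service_class"])

print(dict(service_to_report))
```

A Service Class mapping to several Report Classes is the "break down" case; a Report Class mapping to several Service Classes is the "coalesce" case.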

Type 30 does not apply to Service Classes that don’t own address spaces. Two examples of this are DDF Transaction Service Classes and CICS Transaction Service Classes.

A related topic is which Service Classes are serving other Service Classes. For example CICS Region Service Classes and transaction Service Classes. Now this you can readily discern from SMF 72 alone. (And of course my code does that.)

What The Work In A Service Class Is

(This piece relates equally to Report Classes.)

As I said, you can’t tell much about what a WLM Service Class covers from Type 72. So, as well as the correlation described above, my code uses Type 30 to flesh out what a Service Class is for. The key to this is the Program Name. For example CICS regions have PGM=DFHSIP. So a Service Class containing only PGM=DFHSIP address spaces is a CICS Region Service Class. Simple enough. Some are more complicated than others – perhaps necessitating the 16-character program name field which, for Unix, includes the last portion of the Unix program name.
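A minimal sketch of that program-name heuristic, assuming a tiny illustrative mapping (the real rule set would be much longer – DFHSIP is from the text, everything else here is a placeholder):

```python
def classify_service_class(program_names):
    """Label a Service Class from the program names of its address spaces.
    PGM=DFHSIP marks a CICS region; the mapping is a small illustrative subset."""
    known = {"DFHSIP": "CICS Region"}
    kinds = {known.get(p, "Other") for p in program_names}
    if kinds == {"CICS Region"}:
        return "CICS Region Service Class"
    return "Mixed or unknown"

print(classify_service_class(["DFHSIP", "DFHSIP"]))   # all regions are CICS
print(classify_service_class(["DFHSIP", "IEFBR14"]))  # mixed contents
```

The same skeleton extends naturally to the job-name and procedure-name games described below.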

You can play other games, too: The job name for a DB2 address space can be decoded to glean the subsystem it belongs to. Certain System address spaces have mnemonic Procedure names. And so on.

From SMF 72 you can obtain the number of address spaces for a Service Class – 0 suggests the Service Class doesn’t own any (see above), while 1 suggests the class (possibly a Report Class) is there to provide more granularity. You can also get the number of address spaces In and the number Out-And-Ready. This can help you form a picture of e.g. "low use" address spaces in the Service Class.

This post is about sharing some of my experience of trying to extend the value that can be got out of SMF – beyond the obvious. Some of this will probably appear in my I Know What You Did Last Summer presentation – which I’m still hoping to complete soon. This also, by the way, explains why I’m so keen to get Type 30 data from you when you’re sending me RMF data. There really is a huge amount of value to be had.

CICS VSAM Buffering

(Originally posted 2011-12-16.)

Four score and seven years ago (or so it seems) 🙂 the Washington Systems Center published a set of mainframe Data-In-Memory studies. These were conducted by performance teams in various IBM labs and were quite instructive and inspiring. I wish I could find the form number (and a fortiori a PDF version) for this book. Anyone? Even hardcopy would be really nice.

The reason I mention this is because of a thread in the CICS-L newsgroup overnight about the CPU impact of increasing the size of VSAM LSR buffers in CICS. I seem to recall that CICS / VSAM was one of the benchmarks written up in this orange book. The original poster wanted to know what the CPU impact profile was of increasing VSAM buffers. I think the study showed that there could be some CPU saving with bigger buffer pools. (Compare this with VIO in (then) expanded storage – which showed a net CPU increase for the technique.)

There are a number of points I would have raised in CICS-L but I’ll write them here instead – as most of you probably don’t read CICS-L:

  • I would not build a Data-In-Memory (DIM) case on CPU savings (though I would want to satisfy myself there wasn’t a significant net cost). I would build it on throughput enablement and response time decreases. This is true of any DIM technique.
  • The thread in CICS-L correctly identifies the need to be able to provision real memory to back the increase in virtual.
  • VSAM LSR buffers are allocated from within the virtual memory of the CICS address space. For most customers this isn’t an issue as the buffers are usually within 31-bit memory. (There is no 64-bit VSAM buffering.) But it’s still worth keeping an eye on CICS virtual storage (whether 31- or 24-bit) – perhaps using what’s in CICS SMF 110 Statistics Trace.
  • Back in the late 1980’s there was a tool – VLBPAA – that would analyse User F61 GTF Trace to establish the benefit of bigger buffer pools – at least in raw I/O reduction terms. The trace is still available and you could process it with DFSORT, but it would be harder to predict buffering outcomes without VLBPAA. In fact I mentioned this in Memories of Batch LSR.
  • One of the comments talked about hit ratios but I prefer to think of miss rates – or better still misses per transaction.
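The arithmetic behind that preference is simple – a hit ratio hides the transaction rate, whereas misses per transaction ties pool behaviour to the work done. A sketch with made-up numbers:

```python
def misses_per_transaction(lookups, hits, transactions):
    """Turn raw LSR pool counters into misses (physical reads) per transaction."""
    return (lookups - hits) / transactions

# 1,000,000 buffer lookups at a 98% hit ratio, across 50,000 transactions
mpt = misses_per_transaction(1_000_000, 980_000, 50_000)
print(mpt)  # 0.4 misses per transaction
```

A 98% hit ratio sounds healthy in isolation; 0.4 synchronous reads per transaction is a number you can actually weigh against response time.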

In general I find CICS VSAM LSR buffering insufficiently aggressive: As memory is generally plentiful these days (at least relative to the CICS VSAM LSR pool sizes I encounter) I think it’s appropriate for installations to consider big increases (subject to the provisos above). Think in terms of doubling rather than adding 20%. And no, 10MB of total buffering is not aggressive. 🙂

DB2 Accounting Trace And Unicode

(Originally posted 2011-12-12.)

As I said in this post I recently came across the need to handle Unicode when processing DB2 Accounting Trace (SMF 101). I was astonished not to have run into it before in all my many sets of customer data. So I had two things to do:

  • Understand the circumstances under which it happens – which isn’t just "be on Version 8 and it will happen automatically."

and

  • Figure out how to handle it when I see it. (i.e. when QWHSFLAG has the value x’80’ as I mentioned in the other post).

As you’d expect, I asked the customer what they had done to cause the generation of 101 records containing Unicode fields. The answer is that they’ve set parameter UIFCIDS in DSNZPARM to "YES". It turns out my friend Willie Favero had mentioned it in this blog post some time ago. Because the DB2 Catalog has Unicode in it in Version 8 it actually takes cycles to create 101 records without Unicode in: All the fields marked "%U" in the mappings in SDSNMACS have to be translated from Unicode to EBCDIC. If you code "UIFCIDS=YES" you avoid the cost of the translation.

But there’s an obvious downside: Any reporting against those fields (the ones marked "%U") needs to take that into account. But if you never (or rarely) look at e.g. the Package-level stuff you might prefer to write it in Unicode (or suppress IFCID 239 entirely). It’s probably a net saving in CPU, albeit a small one.

Which leads on to the second part of this: How did I handle the translation into something readable (EBCDIC being my primary encoding, at least on z/OS)? My interim take is to fix up my reporting REXX to check QWHSFLAG and do the right thing. You can readily do that with the built-in TRANSLATE function. There is code knocking around on the Internet for the purpose. That got me through this study and I have reusable code I can use wherever I need to do the translation.
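As an aside, the same single-byte conversion idea can be sketched in Python rather than REXX. Python’s standard library doesn’t ship IBM-1047 (the usual z/OS code page), so I use cp037 here – treat the code page choice as an assumption for illustration only:

```python
def unicode_field_to_ebcdic(field_bytes):
    """Decode a UTF-8 field from the record and re-encode it as EBCDIC
    (code page cp037 here), much as a reporting program would before printing."""
    return field_bytes.decode("utf-8").encode("cp037")

# A hypothetical 8-byte package name field, shown as EBCDIC hex
print(unicode_field_to_ebcdic(b"PACKAGE1").hex().upper())
```

For the single-byte subset this is exactly a 256-entry translate table, which is why REXX TRANSLATE (or the assembler TR instruction) handles it so cheaply.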

But this is not the only way, and perhaps not the best: My code reformats records (for historical reasons, mainly) in an assembler exit – as part of my database build process. It’s entirely feasible to do the translation there and then my database has everything readable. To do that you can use the TR (Translate) instruction. Of course you can use the same translation table (albeit with different syntax – one or so edit away) as in the REXX. But that’s a whole load more effort and potential fragility. I think I’ll defer that.

One other thing: Unicode isn’t necessarily 1-byte characters. But the sample I have is. Neither the REXX TRANSLATE BIF nor the TR instruction will handle multi-byte characters. So I could eventually come unstuck. And, no, I don’t know if these fields will contain multi-byte characters any time soon. Anyone in a position to comment?

| Fixed a glitch in the bulleted list at the top.

CICS and Batch

(Originally posted 2011-12-09.)

In my experience there are two kinds of CICS installations: Those that take CICS down at night – to run the Batch – and those that don’t.

There is a loose correlation between what the data manager is and which approach is taken: VSAM-based CICS applications tend to be less 24×7 than DB2 ones, though it’s not that clear cut.

This post is about how you (really I) might glean how you run CICS vis-a-vis Batch, using SMF. Even if you know in principle how you manage CICS regions, the numbers should still be useful and not too onerous to obtain.

If you’re the kind of installation that takes CICS down the SMF 30 Job-End records (Subtype 4) will tell you when CICS started and stopped. (If you want to know when CICS came down in an unscheduled manner the same data applies.)

If you don’t expect CICS to come down often the SMF 30 Interval Records (Subtypes 2 and 3) will confirm the region is still up.
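To make that concrete, here’s a minimal sketch – in Python, with invented region names and times, since real SMF 30 processing is rather more involved – of turning start and stop events into availability windows:

```python
# Illustrative start/stop events, as SMF 30 records might yield them
# (times are hours-of-day for simplicity; real records carry full timestamps).
events = [
    ("CICSPRD1", "start", 6.0), ("CICSPRD1", "stop", 22.0),
    ("CICSPRD2", "start", 0.0), ("CICSPRD2", "stop", 24.0),
]

def uptime_windows(events):
    """Pair each region's start event with its stop to get an availability window."""
    starts, windows = {}, {}
    for region, kind, t in events:
        if kind == "start":
            starts[region] = t
        else:
            windows[region] = (starts.pop(region), t)
    return windows

print(uptime_windows(events))
```

Those windows are exactly what ends up on the Gantt chart discussed below – one bar per region.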

(The (ancient) SMF processor I use – SLR – has an Availability Reporting capability. I would expect and hope other tools have something similar. In any case it’s just another take on regular performance data. I’m considering playing with SLR Availability Reporting.)

For the purposes of this discussion I consider any address space running program "DFHSIP" to be a CICS region. There may be other program names of relevance, too.

My preferred means of display is a Gantt chart – my (also ancient) formatting tool – Bookmaster – doing very nicely in that regard. I got into Gantt charts for Batch Suite display but the technique is fine for online regions. I’m beginning to annotate Gantt charts with commentary – a technique you may find useful. (I might post soon on how I’m doing the annotation – as that has been an interesting side project.)

I was reminded by a couple of recent customer studies of some of the reasons Batch and CICS often don’t run alongside each other. Sometimes it’s logical "end of day" quiesce points (or "Positions") and sometimes it’s to ensure CICS and Batch don’t compete for resources. (More than half the customers I know have a higher Batch Window CPU Utilisation (if you squint at it) than that of their online day.) The "end of day" reason often shows up as CICS closing data sets and batch jobs opening them (and conversely at the other end of the night). I saw this in the study I’m currently finishing off. As many of you know I use the Life Of A Data Set (LOADS) technique – and this time I saw CICS regions as well as batch jobs. I think it would be useful to see how long after CICS comes down the first batch job processes any of the region’s data. And the same at the other end.
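That LOADS-style handover measurement can be sketched like this (Python, with a hypothetical data set name and times in hours – not my actual code):

```python
def handover_gap(cics_close_times, batch_open_times):
    """For each data set, the time between CICS closing it and the first
    batch job opening it - a LOADS-style handover measurement."""
    return {ds: min(batch_open_times[ds]) - closed
            for ds, closed in cics_close_times.items()
            if ds in batch_open_times}

# Hypothetical data set: CICS closes it at 22:00, batch opens at 22:30 and 23:00.
closes = {"PROD.CUST.MASTER": 22.0}
opens = {"PROD.CUST.MASTER": [22.5, 23.0]}
print(handover_gap(closes, opens))
```

A consistently large gap at either end of the night is a candidate slice of batch window to claw back.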

It’s much more difficult to see the DB2 objects – table spaces and index spaces – accessed by CICS regions and batch jobs. But then, as I said, overlap is more common here. It raises the interesting question of how an installation knows it’s safe to run CICS and the related batch concurrently. Maybe you can share your experiences. I think it has something to do with Applications people (shudder) designing things. 🙂

The same applies to MQ queues, of course. And IMS is a whole other game.

So we can use SMF 30 to document uptime for CICS. I think it would be useful also to form a view of what the transaction profile looks like while the regions are up. This would probably be driven by CICS SMF 110 records, possibly pulling in correlated DB2 SMF 101 and MQ SMF 116 records. There are two reasons to do this:

  • Effective outages – where the region is up but work still can’t get done – could be documented. (A healthy transaction rate suggests a healthy region.) Actually a spate of "unhappy ending" transactions might mean something – such as the loss of the database manager or some partner region.
  • It would be interesting to see how transactions peter out towards the end of the day (if they do) and perhaps "peter in" 🙂 at the beginning. You might use this to tell you CICS could afford to be up for less time (to make room for the growing Batch). I’d prefer to think of it conversely: Justifying the need to keep regions up as late as you do and starting them as early as you do. Taken to its logical conclusion it might justify a project to make the CICS and Batch run concurrently.

By the way everything I’ve said above (other than the specific program name) applies to most of the other online application styles. I’m just reminded of CICS, as I say, by a couple of current customer engagements. Undoubtedly the next one will remind me of something else. 🙂

DB2 Package-Level Statistics and Batch Tuning

(Originally posted 2011-12-05.)

I don’t know how many years it’s been since DB2 Version 8 was shipped but I’ve FINALLY added support for some really useful statistics that became available with that release.

As so often happens I was caused to open up my code because of some customer data that exposed a problem in it: The customer sent DB2 Version 8 SMF 101 Accounting Trace data that contained Unicode. In particular DB2 Package names were showing up as apparent garbage. Hexdumping some records showed this field to be Unicode.

The first step was to tolerate Unicode. In my case I translate it in the REXX I do my actual reporting in. (I could’ve done it in the assembler and maybe one day I will – but it makes the job more complex.) There is a field in the Product Section (QWHSFLAG) that has the value x’80’ if the record contains Unicode (and 0 if it doesn’t).

But this post isn’t really about Unicode. It isn’t about the longer names that are supported in Version 8, either. It’s about some nice "new" statistics you also get at the Package level. (And nothing significant has happened to the 101 record since Version 8.) As I had the code open I took the opportunity to exploit these new numbers. I’ve not written about them before so now is a good time to extol their virtues – despite the arrival of Versions 9 and 10.

So this post is about DB2 performance at the package or program level. "Program" would be the application code or not-specifically-DB2 term: An application program calling DB2 generally uses a package with the same name. I’ll use package in the rest of this post because it’s the DB2 term.

Buffer Pool Statistics

The Accounting Trace record has very nice buffer pool statistics at the Plan / Buffer Pool level. But the real problem for a batch job is "which program / DB2 package is driving the traffic?" We’ve always had the ability to say which packages the time was spent in and which components of response time for those packages are dominant. And indeed the major packages might be spending lots of time waiting for Synchronous Buffer Pool I/O or Read (or Write) Asynchronous I/O. (I see that quite often.)

What we didn’t know until Version 8 is how the buffer pools are performing for those top packages. So these statistics are really handy.

Note: There’s only one set of buffer pool statistics for each package. That is, you can’t tell which buffer pools are accessed by which package.

SQL Statistics

At the plan level we see the number of, for example, singleton selects, cursor opens, fetches under cursor, updates, inserts and deletes. So we might, for instance, gain some insight into why a batch job step is seeing a large amount of Synchronous Database I/O Time: Perhaps it’s because of a plethora of singleton selects so Prefetch doesn’t really happen.

What we couldn’t do, prior to Version 8, is see this at the package level. Now that we can, we’re able to find the package / program that’s behaving this way. So we stand a better chance of fixing it.

As an experiment I summed up counts of the different SQL statements at the package level and compared the sum to the field QPACSQLC (which was there at the package level long before Version 8). This field is the number of SQL statements. Usually they’re the same but in a significant proportion of cases the sum is less than QPACSQLC. One valid explanation is that the difference includes commits and aborts (which there aren’t statistics for at the package level). I bolded "includes" because this isn’t the whole explanation. If you take out the plan-level commit and abort counts (fields QWACCOMM and QWACABRT) you sometimes still have a discrepancy. I’ll have to research why this might be.
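That comparison can be sketched as follows – QPACSQLC is the real field name from the text, but the record layout and counts here are invented for illustration:

```python
def sql_count_discrepancy(pkg):
    """Sum the individual SQL statement counts and compare with QPACSQLC.
    A nonzero result is the unexplained residue discussed in the text."""
    summed = sum(pkg[f] for f in
                 ("selects", "opens", "fetches", "updates", "inserts", "deletes"))
    return pkg["QPACSQLC"] - summed

# Hypothetical package-level counters
pkg = {"selects": 120, "opens": 10, "fetches": 400, "updates": 30,
       "inserts": 20, "deletes": 5, "QPACSQLC": 590}
print(sql_count_discrepancy(pkg))  # 5 statements unaccounted for
```

Running this across every package in a day’s data quickly shows whether the discrepancy is rare or systematic.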

But, as I say, the reason for this post about old (but not obsolete) statistics is that I think y’all will find them really handy. And especially for batch steps and indeed any DB2 application where the transaction comprises multiple packages / programs.

Rhinos

(Originally posted 2011-11-20.)

The very first computer game I ever played was called Rhino and it ran on a Commodore PET. The school had been lent one for a fortnight. (I don’t know why as we didn’t go on to buy any, instead getting a single RML 380Z.) Imagine a character-grid screen where the rhinos are represented by pi symbols that chase you as you try to move from A to B. It was written in BASIC and allowed from 1 to 10 rhinos. (We modified it so it would allow up to 100: It played much slower but the overwhelming number of pi signs was good for a 5-minute giggle.)

Having got that off my chest, normal service is resumed: 🙂

Today I want to talk to you about a different Rhino. It’s a javascript interpreter written in java. And it’s documented here.

Some of you will know that I don’t think java is a particularly modern language. If you’ve kept your ear to the ground you’ll know there are MANY that have a better claim to the mantle of "coolest language (today)". 🙂 One of the nice things about it, though, is that it is zAAP-eligible. And a couple of nice things about javascript are that a lot of people know it, and it’s got some really neat features.

Here’s a sample that shows one of its capabilities – processing XML:


xml=new XML(stdinText)
print(xml.item.(@type=="carrot").@quantity)

These two lines parse a text string (in the variable "stdinText") as XML and print the value of the quantity attribute of each XML node that has "carrot" as the value of its type attribute. If you know XML I think you’ll see the power of that.

For completeness here is the XML and the whole code to read the XML string from stdin and do the XML query:

First the XML:


<sales vendor="John">
  <item price="4" quantity="6" type="peas"/>
  <item price="3" quantity="10" type="carrot"/>
  <item price="5" quantity="3" type="chips"/>
</sales>

and now the javascript:


importPackage(java.io)
importPackage(java.lang) 
stdinText=""
stdin=new BufferedReader(new InputStreamReader(System["in"])) 
while(stdin.ready()){
  stdinText+=stdin.readLine()
}
xml=new XML(stdinText)
print(xml.item.(@type=="carrot").@quantity)

In this version of the code two java packages are imported that are used to get the XML from stdin. I chose to do this so I could indirect from a file – and also from BPXWUNIX (the REXX interface to z/OS Unix I’ve mentioned several times recently here).

The "new XML()" phrasing is part of the javascript’s E4X capabilities – that allow it to process XML so neatly. (Another part is directly assigning literal XML (not a string) to a javascript variable. See ECMAScript for XML for a description of E4X.

One of the nice things about javascript is the proliferation of frameworks for it. My favourite is Dojo though I’ve also used jQuery. Although Dojo (and jQuery) have a lot to offer for web applications, much of that isn’t applicable to z/OS Unix System Services. But some very useful things remain – for example language extensions like "dojo.forEach()" iteration.

So I set out to try two things:

  1. Installing and using Rhino. You’ve seen an example of this above – processing XML.
  2. Installing and using the Base component of Dojo.

The latter proved interesting: I made a bit of a mess of it by copying the Dojo Base files up to z/OS one at a time. I should’ve done it wholesale. But I got there in the end. And here’s a test script:


load("dojo.js")
dojo.forEach([1,2,3],function(x) {print(x)})
print(dojo.isRhino)

which does the following:

  1. It pulls in the Dojo code.
  2. It creates a temporary array – [1,2,3] and iterates over it – printing 1 then 2 then 3.
  3. It prints "true" because Dojo is indeed running under Rhino.

So, it was quite easy to get the Rhino javascript interpreter up and running – under java – on z/OS. It would’ve been easy to install Dojo Base if I hadn’t made a mess of it. I think if you have people used to javascript they could be productive quite quickly – so long as you’re comfortable with a java-based environment. As previously mentioned, you could readily call javascript code this way from REXX with BPXWUNIX. Indeed you could have REXX pass the actual javascript code in.

On the performance side you need to be aware that Rhino converts the javascript code to java classes on the fly. It’ll probably perform well after the initial conversion – so it’s best suited to stuff with lots of iteration. And I would expect it to offload to a zAAP (or zIIP via zAAP-on-zIIP) quite readily. (I’d love to see measurements if anyone tries it.)

As well as Rhino, Mozilla make a C-based javascript interpreter (SpiderMonkey) but I think that would be a much more difficult thing to get running on z/OS.

And if you want an exemplar of how far games have come from the original Rhino try Uncharted 3. I’ve just completed it and it’s outstanding. (So outstanding I want to go back and play the whole thing again, as well as dipping my toe in the multiplayer water.)

| Corrected to read "BPXWUNIX" instead of "BPXWDYN".

Tivoli Workload Scheduler and Workload Manager Service Classes

(Originally posted 2011-11-04.)

I don’t think I’ve mentioned this before in this blog but Tivoli Workload Scheduler (TWS) has a nice "WLM Integration" feature. With it TWS can change the Service Class a job runs in – before submission.

The main purpose of this is to elevate Critical Path work.

We wrote about this in the Redbook mentioned in Touring The Upcoming Batch Optimization On z/OS Redbook, and so some of you will have seen me begin to discuss the area in the presentation.

This week Dean Harrison (a friend who is a renowned expert on TWS) and I presented together on "TWS and WLM". It’s all about this linkage. He talked about how you goad TWS into changing the WLM Service Class for a job. Very sensibly he didn’t say "a better Service Class" but rather "a different Service Class". I was watching for him to trip up but he didn’t. 🙂

(He also did a nice job of handling the whole fraught question of what a Critical Path actually is. I think we all know the score here.)

But back to the "different Service Class" thing: We all know that it doesn’t matter what you call a Service Class, it’s what the goal is that counts. In my portion of the presentation I did the "now you’ve got a Service Class now what?" piece. So I talked, for instance, about whether response time goals were usable with Production Batch or whether you should stick to velocity goals. (On that one I think only if the batch looks like a large number of independent really heavy transactions should you use response time goals.)

So, my role was in the vein of the cliche "what you don’t know can hurt you" for Batch Operations folks. My message to them was "understand what the service classes you’re steering work towards actually mean".

But let’s stand it on its head: From the Performance Analyst’s perspective the cliche still applies: How do you know what the batch is that shows up in PRDBATHI vs PRDBATMD? You don’t unless you take an interest in how the jobs got into either of those classes. So my message to "us" is "understand how work came to be in each Service Class and what expectations the Batch Operations folks have of it".

The "can hurt you" in the cliche is interesting: You could – whether a Batch Operations person or a Performance Analyst hide behind "separation of concerns". I wouldn’t recommend it though: Presumably installations take a dim view of such things. Another cliche on offer is "hang together or hang separately". I encourage both camps to work together. (Just as I encourage eg CICS and Performance people to.)

Best of all is if you can be skilled in both TWS and WLM. A tall order, I know. But that might earn you a leading role in the (proposed in the Redbook) Batch Design Authority.

SYSIN In A Proc – New With z/OS Release 13 JES2

(Originally posted 2011-10-25.)

One of the nice enhancements in z/OS Release 13 JES2 was the support for SYSIN in a JCL procedure. (See here for the announcement letter.)

I have a personal example of where it would’ve been handy. You probably have your own.

We used to distribute sample JCL that used DB2’s DSNTIAUL to unload the DB2 Catalog, one table at a time. Obviously, with one near-identical job step per table, you’d want to use a JCL procedure. And so we did. But there’s a snag:

Each invocation of DSNTIAUL needed the same SYSTSIN DD data. The example below is for DB2 Version 8.


RUN PROGRAM(DSNTIAUL) PLAN(DSNTIB81) PARMS('SQL')
END

You can see that for Version 9 you’d need to change it to DSNTIB91. In any case, neither the program name nor the plan name is fixed – DSNTIAUL is a sample you compile (and bind) yourself. So when it changes, you either change it once in the PROC or, if you don’t have one, make a global change across every job.

Changing it in the PROC would be preferable, of course. But it couldn’t be done: instream data wasn’t allowed in a procedure. So we added a step to copy these two lines to a temporary data set and referred to that from the PROC. Cumbersome, but a well-known circumvention.
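The circumvention looked something like this – a sketch only, with step and data set names purely illustrative. An IEBGENER step in the invoking job copies the two control statements to a temporary data set, and each DSNTIAUL step in the PROC then points at that:

//GENER    EXEC PGM=IEBGENER
//SYSPRINT DD  SYSOUT=*
//SYSIN    DD  DUMMY
//SYSUT2   DD  DSN=&&SYSTSIN,DISP=(NEW,PASS),UNIT=SYSDA,
//             SPACE=(TRK,(1,1)),DCB=(RECFM=FB,LRECL=80)
//SYSUT1   DD  *
RUN PROGRAM(DSNTIAUL) PLAN(DSNTIB81) PARMS('SQL')
END
/*
//* ... and inside the PROC, each invocation codes:
//SYSTSIN  DD  DSN=&&SYSTSIN,DISP=(OLD,PASS)

So the control statements live in the invoking job, not the PROC – exactly the wrong way round for maintenance.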

Enter Release 13. The need to do this has gone, as you can now have SYSIN instream in a PROC.
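With Release 13 the SYSTSIN data can sit in the procedure itself. A minimal sketch – the PROC structure and DD names other than SYSTSIN are illustrative, and the per-table SQL would still arrive on SYSIN from the invoking job:

//UNLOAD   PROC
//TIAUL    EXEC PGM=IKJEFT01,DYNAMNBR=20
//SYSTSPRT DD  SYSOUT=*
//SYSPRINT DD  SYSOUT=*
//SYSTSIN  DD  *
RUN PROGRAM(DSNTIAUL) PLAN(DSNTIB81) PARMS('SQL')
END
/*
//         PEND

Now a change of program or plan name is a one-line edit in one place.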

Of course, if I were still maintaining this JCL I wouldn’t rush to publish a version that assumed Release 13. 🙂 I hope in your shop you can use it sooner than I can.

A Small Step For RMF, A Giant Leap For Self-Documenting Systems

(Originally posted 2011-10-20.)

I mentioned APAR OA21140 before – here. It’s quite an old APAR and so it will (in all likelihood) be on your systems.

I’d like to draw your attention to a subtle 1-byte field in the SMF 74 Subtype 4 (Coupling Facility Activity) record: R744FLPN. It’s the partition number for the coupling facility. (If you’ve seen any one of several of my presentations, I’ll have talked about this field.)

Here’s a problem I have but most of you don’t: I try to build a picture of your systems just from SMF / RMF data. (Actually I’ve run into lots of customers who have almost the same problem. From a "systems should be self-documenting" standpoint, if nothing else.) Let’s review what we have:

  • SMF 70 Subtype 1 gives you lots of information about a machine, including the serial number and all the LPARs – all in gory detail.
  • SMF 74 Subtype 4 gives you lots of Coupling Facility information – again in gory detail.

It’s been my contention that if you put these two together good stuff can happen:

  • You can get a true picture of the CPU performance of a coupling facility, especially in the shared ICF engines case.
  • You can match the CF links to the CHPIDs from SMF Type 73 (not that the latter will give you much information).
  • You can identify what each of those LPARs that show up using resources in the ICF engine pool actually are.
  • In principle you can identify free CF memory that could be redeployed to other LPARs as needed.

Maybe the first of these is the most significant but, as you know, I like to show up in a customer having got quite close to their systems – not asking the questions I should’ve got the answers to from the data. So the next two are nice as well. I consider the fourth "unfinished business" as we don’t have a complete picture of machine memory but it’s still valid.

Today I finally exploited R744FLPN to do the matching. So you might like to, too. I could even define a view across SMF 70 and 74-4 with a full set of matching keys now. Whoo hoo! 🙂

A long time ago I encountered a customer with multiple coupling facility LPARs whose names didn’t match the coupling facility names. So I wrote some code that only worked in the "1 coupling facility on a machine" case: Fairly trivial code.

Then RMF added machine serial number, the number of dedicated and shared engines and the LPAR weights to the 74-4 record. This helped with a customer case where every CF LPAR had the same name – across multiple machines. (Why do people do that?) 🙂

But in asking them to do that I forgot one other thing: LPAR Number. So now OA21140 provides that and we can do a direct match (where the code doesn’t have to do lots of special-case detective work).

As I said, this APAR is quite old. Usually I like to take advantage of new data as soon as a customer sends it to me. But I’ve been busy these past 2 years like never before. (You might’ve noticed that.) 🙂 So, finally, I have the change in my code to take advantage of that. (And I still have the other code in place as a fallback.) It turned out to be a tiny change – it took me 15 minutes to code and about the same to test. You’d be amazed at what can get done in "interstitial time". 🙂

All in a day’s work for a Principal Systems Investigator (PSI). 🙂