CICS VSAM Buffering

(Originally posted 2011-12-16.)

Four score and seven years ago (or so it seems) 🙂 the Washington Systems Center published a set of mainframe Data-In-Memory studies. These were conducted by performance teams in various IBM labs and were quite instructive and inspiring. I wish I could find the form number (and a fortiori a PDF version) for this book. Anyone? Even hardcopy would be really nice.

The reason I mention this is because of a thread in the CICS-L newsgroup overnight about the CPU impact of increasing the size of VSAM LSR buffers in CICS. I seem to recall that CICS / VSAM was one of the benchmarks written up in this orange book. The original poster wanted to know what the CPU impact profile was of increasing VSAM buffers. I think the study showed that there could be some CPU saving with bigger buffer pools. (Compare this with VIO in (then) expanded storage – which showed a net CPU increase for the technique.)

There are a number of points I would have raised in CICS-L but I’ll write them here instead – as most of you probably don’t read CICS-L:

  • I would not build a Data-In-Memory (DIM) case on CPU savings (though I would want to satisfy myself there wasn’t a significant net cost). I would build it on throughput enablement and response time decreases. This is true of any DIM technique.
  • The thread in CICS-L correctly identifies the need to be able to provision real memory to back the increase in virtual.
  • VSAM LSR buffers are allocated from within the virtual memory of the CICS address space. For most customers this isn’t an issue as the buffers are usually within 31-bit memory. (There is no 64-bit VSAM buffering.) But it’s still worth keeping an eye on CICS virtual storage (whether 31- or 24-bit) – perhaps using what’s in CICS SMF 110 Statistics Trace.
  • Back in the late 1980s there was a tool – VLBPAA – that would analyse User F61 GTF Trace to establish the benefit of bigger buffer pools – at least in raw I/O reduction terms. The trace is still available and you could process it with DFSORT, but it would be harder to predict buffering outcomes without VLBPAA. In fact I mention this in Memories of Batch LSR.
  • One of the comments talked about hit ratios but I prefer to think of miss rates – or better still misses per transaction.

In general I find CICS VSAM LSR buffering insufficiently aggressive: As memory is generally plentiful these days (at least relative to the CICS VSAM LSR pool sizes I encounter) I think it’s appropriate for installations to consider big increases (subject to the provisos above). Think in terms of doubling rather than adding 20%. And no, 10MB of total buffering is not aggressive. 🙂

DB2 Accounting Trace And Unicode

(Originally posted 2011-12-12.)

As I said in this post I recently came across the need to handle Unicode when processing DB2 Accounting Trace (SMF 101). I was astonished not to have run into it before in all my many sets of customer data. So I had two things to do:

  • Understand the circumstances under which it happens – which isn’t just "be on Version 8 and it will happen automatically."

and

  • Figure out how to handle it when I see it. (i.e. when QWHSFLAG has the value x’80’ as I mentioned in the other post).

As you’d expect, I asked the customer what they had done to cause the generation of 101 records containing Unicode fields. The answer is that they’ve set parameter UIFCIDS in DSNZPARM to "YES". It turns out my friend Willie Favero had mentioned it in this blog post some time ago. Because the DB2 Catalog has Unicode in it in Version 8 it actually takes cycles to create 101 records without Unicode in them: All the fields marked "%U" in the mappings in SDSNMACS have to be translated from Unicode to EBCDIC. If you code "UIFCIDS=YES" you avoid the cost of the translation.

But there’s an obvious downside: Any reporting against those fields (the ones marked "%U") needs to take that into account. But if you never (or rarely) look at e.g. the Package-level stuff you might prefer to write it in Unicode (or suppress IFCID 239 entirely). It’s probably a net saving in CPU, albeit a small one.

Which leads on to the second part of this: How did I handle the translation into something readable (EBCDIC being my primary encoding, at least on z/OS)? My interim take is to fix up my reporting REXX to check QWHSFLAG and do the right thing. You can readily do that with the built-in TRANSLATE function. There is code knocking around on the Internet for the purpose. That got me through this study and I have reusable code I can use wherever I need to do the translation.
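
The shape of it is roughly this – a minimal sketch, with sample values standing in for fields you’d really have parsed out of the SMF 101 record, and a translate table that only covers the characters you’d expect to see in a package name:


/* Demo values: in real life qwhsflag and pkgname come from the record. */
qwhsflag = '80'x
pkgname  = '44534e414243'x            /* "DSNABC" as single-byte Unicode */

if bitand(qwhsflag, '80'x) == '80'x then pkgname = u2e(pkgname)
say pkgname                           /* now readable EBCDIC: DSNABC     */
exit

/* Map the single-byte Unicode (ASCII-range) characters you would       */
/* expect in a package name to EBCDIC; anything else is left unchanged. */
u2e: procedure
  parse arg in
  uniIn  = xrange('41'x, '5a'x) || xrange('61'x, '7a'x),
           || xrange('30'x, '39'x) || '40235f24'x   /* A-Z a-z 0-9 @ # _ $ */
  ebcOut = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ',
           || 'abcdefghijklmnopqrstuvwxyz' || '0123456789@#_$'
  return translate(in, ebcOut, uniIn)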

But this is not the only way, and perhaps not the best: My code reformats records (for historical reasons, mainly) in an assembler exit – as part of my database build process. It’s entirely feasible to do the translation there and then my database has everything readable. To do that you can use the TR (Translate) instruction. Of course you can use the same translation table (albeit with different syntax – an edit or so away) as in the REXX. But that’s a whole load more effort and potential fragility. I think I’ll defer that.

One other thing: Unicode isn’t necessarily 1-byte characters. But the sample I have is. Neither the REXX TRANSLATE BIF nor the TR instruction will handle multi-byte characters. So I could eventually come unstuck. And, no, I don’t know if these fields will contain multi-byte characters any time soon. Anyone in a position to comment?

| Fixed a glitch in the bulleted list at the top.

CICS and Batch

(Originally posted 2011-12-09.)

In my experience there are two kinds of CICS installations: Those that take CICS down at night – to run the Batch – and those that don’t.

There is a loose correlation between what the data manager is and which approach is taken: VSAM-based CICS applications tend to be less 24×7 than DB2 ones, though it’s not that clear cut.

This post is about how you (really I) might glean how you run CICS vis-à-vis Batch, using SMF. Even if you know in principle how you manage CICS regions, the numbers should still be useful – and not too onerous to obtain.

If you’re the kind of installation that takes CICS down the SMF 30 Job-End records (Subtype 4) will tell you when CICS started and stopped. (If you want to know when CICS came down in an unscheduled manner the same data applies.)

If you don’t expect CICS to come down often the SMF 30 Interval Records (Subtypes 2 and 3) will confirm the region is still up.

(The (ancient) SMF processor I use – SLR – has an Availability Reporting capability. I would expect and hope other tools have something similar. In any case it’s just another take on regular performance data. I’m considering playing with SLR Availability Reporting.)

For the purposes of this discussion I consider any address space running program "DFHSIP" to be a CICS region. There may be other program names of relevance, too.

My preferred means of display is a Gantt chart – my (also ancient) formatting tool – Bookmaster – doing very nicely in that regard. I got into Gantt charts for Batch Suite display but the technique is fine for online regions. I’m beginning to annotate Gantt charts with commentary – a technique you may find useful. (I might post soon on how I’m doing the annotation – as that has been an interesting side project.)

I was reminded by a couple of recent customer studies of some of the reasons Batch and CICS often don’t run alongside each other. Sometimes it’s logical "end of day" quiesce points (or "Positions") and sometimes it’s to ensure CICS and Batch don’t compete for resources. (More than half the customers I know have a higher Batch Window CPU Utilisation (if you squint at it) than that of their online day.) The "end of day" reason often shows up as CICS closing data sets and batch jobs opening them (and conversely at the other end of the night). I saw this in the study I’m currently finishing off. As many of you know I use the Life Of A Data Set (LOADS) technique – and this time I saw CICS regions as well as batch jobs. I think it would be useful to see how long after CICS comes down the first batch job processes any of the region’s data. And the same at the other end.

It’s much more difficult to see the DB2 objects – table spaces and index spaces – accessed by CICS regions and batch jobs. But then, as I said, overlap is more common here. It raises the interesting question of how an installation knows it’s safe to run CICS and the related batch concurrently. Maybe you can share your experiences. I think it has something to do with Applications people (shudder) designing things. 🙂

The same applies to MQ queues, of course. And IMS is a whole other game.

So we can use SMF 30 to document uptime for CICS. I think it would be useful also to form a view of what the transaction profile looks like while the regions are up. This would probably be driven by CICS SMF 110 records, possibly pulling in correlated DB2 SMF 101 and MQ SMF 116 records. There are two reasons to do this:

  • Effective outages – where the region is up but work still can’t get done – could be documented. (A healthy transaction rate suggests a healthy region.) Actually a spate of "unhappy ending" transactions might mean something – such as the loss of the database manager or some partner region.
  • It would be interesting to see how transactions peter out towards the end of the day (if they do) and perhaps "peter in" 🙂 at the beginning. You might use this to tell you CICS could afford to be up for less time (to make room for the growing Batch). I’d prefer to think of it conversely: Justifying the need to keep regions up as late as you do and starting them as early as you do. Taken to its logical conclusion it might justify a project to make the CICS and Batch run concurrently.

By the way everything I’ve said above (other than the specific program name) applies to most of the other online application styles. I’m just reminded of CICS, as I say, by a couple of current customer engagements. Undoubtedly the next one will remind me of something else. 🙂

DB2 Package-Level Statistics and Batch Tuning

(Originally posted 2011-12-05.)

I don’t know how many years it’s been since DB2 Version 8 was shipped but I’ve FINALLY added support for some really useful statistics that became available with that release.

As so often happens I was caused to open up my code because of some customer data that exposed a problem in it: The customer sent DB2 Version 8 SMF 101 Accounting Trace data that contained Unicode. In particular DB2 Package names were showing up as apparent garbage. Hexdumping some records showed this field to be Unicode.

The first step was to tolerate Unicode. In my case I translate it in the REXX I do my actual reporting in. (I could’ve done it in the assembler and maybe one day I will – but it makes the job more complex.) There is a field in the Product Section (QWHSFLAG) that has the value x’80’ if the record contains Unicode (and 0 if it doesn’t).

But this post isn’t really about Unicode. It isn’t about the longer names that are supported in Version 8, either. It’s about some nice "new" statistics you also get at the Package level. (And nothing significant has happened to the 101 record since Version 8.) As I had the code open I took the opportunity to exploit these new numbers. I’ve not written about them before so now is a good time to extol their virtues – despite the arrival of Versions 9 and 10.

So this post is about DB2 performance at the package or program level. "Program" would be the application code or not-specifically-DB2 term: An application program calling DB2 generally uses a package with the same name. I’ll use package in the rest of this post because it’s the DB2 term.

Buffer Pool Statistics

The Accounting Trace record has very nice buffer pool statistics at the Plan / Buffer Pool level. But the real problem for a batch job is "which program / DB2 package is driving the traffic?" We’ve always had the ability to say which packages the time was spent in and which components of response time for those packages are dominant. And indeed the major packages might be spending lots of time waiting for Synchronous Buffer Pool I/O or Read (or Write) Asynchronous I/O. (I see that quite often.)

What we didn’t know until Version 8 is how the buffer pools are performing for those top packages. So these statistics are really handy.

Note: There’s only one set of buffer pool statistics for each package. That is, you can’t tell which buffer pools are accessed by which package.

SQL Statistics

At the plan level we see the number of, for example, singleton selects, cursor opens, fetches under cursor, updates, inserts and deletes. So we might, for instance, gain some insight into why a batch job step is seeing a large amount of Synchronous Database I/O Time: Perhaps it’s because of a plethora of singleton selects so Prefetch doesn’t really happen.

What we couldn’t do, prior to Version 8, is see this at the package level. Now that we can, we’re able to find the package / program that’s behaving this way. So we stand a better chance of fixing it.

As an experiment I summed up counts of the different SQL statement types at the package level and compared the result to the field QPACSQLC (which was there at the package level long before Version 8). This field is the number of SQL statements. Usually they’re the same but in a significant proportion of cases the sum is less than QPACSQLC. One valid explanation is that the difference includes commits and aborts (which there aren’t statistics for at the package level). I say "includes" because this isn’t the whole explanation: If you take out the plan-level commit and abort counts (fields QWACCOMM and QWACABRT) you sometimes still have a discrepancy. I’ll have to research why this might be.
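
To make the arithmetic concrete, the check amounts to something like this – a hedged sketch, with made-up numbers and plain variable names standing in for the individual package-level SQL counters (the real field names come from the mappings in SDSNMACS):


/* Demo numbers: in real life these come from the accounting record.  */
selects = 10; opens = 2; fetches = 120; closes = 2
updates = 5;  inserts = 3; deletes = 1
qpacsqlc = 150                     /* package-level SQL statement count */
qwaccomm = 4; qwacabrt = 0         /* plan-level commits and aborts     */

sqlSum = selects + opens + fetches + closes + updates + inserts + deletes
gap    = qpacsqlc - sqlSum - (qwaccomm + qwacabrt)
if gap > 0 then say 'Still' gap 'SQL statements unaccounted for'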

But, as I say, the reason for this post about old (but not obsolete) statistics is that I think y’all will find them really handy. And especially for batch steps and indeed any DB2 application where the transaction comprises multiple packages / programs.

Rhinos

(Originally posted 2011-11-20.)

The very first computer game I ever played was called Rhino and it ran on a Commodore PET. The school had been lent one for a fortnight. (I don’t know why as we didn’t go on to buy any, instead getting a single RML 380Z.) Imagine a character-grid screen where the rhinos are represented by pi symbols that chase you as you try to move from A to B. It was written in BASIC and allowed from 1 to 10 rhinos. (We modified it so it would allow up to 100: It played much slower but the overwhelming number of pi signs was good for a 5-minute giggle.)

Having got that off my chest, normal service is resumed: 🙂

Today I want to talk to you about a different Rhino. It’s a javascript interpreter written in java. And it’s documented here.

Some of you will know that I don’t think java is a particularly modern language. If you’ve kept your ear to the ground you’ll know there are MANY that have a better claim to the mantle of "coolest language (today)". 🙂 One of the nice things about it, though, is that it is zAAP-eligible. And a couple of nice things about javascript are that a lot of people know it, and it’s got some really neat features.

Here’s a sample that shows one of its capabilities – processing XML:


xml=new XML(stdinText)
print(xml.item.(@type=="carrot").@quantity)

These two lines parse a text string (in the variable "stdinText") as XML and print the value of the quantity attribute of each XML node that has "carrot" as the value of its type attribute. If you know XML I think you’ll see the power of that.

For completeness here is the XML and the whole code to read the XML string from stdin and do the XML query:

First the XML:


<sales vendor="John">
  <item price="4" quantity="6" type="peas"/>
  <item price="3" quantity="10" type="carrot"/>
  <item price="5" quantity="3" type="chips"/>
</sales>

and now the javascript:


importPackage(java.io)
importPackage(java.lang) 
stdinText=""
stdin=new BufferedReader(new InputStreamReader(System["in"])) 
while(stdin.ready()){
  stdinText+=stdin.readLine()
}
xml=new XML(stdinText)
print(xml.item.(@type=="carrot").@quantity)

In this version of the code two java packages are imported that are used to get the XML from stdin. I chose to do this so I could redirect a file in – and also drive it from BPXWUNIX (the REXX interface to z/OS Unix I’ve mentioned several times recently here).

The "new XML()" phrasing is part of the javascript’s E4X capabilities – that allow it to process XML so neatly. (Another part is directly assigning literal XML (not a string) to a javascript variable. See ECMAScript for XML for a description of E4X.

One of the nice things about javascript is the proliferation of frameworks for it. My favourite is Dojo though I’ve also used jQuery. Although Dojo (and jQuery) have a lot to offer for web applications, much of that isn’t applicable to z/OS Unix System Services. But some very useful things remain. For example, language extensions like "dojo.forEach()" iteration.

So I set out to try two things:

  1. Installing and using Rhino. You’ve seen an example of this above – processing XML.
  2. Installing and using the Base component of Dojo.

The latter proved interesting: I made a bit of a mess of it by copying the Dojo Base files up to z/OS one at a time. I should’ve done it wholesale. But I got there in the end. And here’s a test script:


load("dojo.js")
dojo.forEach([1,2,3],function(x) {print(x)})
print(dojo.isRhino)

which does the following:

  1. It pulls in the Dojo code.
  2. It creates a temporary array – [1,2,3] and iterates over it – printing 1 then 2 then 3.
  3. It prints "true" because Dojo is indeed running under Rhino.

So, it was quite easy to get the Rhino javascript interpreter up and running – under java – on z/OS. It would’ve been easy to install Dojo Base if I hadn’t made a mess of it. I think if you have people used to javascript they could be productive quite quickly – so long as you’re comfortable with a java-based environment. As previously mentioned, you could readily call javascript code this way from REXX with BPXWUNIX. Indeed you could have REXX pass the actual javascript code in.
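
To make that concrete, here’s a hedged REXX sketch of the BPXWUNIX route. The jar and script paths (and the assumption that java is on the PATH) are mine, not anything you’ll find above; the script is just the javascript shown earlier saved to a file.


/* Run the Rhino script under z/OS Unix, feeding the XML in on stdin   */
/* and collecting stdout in a stem. Paths are hypothetical.            */
cmd = 'java -jar /u/me/rhino/js.jar /u/me/scripts/carrots.js'

xml.1 = '<sales vendor="John">'
xml.2 = '  <item price="4" quantity="6" type="peas"/>'
xml.3 = '  <item price="3" quantity="10" type="carrot"/>'
xml.4 = '  <item price="5" quantity="3" type="chips"/>'
xml.5 = '</sales>'
xml.0 = 5

call bpxwunix cmd, xml., out., err.

do i = 1 to out.0                  /* expect a single line: 10 */
  say out.i
end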

On the performance side you need to be aware that Rhino converts the javascript code to java classes on the fly. It’ll probably perform well after the initial conversion – so stuff with lots of iteration should do nicely. And I would expect it to offload to a zAAP (or zIIP via zAAP-on-zIIP) quite readily. (I’d love to see measurements if anyone tries it.)

As well as Rhino, Mozilla make a C-based javascript interpreter (SpiderMonkey) but I think that would be a much more difficult thing to get running on z/OS.

And if you want an exemplar of how far games have come from the original Rhino try Uncharted 3. I’ve just completed it and it’s outstanding. (So outstanding I want to go back and play the whole thing again, as well as dipping my toe in the multiplayer water.)

| Corrected to read "BPXWUNIX" instead of "BPXWDYN".

Tivoli Workload Scheduler and Workload Manager Service Classes

(Originally posted 2011-11-04.)

I don’t think I’ve mentioned this before in this blog but Tivoli Workload Scheduler (TWS) has a nice "WLM Integration" feature. With it TWS can change the Service Class a job runs in – before submission.

The main purpose of this is to elevate Critical Path work.

We wrote about this in the Redbook mentioned in Touring The Upcoming Batch Optimization On z/OS Redbook, and so some of you will have seen me begin to discuss the area in the presentation.

This week Dean Harrison (a friend who is a renowned expert on TWS) and I presented together on "TWS and WLM". It’s all about this linkage. He talked about how you goad TWS into changing the WLM Service Class for a job. Very sensibly he didn’t say "a better Service Class" but rather "a different Service Class". I was watching for him to trip up but he didn’t. 🙂

(He also did a nice job of handling the whole fraught question of what a Critical Path actually is. I think we all know the score here.)

But back to the "different Service Class" thing: We all know that it doesn’t matter what you call a Service Class, it’s what the goal is that counts. In my portion of the presentation I did the "now you’ve got a Service Class now what?" piece. So I talked, for instance, about whether response time goals were usable with Production Batch or whether you should stick to velocity goals. (On that one I think only if the batch looks like a large number of independent really heavy transactions should you use response time goals.)

So, my role was in the vein of the cliche "what you don’t know can hurt you" for Batch Operations folks. My message to them was "understand what the service classes you’re steering work towards actually mean".

But let’s stand it on its head: From the Performance Analyst’s perspective the cliche still applies: How do you know what the batch is that shows up in PRDBATHI vs PRDBATMD? You don’t unless you take an interest in how the jobs got into either of those classes. So my message to "us" is "understand how work came to be in each Service Class and what expectations the Batch Operations folks have of it".

The "can hurt you" in the cliche is interesting: You could – whether a Batch Operations person or a Performance Analyst hide behind "separation of concerns". I wouldn’t recommend it though: Presumably installations take a dim view of such things. Another cliche on offer is "hang together or hang separately". I encourage both camps to work together. (Just as I encourage eg CICS and Performance people to.)

Best of all is if you can be skilled in both TWS and WLM. A tall order, I know. But that might earn you a leading role in the (proposed in the Redbook) Batch Design Authority.

SYSIN In A Proc – New With z/OS Release 13 JES2

(Originally posted 2011-10-25.)

One of the nice enhancements in z/OS Release 13 JES2 was the support for SYSIN in a JCL procedure. (See here for the announcement letter.)

I have a personal example of where it would’ve been handy. You probably have your own.

We used to distribute sample JCL to use DB2 DSNTIAUL to unload the DB2 Catalog, one table at a time. Obviously, with the same job step repeated so many times, you’d want to use a JCL procedure. And so we did. But there’s a snag:

Each invocation of DSNTIAUL needed the same SYSTSIN DD data. The example below is for DB2 Version 8.


RUN PROGRAM(DSNTIAUL) PLAN(DSNTIB81) PARMS('SQL')
END

You can see that for Version 9 you’d need to change it to DSNTIB91 – and anyway neither the program name nor the plan name is fixed, as DSNTIAUL is a sample you compile etc. yourself. So when something changes you either change it in the PROC or, if you don’t have one, make a global change.

Changing it in the PROC would be preferable, of course. But it couldn’t be done. So we added a step to copy these two lines to a temporary data set and refer to that from the PROC. Cumbersome but a well-known circumvention.

Enter Release 13. The need to do this has gone as you can now have SYSIN instream in a PROC.
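
To make it concrete, here’s a hedged sketch of what the Release 13 version might look like. The names – including the DSN subsystem – are invented, and I’ve added the DSN SYSTEM command and a couple of DDs for completeness; the point is simply that SYSTSIN now lives inside the procedure, so the plan name gets changed in exactly one place.


//UNLOAD   PROC
//TIAUL    EXEC PGM=IKJEFT01,DYNAMNBR=20
//SYSTSPRT DD SYSOUT=*
//SYSPRINT DD SYSOUT=*
//SYSPUNCH DD SYSOUT=*
//SYSTSIN  DD *
 DSN SYSTEM(DSN1)
 RUN PROGRAM(DSNTIAUL) PLAN(DSNTIB81) PARMS('SQL')
 END
//         PEND

The invoking job then supplies just the per-table pieces – the SQL and the output data set:


//UNLTAB1  EXEC UNLOAD
//TIAUL.SYSREC00 DD DSN=MY.UNLOAD.SYSTABLES,DISP=(NEW,CATLG),
//         UNIT=SYSDA,SPACE=(CYL,(10,10),RLSE)
//TIAUL.SYSIN DD *
 SELECT * FROM SYSIBM.SYSTABLES;
/*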

Of course, if I were still maintaining this JCL I wouldn’t rush to publish a version that assumed Release 13. 🙂 I hope in your shop you can use it sooner than I can.

A Small Step For RMF, A Giant Leap For Self-Documenting Systems

(Originally posted 2011-10-20.)

I mentioned APAR OA21140 before – here. It’s quite an old APAR and so it will (in all likelihood) be on your systems.

I’d like to draw your attention to a subtle 1-byte field in the SMF 74 Subtype 4 (Coupling Facility Activity) record: R744FLPN. It’s the partition number for the coupling facility. (If you’ve seen any one of several of my presentations I’ll have talked about this field.)

Here’s a problem I have but most of you don’t: I try to build a picture of your systems just from SMF / RMF data. (Actually I’ve run into lots of customers who have almost the same problem. From a "systems should be self-documenting" standpoint, if nothing else.) Let’s review what we have:

  • SMF 70 Subtype 1 gives you lots of information about a machine, including the serial number and all the LPARs – all in gory detail.
  • SMF 74 Subtype 4 gives you lots of Coupling Facility information – again in gory detail.

It’s been my contention that if you put these two together good stuff can happen:

  • You can get a true picture of the CPU performance of the coupling facility, especially in the shared ICF engines case.
  • You can match the CF links to the CHPIDs from SMF Type 73 (not that the latter will give you much information).
  • You can identify what each of those LPARs that show up using resources in the ICF engine pool actually are.
  • In principle I can identify free CF memory that could be redeployed to other LPARs as needed.

Maybe the first of these is the most significant but, as you know, I like to show up in a customer having got quite close to their systems – not asking the questions I should’ve got the answers to from the data. So the next two are nice as well. I consider the fourth "unfinished business" as we don’t have a complete picture of machine memory but it’s still valid.

Today I finally exploited R744FLPN to do the matching. So you might like to, too. I could even define a view across SMF 70 and 74-4 with a full set of matching keys now. Whoo hoo! 🙂

A long time ago I encountered a customer with multiple coupling facility LPARs whose names didn’t match the coupling facility names. So I wrote some code that only worked in the "1 coupling facility on a machine" case: Fairly trivial code.

Then RMF added machine serial number, the number of dedicated and shared engines and the LPAR weights to the 74-4 record. This helped with a customer case where every CF LPAR had the same name – across multiple machines. (Why do people do that?) 🙂

But in asking them to do that I forgot one other thing: LPAR Number. So now OA21140 provides that and we can do a direct match (where the code doesn’t have to do lots of special-case detective work).
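
In code terms the matching is now no more than a keyed lookup. A sketch – with demo values, and with placeholder names for everything except R744FLPN (the real field names come from the record mappings):


/* Lookup of LPAR names keyed on machine serial number plus partition  */
/* number. Everything except r744flpn is a placeholder.                */
lparName. = '?'

/* From an SMF 70-1 Logical Partition Data Section: */
serial70 = '0123456'; partNum70 = 15; name70 = 'CF01'
lparName.serial70.partNum70 = name70

/* From an SMF 74-4 record for a coupling facility: */
serial744 = '0123456'; r744flpn = 15; cfName744 = 'PRODCF1'

say 'CF' cfName744 'is LPAR' lparName.serial744.r744flpn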

As I said, this APAR is quite old. Usually I like to take advantage of new data as soon as a customer sends me it. But I’ve been busy these past 2 years like never before. (You might’ve noticed that.) 🙂 So, finally, I have the change in my code to take advantage of that. (And I still have the other code in place as a fallback.) It turned out to be a tiny change – that took me 15 minutes to code and about the same to test. You’d be amazed at what can get done in "interstitial time". 🙂

All in a day’s work for a Principal Systems Investigator (PSI). 🙂

“Channel Z’s All Static All Day Forever” – IRD and HiperDispatch

(Originally posted 2011-10-17.)

I’ve been looking for an excuse to reference the excellent B52’s Channel Z ever since we started talking about "z". And this is, I think, a good one. 🙂

(Notice BTW the URL in the previous paragraph makes it clear SING365.com (should be "SING360" 🙂 ) is using Domino.)

So, back to the matter at hand…

A customer I’m working with is using Intelligent Resource Director (IRD) Weight Management together with HiperDispatch. Note: I said "Weight" not "Logical Processor" because you can’t use IRD Logical Processor Management with HiperDispatch. The way this works is that IRD changes the weights – if it needs to – and HiperDispatch recomputes the vertical weights and (potentially) adjusts the number of Vertical High, Vertical Medium and Vertical Low processors accordingly. (Adjusting these counts is not the same as parking and unparking, which is an important point to bear in mind.)

My personal interest is as much about how to tell the story as it is about the actual performance. Well, not quite. So, in this case the story is that weights shift within an LPAR Cluster between one LPAR and another. This happens, in my analysis, in a credible and smooth way. There’s no "nervous kitten" here. So the HiperDispatch adjustments happen as well. But how do we show this?

Showing the shifting weights is easy: Stack up the weights for the LPARs by time of day. The total stack height remains constant – which we should expect. (IRD only shifts weights within the cluster.)

The complicated thing to depict is the number of High, Medium and Low logical processors. The easy bit is to do this by time of day. But suppose, as I was, you’re producing an LPAR layout table? Which is a static depiction. Well, let’s rehearse what the instrumentation available to us is…

APAR OA21140 does a number of things. The relevant thing it does is to introduce SMF70POF in the SMF 70 Subtype 1 record. It’s in the Logical Processor Data Section, and you get one for each logical processor in each LPAR. The field tells you whether the logical engine is vertically polarised and whether it’s High, Medium or Low. (As the Low ones can get parked it helps explain the parking and unparking elements of HiperDispatch’s behaviour.) I take this to be the state at the end of the RMF interval and note there’s a bit which says if it changed during the interval.

This is easy to use in a table if you have a static situation. But, of course, IRD Weight Management makes it dynamic. So, when I summarise over a shift (of maybe 8 hours) some logical engines have changed between Medium and Low (and maybe some are only sometimes High). It all depends on the degree of shifting of weights. So some logical engines aren’t totally in one state throughout the 8-hour shift.

Here’s a way around the problem:

I define a fourth state in my reporting: "Vertical Transitioning". So you might see an LPAR in my table as, for example, "VH:2 VM:1 V?:1 VL:8". In this example 2 logical engines remain Vertical Highs all the time, 1 is always a Vertical Medium, 8 are always Vertical Low engines. Finally the "V?" means one logical engine transitions between states (possibly VM and VL).

"Vertical Transitioning" isn’t a technically recognised term but I think it captures the essence of the behaviour. Of course there’s wouldn’t be any Vertical Transitioning engines if IRD weren’t shifting the weights.

Which brings me back to the B52’s: "Channel Z’s All Static All Day Forever" is anything but true these days. 🙂

Touring The Upcoming Batch Optimization On z/OS Redbook

(Originally posted 2011-10-14.)

As mentioned here, I recently participated in a residency in Poughkeepsie. Our task was to write the "Batch Optimization on z/OS" redbook that we hope will get out soon (though I think early next year is the most realistic timescale).

These past two weeks I "toured" the redbook in the Nordics: 5 cities in 2 weeks (Helsinki, Copenhagen, Oslo, Stockholm and Aarhus). I really wish I could’ve taken the whole "band" on tour – but that wasn’t to be. 😦

So here’s the abstract for the presentation:


Batch performance optimization remains a hot topic for many customers, whether merging workloads, supporting growth, removing cost or extending the online day. This presentation outlines a structured methodology for optimizing the batch window, incorporating techniques written about in a Redbook written by experts from around the world. This methodology is well-structured and draws on information every installation should have access to.


I’ve just uploaded the slides to Slideshare. If you read them you’ll see the first section is a small amount about the residency. The rest of it pretty much follows the book’s structure. (I mention in the first section that other residents (in the other room) were working on batch containers. I don’t describe these.)

So, I hope you enjoy the slides, and I’m pleased so many people were able to be in my audience. I’d like to reach even more of you some time – as I think our messages are important. And I also hope I represented the viewpoints of my fellow residents adequately: Though we all walked in with very different ideas I think we managed to very constructively meld them together into a methodology we can all be proud of.