z/OS R.9 RMF Parallel Sysplex New Fields

(Originally posted 2007-11-12.)

I’ve just re-read the XCF (74-2) and CF (74-4) sections of the z/OS Release 9 SMF Manual. There are some nice things in there…

XCF (74-2)

  • Number of buffer 1KB blocks by path (R742PUSE). This should tell you whether memory waste from over-specifying CLASSLEN is a problem, as well as helping with some other scenarios.
  • Member job name (R742MJOB). I’d like to believe this would tell us, for example, which IRLM a given member is.

Coupling Facility (74-4)

  • Whether this CF LPAR is using Dynamic Dispatch (R744FFLG bit 3)
  • How many shared processors (R744FPSN) and how many dedicated processors (R744FPDN) this CF LPAR has.
  • Structure Execution Time (R744SETM), which requires CFLEVEL 15. Useful for Coupling Facility capacity planning.
  • Whether this processor is dedicated (R744PTYP) and what its weight is if it’s shared (R744PWGT).

I haven’t actually looked at any sample R.9 data yet. I’ll have to put that to rights and get back to you when I see some of these numbers in action. But I do think they extend the data model rather nicely, particularly the CF CPU stuff.

Has anyone seen this data in action yet?

On Further Investigation…

I have found a set of 1.9 data. And here’s what I can immediately confirm…

R742MJOB is a very usable job name. Here’s an example…

From ERBSHOW…

#38:   +0000:  E2E8E2C2 40404040 C9E7C3D3 D6F0F0F4  *SYSB    IXCLO004
       +0010:  D4F8F040 40404040 40404040 40404040  *M80             
       +0020:  00030000 0000001A 00000016 00000000  *                
       +0030:  C7D9E240 40404040                    *GRS             

Whereas before we had an anonymous member (M80) and an anonymous lock structure (IXCLO004), we now know that M80 is SYSB’s GRS address space and therefore that IXCLO004 is GRS Star. Other Lock Structure exploiters similarly fall into place. I think this will be even more interesting for, e.g., IRLM.
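If you want to decode these dumps off-host, here’s a minimal sketch in Python that turns the EBCDIC bytes above back into readable text. The offsets in the comments are just what the dump suggests (system name, structure-related name, member, job name) – check the R742 mapping in the manual before relying on them.

    # Minimal sketch: decode the EBCDIC bytes from the ERBSHOW-style dump above.
    # Offsets are only what the dump suggests, not taken from the record mapping.
    import codecs

    hex_words = (
        "E2E8E2C2 40404040 C9E7C3D3 D6F0F0F4 "   # +0000
        "D4F8F040 40404040 40404040 40404040 "   # +0010
        "00030000 0000001A 00000016 00000000 "   # +0020
        "C7D9E240 40404040"                       # +0030
    )

    raw = bytes.fromhex(hex_words.replace(" ", ""))
    text = codecs.decode(raw, "cp037")  # EBCDIC code page 037

    print(repr(text[0:8]))    # 'SYSB    ' -- system name
    print(repr(text[8:16]))   # 'IXCLO004' -- the structure-related name
    print(repr(text[16:24]))  # 'M80     ' -- the member
    print(repr(text[48:56]))  # 'GRS     ' -- the job name (presumably R742MJOB)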

Memories of Hiperbatch

(Originally posted 2007-11-11.)

It’s nice to see a flurry of activity in IBM-MAIN about Hiperbatch. And it’s more for the emotional reason of reminiscence than for any stunning insights that I’m blogging about it…

Back in 1988 I ran a technical project at my then customer, Lloyds Bank, to evaluate Data In Memory (DIM). It was a fun project with a wide range of workloads on multiple machines, including the then-new DB2. So we tried out all the analysis tools and even did a bit of Roll Your Own…

Through the IBM internal FORUMs I met another IBM Systems Engineer, John O’Connell, doing a similar thing for his customer, Pratt and Whitney in Connecticut. And he wrote some SAS code to evaluate VIO to Expanded Storage. We shared this code with Lloyds Bank. (I ran into him at several conferences. And later I discovered he’d left IBM and, I think, gone to work for a customer. Are you out there, John?)

And I wrote a presentation on VIO to Expanded Storage (called VIOTOES – which sounds funny if you pronounce it right). 🙂 There were two key elements in this presentation:

  • How to evaluate the opportunities for VIO. And what happened if you fiddled with the VIOMAXSIZE tuning knob – the maximum temporary data set size (primary plus 15 secondary extents) that would end up as VIO, and hence potentially in Expanded Storage.
  • How to use the then new-fangled DFSMS ACS routines to control which data sets were even considered for VIO – and all sorts of other funky tricks with DFSMS.

This presentation went down well with IBMers and customers alike and could be considered my first conference presentation.

The point of the above is to set the scene for Hiperbatch…

So, we announced Hiperbatch as part of MVS/ESA 3.1.3. (Funny how we went 3.1.0, 3.1.0e but not 3.1.1 or 3.1.2.) And it had a hardware prerequisite of a 3090 S processor (because of the MVPG instruction – even though technically one COULD move pages between Central and Expanded Storage using ordinary instructions if we’d chosen to implement it that way). The important thing is that we wanted an exclusive, tying this super-duper new facility to a brand new processor.

And because of this my fourth line manager at the time decided we all had to run Hiperbatch Aid (HBAID) studies. At this point I learnt I was not a “team player”. Well duh. 🙂 I declared I wasn’t going to do it because we’d already crawled all over Lloyds Bank looking for genuine DIM benefit. And there wasn’t likely to be any from Hiperbatch. With that defence the requirement to run HBAID was waived.

You’d think from that I had a downer on Hiperbatch and HBAID, wouldn’t you? 🙂 Far from it actually…

I did enjoy running HBAID in one or two other customers and I did get quite creative with Hiperbatch. And that was the trick, in my opinion – getting creative. And that realisation led on to other things…

One of the really nice things about HBAID was that it Gantt’ed out data set lives. And from that you could glimpse where other techniques might be useful (such as VIO and OUTFIL). So I invented[1] a technique called LOADS, which stood for “Life Of A Data Set” (for those of you averse to “flyovers”). 🙂 These signatures were really rather handy. Here’s a slightly later example…

A “standard” BatchPipes/MVS pipe candidate would be a sequential writer job followed by a sequential reader job for the same data set. There are, of course, lots of scenarios where Pipes is useful, each with their own signatures (LOADS).
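That kind of signature can be spotted mechanically. Here’s a minimal sketch, assuming you’ve already extracted (job, data set, open time, close time, intent) tuples from somewhere like the SMF Type 14/15 records; the field names and sample rows are purely illustrative.

    # Minimal sketch: flag write-then-read data set signatures (pipe candidates).
    # Input tuples are assumed to have been extracted already (e.g. from SMF 14/15);
    # the names and times below are illustrative, not a real record mapping.
    from collections import defaultdict

    events = [
        # (job, dsname, open_time, close_time, intent)
        ("UNLOADJB", "PROD.UNLOAD.DATA", "01:00", "01:20", "OUTPUT"),
        ("XFORMJB",  "PROD.UNLOAD.DATA", "01:25", "01:40", "INPUT"),
        ("OTHERJB",  "PROD.OTHER.DATA",  "02:00", "02:10", "INPUT"),
    ]

    writers = defaultdict(list)
    readers = defaultdict(list)
    for job, dsn, t_open, t_close, intent in events:
        (writers if intent == "OUTPUT" else readers)[dsn].append((t_open, t_close, job))

    for dsn, wlist in writers.items():
        for w_open, w_close, w_job in wlist:
            for r_open, r_close, r_job in readers.get(dsn, []):
                if r_open >= w_close and r_job != w_job:
                    print(f"Pipe candidate: {w_job} writes {dsn}, {r_job} reads it afterwards")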

And it was my good friend Ted Blank who told me about Pipes in late 1990. (And he had been involved in HBAID.)

Ted also encouraged me to start writing a book on Batch Performance. This later became SG24-2557 “Parallel Sysplex Batch Performance” and I believe you can still find it online. (Only this week I referred someone to it as a starter manual for what he wanted to do – but I DO regard it as being somewhat dated.) 😦

And the writing of the book got me into writing batch tools – which generalised what HBAID did and then some. And that’s how I came to be one of the developers of PMIO, the Batch Window analysis toolset / consulting offering. You may have heard of PMIO.

I’m aware of very few installations running Hiperbatch and even fewer running Pipes. 😦 But at least I got something out of it. And we did evolve the “state of the art” as far as batch window was concerned.

As for Hiperbatch’s applicability, I think it’s worth a look. But there are so many other techniques around that more or less cover the same ground. Some, like Pipes, cost money, and others take a great deal of creativity to apply. So I don’t think it was ever going to take off in a big way. But that’s OK, given Hiperbatch was built to solve a problem for one important customer.

[1] Actually I don’t claim to have invented it, just popularised and generalised it. After all, HBAID itself was doing much the same thing. In fact someone once suggested I should patent LOADS but I declined on the “not exactly original work” basis.

WLM-Managed DB2 Stored Procedure Address Spaces

(Originally posted 2007-11-09.)

I was contacted by the team updating the SG24-7083 “DB2 Stored Procedures: Through The Call And Beyond” this past week. Their question was quite straightforward:

“One of the statements in the book, in the chapter on WLM address space management, states:

To help analyze the use of resources by different types of stored procedures, you should name the server address spaces in such a way that it is clear which Application Environment they serve. With this naming convention SMF Type 30 Subtypes 2 and 3, Accounting Interval records can be used to determine the resource consumption by each server address space. These records include CPU time and virtual storage usage.

Do you have any details on what this might mean?”

As I wrote that chapter I guess I do. 🙂

Here’s the gist of my reply. I think it’s worth sharing with you:

First a little background to what I was talking about…

You can observe the starting and stopping of WLM SP server address spaces using SMF Type 30 step-end and job-end records (Subtypes 4 and 5). And you can also count the address spaces with a given name using the Subtype 2 and 3 interval records. This gives warm fuzzies or cold spikeys 🙂 about how the WLM-starting-and-stopping business is working out.

And, further, you can see the “weight” of the address space – in terms of (non-DB2) I/O, memory and CPU from the Type 30 records. Recall the “weight” feeds into WLM decisions about whether it can afford to start another address space that services the same queue.

Here’s why I wrote what I wrote… Given that the address spaces each serve one and only one Application Environment and one and only one WLM Service Class, it would be VERY nice to be able to say things like “This AE / SC had some reluctance to start another address space because the EXISTING ones servicing this queue are too darned heavy”. To do that you need to identify the queue related to the address space you’re looking at. Hence a good address space naming convention is the ONLY thing that will enable you to do it.
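To illustrate why the naming convention matters, here’s a minimal sketch, assuming the job names and CPU figures have already been pulled out of the Type 30 Subtype 2 and 3 records by some other means; the prefix-to-Application-Environment mapping is entirely hypothetical and stands in for whatever convention you adopt.

    # Minimal sketch: roll up SMF 30 interval data by server address space name.
    # The records and the prefix -> Application Environment map are hypothetical;
    # substitute your own extraction and your own naming convention.
    from collections import Counter, defaultdict

    prefix_to_ae = {"DBP1SPA": "PAYROLLAE", "DBP1SPB": "BILLINGAE"}  # hypothetical

    records = [  # (job name, CPU seconds) from Type 30 Subtypes 2 and 3
        ("DBP1SPA1", 12.4), ("DBP1SPA2", 11.9), ("DBP1SPB1", 55.0),
    ]

    cpu_by_ae = defaultdict(float)
    spaces_by_ae = Counter()
    for jobname, cpu in records:
        ae = next((a for p, a in prefix_to_ae.items() if jobname.startswith(p)), "UNKNOWN")
        cpu_by_ae[ae] += cpu
        spaces_by_ae[ae] += 1

    for ae, cpu in cpu_by_ae.items():
        print(f"{ae}: {spaces_by_ae[ae]} server address space(s), {cpu:.1f} CPU seconds")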

There is obviously more detail one could add to this…

The program name a WLM DB2 Stored Procedures Server Address Space runs under is “DSNX9WLM”. (Other types of WLM Server Address Space will have different program names but the same basic “trick” will still work.)

One could also worry about Virtual Storage above and below the 16MB line in such an address space. It’s a fact that some kinds of DB2 Stored Procedure are a bit of a challenge in virtual storage terms. Recall: you can write a DB2 Stored Procedure or UDF using almost any programming language or tool you’d ever want to.

If you had only one AE and one SC for work using DB2 Stored Procedures you’d not care about which address spaces were which. But the whole flippin’ point of Stored Procedures is that they enable common application logic across DDF, CICS, Batch, WebSphere, etc.

And finally, a tiny piece of background as to why I focused on this issue in 2003 – when the original Redbook was written…

A major customer whom I greatly admire had had some difficulties with getting nested Stored Procedures (and UDFs) to perform. This was largely down to WLM not wanting to start new server address spaces. I believe the problems are well and truly behind them. But it was a very interesting technical area to get into. And, as usual, I dug into the instrumentation to see if we could shed some light on the matter.

So that’s how I came to write the chapter in the Redbook – given the 1.5 weeks IBM Global Services so graciously allowed me to spend on the project – on WLM DB2 Server Address Space Management.

And now some diligent people are working on a new version of the Redbook – and I’m pleased they’re asking me “what on earth did you mean by…?” 🙂

GSE Conference 2007

(Originally posted 2007-11-05.)

Last week I attended the GSE Conference for the first time in a long while. And I’m very glad I did.

Let’s get the egotistical bit out of the way first…

I very much enjoyed presenting Memory Matters in 2008. If you’ve seen me present it before you might notice some minor tweaks. We learn as we go, don’t we? 🙂 As usual the challenge is “too much material”.

I attended a number of other presentations. For me the highlights were:

  • Fabio Massimo Ottaviani’s presentation on zIIPs and zAAPs. This gave me some new ideas for charts – though I think I have to get the hang of zIIP/zAAP normalisation factors before I can implement them.
  • Mike Duffy gave a good presentation on Sysplex Distributor.
  • John Campbell was great value, demonstrating (in a painfully practical way) 🙂 what it takes to deliver High Availability. “Don’t trust a machine you can lift” comes to mind. 🙂 His proper topic was “DB2 Availability”.
  • Scott Drummond took Bob Rogers’ (or is it Harry Yudenfriend’s?) 🙂 foils on “More Data” and ran with them. Actually I think I might steal the foils off all of them and add my own touches – probably to add in some DB2 perspectives. This would be a 3-way “crossover” presentation: z/OS / z9, DB2 and DFSMS / DS8000.

And it was great to run into many customer and vendor friends. Thanks to BMC for the pretty t-shirt 🙂 and to BluePhoenix for the USB hub. And it was nice to spend time with most of my unit and my manager Gerry. Almost a team meeting.

Anyhow enjoy the foils and (perhaps) tell me what you think of them. They will evolve.

Hackday4 and Referer URLs

(Originally posted 2007-10-27.)

I’m going to have to stop adding “(sic)” after every use of the word “Referer”. As in “Referer URL”.

The term itself comes from one of the standard HTTP headers. And therefore is somewhat fixed. Despite giving my “wetware spell checker” a pink fit on a frequent basis. 🙂

Anyhow, in this blog entry I talked about how I can – as standard – get a display of the URLs people come from to land on my blog. (And in a comment to the entry I said I’d put up (using a standard Roller macro) the list of such Referer URLs).

Well, yesterday was IBM’s fourth Hackday (known as Hackday4). The idea came from Yahoo – who’ve been doing a similar thing for a long time. IBM has had 4, spread over the last 18 months. I’ve participated in ALL of them. I have – pretty much permanently – tons of hacking ideas swirling around inside my head. THIS hack was not the first idea I had for this particular Hackday.

So, I took my Firefox extension – unfortunately only likely to be available internally – and taught it a new trick…

When a developerWorks blogger is at their “Referer URLs” page it takes the list of URLs and analyses the ones that came from Google. (I may add Yahoo if I get a significant number of hits.) If the hit from Google was a search I take two things from the URL:

  • The country (as part of the domain name, with the assumption that “www.google.com” is the USA).
  • The search terms used.

I do some counting and display the country list. Likewise the search terms list. The search terms list is a bit trickier as you can, for example, get searches with “zos”, “z/OS”, “z/os” etc. TODAY I don’t take the slash out, so I don’t treat “zos” and “z/OS” as the same thing. But I do assume “z/OS” is the same as “z/os”. And I do make some attempt to recognise when there’s an acronym…

If I see “icf” and then later on “ICF” the search term becomes “ICF” and the original (mixed case) version is discarded.

People sometimes put quotes in searches. Today I don’t handle that.
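The parsing itself is nothing exotic. Here’s a minimal sketch of the same idea in Python (rather than in the extension itself); it handles only the simple cases described above – no quoted phrases, slashes left in, and a later all-caps version of a term replacing an earlier mixed-case one.

    # Minimal sketch of the Google Referer analysis described above.
    # Real referer URLs vary; this handles only the simple "?q=" search case.
    from collections import Counter
    from urllib.parse import urlparse, parse_qs

    referers = [  # illustrative examples, not real log data
        "http://www.google.com/search?q=zos+memory",
        "http://www.google.co.uk/search?q=icf",
        "http://www.google.co.uk/search?q=ICF",
    ]

    countries = Counter()
    terms = Counter()
    canonical = {}  # lower-cased term -> display form (all-caps wins, per the acronym rule)

    for url in referers:
        parts = urlparse(url)
        if "google." not in parts.netloc:
            continue
        tld = parts.netloc.rsplit(".", 1)[-1]
        countries["USA" if tld == "com" else tld.upper()] += 1
        for term in parse_qs(parts.query).get("q", [""])[0].split():
            key = term.lower()
            if term.isupper() or key not in canonical:
                canonical[key] = term
            terms[key] += 1

    print(countries)
    print({canonical[k]: n for k, n in terms.items()})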

It’s been a fun “time-limited” hack. It would be nice to do more with it. And perhaps to find some other Roller-based blogging site that I could test it with. Then maybe I can ship an EXTERNAL Firefox extension to the users of that.

And what does this all buy?

Basically, given that over 90% of my hits are direct (and probably mainly “spiders”), it’s NOT that statistically significant. But it does tell me something about where in the world my readership is located. And also something about the things people are searching for when they stumble across my blog. So maybe what to write more about.

It’ll be interesting to see if any of the “webby” terms in THIS entry show up in the list.

More on Coupling Facility Async / Sync Thresholds

(Originally posted 2007-10-17.)

Following on from Coupling Facility Async / Sync Thresholds – They Are A’Changin’, I’ve been informed by Development that there is a new, improved write-up on how the Dynamic Sync/Async conversion works in Chapter 6 of the z/OS Release 9 Setting Up A Sysplex manual. I’ve read it and it is VERY good.

One thing to pull out is that there isn’t just one threshold… There are different thresholds for Lock versus List and Cache structures, for Duplex versus Simplex (aka Non-Duplex), and by machine. The thresholds that have increased are not the ones for the Duplex case. Those stayed the same. (What got me sight of the new documentation, by the way, was asking what effect Duplexing had on the thresholds.)

Not coincidentally, I was at a customer yesterday planning how we are going to measure a test with DB2 Data Sharing at a distance of over 20km. Part of that will be experimenting with turning off System-Managed Duplexing but not (we think) User-Managed Duplexing (aka GBP Duplexing). Without in any way betraying confidences, I’ll see if I can write up the lessons we’ll have learned over the coming weeks and months.

So do take a look at the new description. It is really very good.

Coupling Facility Async / Sync Thresholds – They Are A’Changin’

(Originally posted 2007-10-13.)

APAR OA21635 is one of a rare breed: A change to the thresholds for z/OS’s automatic CF Request conversion. (I’m told this has only happened once before.)

If you recall, z/OS Release 2 (in 2001) introduced a very nice function that automatically converts Coupling Facility requests from Sync to Async, based on thresholds. The purpose of this is to minimise the CPU cost for a coupled z/OS system, while still providing reasonable request response times:

  • With a synchronous request the requesting z/OS CPU (engine) actively spins, waiting for the request to complete. So a longer request service time would lead to a higher CPU cost for the coupled z/OS system.
  • With an asynchronous request the requester does not spin. But the response time is longer.

When I say response time I mean CF request response time, which may have little to do with actual application response times. But the two aren’t totally divorced.

(Requests that were originally Async don’t get converted to Sync, by the way.)

The big change came, as I say, in z/OS Release 2 where XES introduced a new algorithm, measuring response times and, based on thresholds, deciding whether to convert Sync requests to Async. This measurement is not done for every request, but rather once in a while. So we don’t have “nervous kitten” syndrome here. 🙂 I like algorithms that are responsive. I don’t like algorithms that are overly jumpy.
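To make the idea concrete, here’s a toy sketch of sampled, threshold-driven conversion. To be clear: this is NOT XES’s actual algorithm, and the sampling interval and threshold value are made-up numbers – the real heuristics are internal to z/OS and, as noted above, vary by structure type, duplexing and machine.

    # Toy illustration of sampled, threshold-driven Sync -> Async conversion.
    # SAMPLE_EVERY and THRESHOLD_MICROSECONDS are invented for illustration;
    # the real XES thresholds and sampling are internal and differ by case.
    SAMPLE_EVERY = 1000          # measure once in a while, not on every request
    THRESHOLD_MICROSECONDS = 26  # hypothetical sync/async break-even point

    class RequestConverter:
        def __init__(self):
            self.count = 0
            self.convert_to_async = False

        def issue(self, observed_service_time_us):
            """Decide how subsequent requests go, based on occasional measurement."""
            self.count += 1
            if self.count % SAMPLE_EVERY == 0:  # periodic, so no "nervous kitten"
                self.convert_to_async = observed_service_time_us > THRESHOLD_MICROSECONDS
            return "ASYNC" if self.convert_to_async else "SYNC"

    conv = RequestConverter()
    for i in range(3000):
        service_time = 15 if i < 1500 else 40  # e.g. the CF gets further away or busier
        mode = conv.issue(service_time)
    print("Requests now being issued as:", mode)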

So, back to the thresholds:

Technology changes, so it’s appropriate to revisit the thresholds from time to time. APAR OA21635 is a result of this. To quote Development:

The synch/asynch thresholds have been recalibrated to better reflect current z/OS and processor technology. After installing the APAR, installations may see some asynchronous CF activity shift back to synchronous operations resulting in slightly improved performance for applications sensitive to these CF operations.

What you’ll see when you install this PTF varies, depending on your situation:

  • For requests using an IC link – obviously to a CF in the same footprint – nothing’s going to change, as all requests are going to be Sync unless the request was explicitly Async. Requests using an ICB link (to a nearby footprint) are probably broadly similar.
  • For long-distance requests again nothing’s likely to change as the new thresholds would again suggest Async conversion as the norm.
  • For middle-distance (say less than 2km) links (ISC) there may be some change: More requests are likely to go Sync rather than Async. For some customers this category has been of concern. The shift in thresholds is meant to address it.

An interesting game I like to play is to compare request types to “local” and “remote” CFs, particularly with Duplexing. And, obviously, the response times seen. The simple case is without duplexing, where “local” requesters get Sync requests in the main – at perhaps less than 50 microseconds. And “remote” requesters get essentially Async requests, with a much higher response time, especially at distance.
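Here’s a minimal sketch of that game, assuming the per-system, per-CF request counts and average service times have already been extracted from the 74-4 data (or read off an RMF Coupling Facility Activity report); the numbers are illustrative only.

    # Minimal sketch: compare Sync/Async mix and service times by CF, per system.
    # The rows are illustrative; substitute values from SMF 74-4 / RMF CF reports.
    rows = [
        # (system, cf, sync requests, avg sync us, async requests, avg async us)
        ("SYSA", "CFLOCAL",  900_000, 18,  50_000, 120),
        ("SYSA", "CFREMOTE",  20_000, 95, 700_000, 450),
    ]

    for system, cf, n_sync, t_sync, n_async, t_async in rows:
        pct_sync = 100.0 * n_sync / (n_sync + n_async)
        print(f"{system} -> {cf}: {pct_sync:5.1f}% sync, "
              f"avg sync {t_sync}us, avg async {t_async}us")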

Distance discussions are well informed by such analysis. As are Duplexing discussions. Which is, perhaps, the essential “take home” message of this post.

When implementing the threshold changes I would form a view of such things before and after applying the PTF. And I’d be interested to know how well it worked for you.

And remember there’s nothing essentially good or bad about Sync or Async. It all depends on whether the behaviour is appropriate for your scenario.

But I’m pleased that, once in a while, the thresholds are revisited. So I think this is a good APAR.

The “Gas Gauge” was indeed more than a gimmick

(Originally posted 2007-10-12.)

Back in April this blog entry incorporated a picture of the z9 HMC display that includes the “gas gauge” (or “power meter” if you prefer).

This press release details a rather more serious purpose for the thing. And thanks to John McKown for pointing out the press release on the IBM-MAIN listserver.

New Support for BatchPipes/MVS

(Originally posted 2007-10-11.)

As many of you know I’ve been very fond of BatchPipes/MVS (aka “Pipes”) down the years (17 to be precise). So I’m pleased to see that APAR PK34251: ADDING BATCHPIPES SUBSYS SUPPORT TO TEMPLATE UTILITY describes some new support in the DB2 Load Utility (as driven by the Template utility) which makes it much easier to use with BatchPipes/MVS.

(For reference here’s the BatchPipes For OS/390 Version 2 Release 1 announcement letter.)

I can see a number of scenarios where the ability to load from a pipe would be handy, potentially speeding up the load. (Whether it actually does will depend on all the other factors that govern the Load utility’s speed.) The most usual scenario is an unload step – perhaps using a utility or SQL – some transformation, and then a (re)load. This might or might not be into the same DB2 subsystem, but probably won’t be into the same DB2 table. The unload could be piped into the transformation step. If the transformation step doesn’t involve a sort then all three – the unload, the transformation and the load – could conceivably be done in parallel. (If it does involve a sort then the sort’s input phase would be overlapped with the unload and the sort’s output phase with the load, leaving just the (usually small) intermediate sort merge phase not overlapped.) For what it’s worth, DFSORT has been able to detect pipes automatically for input and output for 10 years now. 🙂 When it detects a pipe it switches to BSAM to process it, rather than using EXCP. (BSAM and QSAM are the only supported access methods for Pipes – as well as for Extended Format Sequential data sets, whether striped, compressed or not.)

The referenced APAR description does a good job, I think, of discussing considerations when using a pipe to load the data from.

And now I’m off to edit the BatchPipes Wikipedia entry. 🙂

DB2 Version 9 – STATIME Default Decreased to 5

(Originally posted 2007-10-10.)

I’m in a session where we’re going through DB2 Version 9 migration considerations – and right now there’s a table on display with changes to DSNZPARM defaults.

One change of real value is that STATIME has come down to 5 minutes from 30. Unless you’ve overridden it you should now get much better information at the DB2 subsystem level. This does, of course, mean 12 sets of Statistics Trace records an hour, rather than 2. But it does mean that, for the “counter” fields, subtracting the first value from the last gives you a MUCH better view of the hourly rate at which the counter increments.
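For the counter fields the arithmetic is trivial but worth spelling out. Here’s a minimal sketch, assuming the timestamps and one cumulative counter have already been extracted from the Statistics Trace records by other means; with STATIME=5 you have twelve samples an hour to difference rather than two.

    # Minimal sketch: turn a cumulative DB2 statistics counter into a rate per hour.
    # The (timestamp, counter) samples are hypothetical and assumed to have been
    # extracted from the Statistics Trace records already.
    from datetime import datetime

    samples = [
        ("2007-10-10 09:00", 1_200_000),
        ("2007-10-10 09:05", 1_221_000),
        ("2007-10-10 09:10", 1_239_500),
    ]

    fmt = "%Y-%m-%d %H:%M"
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        hours = (datetime.strptime(t1, fmt) - datetime.strptime(t0, fmt)).total_seconds() / 3600
        print(f"{t0} -> {t1}: {(v1 - v0) / hours:,.0f} increments per hour")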

I think it’s a long overdue change. Of course, if you’ve hardcoded the default 30 value then you won’t see the improvement. And if you’ve already dropped STATIME to 5 or (still better) to 1 then Well Done! (In my DB2 Performance engagements I always ask for STATIME to be dropped to 5 or 1 – and no customer of mine has complained about what results.)