Is UCB Too Obscure For Wikipedia?

(Originally posted 2007-10-04.)

Well, is it? Apparently there are umpteen (count ’em if you want a more precise number) 🙂 expansions of the acronym “UCB”. But the one I care most about is Unit Control Block. (I think I had been born before this meaning came into being – but I’m not sure.) 🙂

Actually Unit Control Block was not one of the listed meanings. So I added it yesterday – but didn’t write much on it. You’re probably wondering at this point why I’d pick on UCB to write about. Well, it’s rather hard to write about PAV and HyperPAV if the reader doesn’t know about UCBs, IOSQ time and UCB Queuing.

The more general point is that I think, as a mainframe community, we should contribute more to Wikipedia, whatever trust and accuracy issues we have with it. I raised this point on the IBM-MAIN listserver yesterday and I seem to be getting exclusively “thumbs up”. So, do get writing and editing!

Late To The Party

(Originally posted 2007-10-03.)

I’m towards the end of revamping our analysis code to support System z9 and z/OS R.8. What took us so long? 🙂 I’m telling you this for two reasons:

  • So you know what there is that might slow you down.
  • So you have some view as to whether we can competently process your data. 🙂

System z9

About the only change in instrumentation between z990 and z9 is the separation of specialty engines into pools. But this is a big change…

We process SMF 70 records into tables – with rows and columns. When we had just 2 pools we could have separate columns for the GCP and “ICF” pools. Now, with 5 pools (GCP, zAAP, IFL, ICF and zIIP) this approach no longer works. So we’ve reworked it to have a separate row for each pool, for each LPAR and machine. This caused lots of breakage, but we’re over that. The rework also prompted a rethink of how we display the pool-level data – and I’m much happier with how that turned out.
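To make that concrete, here’s a minimal sketch of the reshaping – in Python, with invented field names, so an illustration rather than our actual reporting code. The idea is one narrow row per machine / LPAR / pool instead of a wide row with a column per pool:

    from dataclasses import dataclass

    POOLS = ("GCP", "zAAP", "IFL", "ICF", "zIIP")

    @dataclass
    class PoolRow:
        machine: str     # serial number or name
        lpar: str
        pool: str        # one of POOLS
        busy_pct: float

    def to_pool_rows(sample):
        """Turn one wide per-LPAR sample (a dict with one busy figure
        for each pool it has) into one narrow row per pool."""
        return [PoolRow(sample["machine"], sample["lpar"], p, sample[p])
                for p in POOLS
                if p in sample]   # not every LPAR has every engine type

    rows = to_pool_rows({"machine": "2094-1", "lpar": "PROD1",
                         "GCP": 72.5, "zIIP": 31.0})

The nice property is that a sixth pool type one day would mean more rows, not more columns – and no more breakage.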

Incidentally, I’ve managed to make our (Bookmaster) reporting work nicely with B2H so I can now publish HTML versions of the textual reporting. I may well do that for the machine-level reporting in a future blog entry. I think I’d better learn some more CSS first, though, or you’ll be underwhelmed. 🙂

Also on z9 our charts are by pool now. And I’ve taken the “IDLE” and “UNKNOWN” samples off the Service Class Period charts. That way the “Using” and “Delay” samples are clearer. (Also, of course, the zAAP- and zIIP-related samples are displayed.) And I “smart out” zero delay buckets. Overall the foils are less cluttered and more punchy (but are still GDDM-originated CGMs – with styling that went out with the (B) Ark). 🙂

z/OS Release 8

z/OS R.8 provides us with some challenges in the Memory area, as previously noted. On a more positive note it added the machine serial number (e.g. 51-11D68) to SMF 70-1. This I can now display – and I do when I have it. But so what? Actually one immediate thing (and one deferred)…

  • There is a nice Customer Engineer’s tool called “VPD Formatter for Windows” (VPDFWIN). This takes the VPD that each machine periodically sends to the Boulder server and formats it. The input to the tool is the device type (e.g. “2094”) and the 7-digit machine serial number. With it I get lots of gory information about the machine, such as how much memory is physically on the machine and how much is purchased. Such things tell me the impact of, for example, buying more memory, or even deploying more to an LPAR.
  • A “still to do” is to tie up the 70-1 view of CPU for an ICF with the 74-4 view (as the 74-4 also got the machine serial number). But for now I’m concentrating on higher-priority work – such as that listed above.

So, I’ve been busy coding – and I like what’s coming out – despite the breakage both z9 and z/OS R.8 caused in my code. And as always having “early sight” of the data gives me a chance to advise my customers on what’s coming and how to use it.

Mor(e )on UIC

(Originally posted 2007-10-03.)

Thanks to the people who responded to this blog entry. And to the people who talked to me offline.

The result is a slight shift in emphasis: I never did talk about UIC as the primary metric of memory constraint. If you read the referenced blog item I mentioned it as one of three, alongside paging rate and free frames. The shift is that in future I’m going to be more strident about UIC’s limitations, relegating it to “third place”…

  1. Obviously copious free frames would suggest no constraint.
  2. Obviously significant paging would tend to suggest some level of shortage – though customers have been reporting some paging even when there are tons of frames free (and I haven’t had a good explanation for that yet).
  3. If UIC is dropping then that ought to be some kind of indicator of constraint.

As always, “constraint” has to be seen in the light of goal attainment for important service class periods. You can see “delay for paging” samples in the RMF Workload Activity Report.
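To pull those three signals together, here’s an illustrative triage in Python – the thresholds are invented for the sketch and would need tuning against your own systems:

    def memory_constraint_hint(free_frames, total_frames,
                               paging_rate, uic_min):
        """Ordering mirrors the list above: free frames first,
        paging second, UIC third. All thresholds are made up."""
        if free_frames > 0.20 * total_frames:   # "copious"
            return "probably unconstrained"
        if paging_rate > 10.0:                  # pages / second
            return "some level of shortage"
        if uic_min < 100:                       # UIC dropping low
            return "possible constraint"
        return "no strong signal - check goal attainment"

And even then the answer is only a hint: goal attainment for important periods remains the arbiter.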

Did I say I was making this up as I go along? I didn’t? 🙂

Seriously, I’ve yet to see a day’s worth of z/OS R.8 data from a Production system. I expect to see some any week now. But rest assured, dear reader, if you get to send me such data I’ll be ready for it. 🙂

Memory Metrics Now That z/OS Release 8 Is Upon Us

(Originally posted 2007-09-27.)

I was asked an interesting question today by a customer – one with dozens of LPARs and therefore not much time to study each one in excruciating detail…

“Given that System High UIC (hereafter referred to as ‘UIC’) behaves differently in z/OS R.8, how should I treat it?”

I’ve posted this question on the MXG-L listserver, soliciting experiences and opinions.

With z/OS R.8 we replaced the “page-UIC” method of managing memory with one very much like the old expanded storage algorithm. Pages are divided into two categories:

  • Those without the Reference Bit set (regarded as “old” pages).
  • Those with the Reference Bit set (regarded as “new” pages).

When we need to acquire a new page frame to back a new page request we search through pages (using a cursor) to find old pages…

  • If the Reference Bit is set we reset it.
  • If the Reference Bit was not set we reuse the page frame. If the Changed Bit is set we page out the contents. If not we discard them.

I hope you’ll recognise this is very much like the old expanded storage algorithm.
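In code terms it’s essentially a “clock” algorithm. Here’s a toy rendition in Python – nothing like actual RSM code, just the shape of the search:

    class Frame:
        def __init__(self):
            self.referenced = False   # Reference Bit
            self.changed = False      # Changed Bit

    class Clock:
        def __init__(self, frames):
            self.frames = frames      # all of memory, as a circular list
            self.cursor = 0

        def acquire_frame(self):
            while True:
                f = self.frames[self.cursor]
                self.cursor = (self.cursor + 1) % len(self.frames)
                if f.referenced:
                    f.referenced = False   # "new" page: give it another lap
                elif f.changed:
                    self.page_out(f)       # write the contents out first...
                    return f               # ...then reuse the frame
                else:
                    return f               # "old" and unchanged: just discard

        def page_out(self, frame):
            pass                           # stand-in for the real paging I/O

Note that the scarcer old pages are, the further the cursor moves per acquisition and the sooner it laps all of memory – which is exactly why the new UIC falls under constraint, as described next.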

The consequence of this algorithm is that the UIC is now the time taken to search through the whole of memory. If old pages are scarce we need to search through more pages until we find one – and hence the time to traverse all of memory is lower. So a high UIC means relatively little constraint. A low one means relatively more constraint. In that sense UIC behaves similarly to how it did before.

The difference is that UIC ticks on up in unconstrained times – so its absolute value is less useful for Performance Analysis. So what DO I recommend?

  • I recognise that the “state of the art” is evolving – so this is subject to revision / abandonment etc…
  • I still recommend understanding how many pages are free.
  • The UIC still has value in that a low value is still meaningful. Therefore I would want to keep track of the MINIMUM value during your focus window (perhaps prime shift) as that’s the worst that memory constraint gets.
  • Paging rates are still important.

So track all three metrics and develop a view of how your own systems behave.
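For the UIC part of that, here’s a minimal sketch of what I mean by tracking the minimum over a focus window – assuming you’ve already reduced your RMF data to (timestamp, high UIC) pairs, whatever your own reporting actually calls them:

    from datetime import time

    def minimum_uic(intervals, start=time(9, 0), end=time(17, 0)):
        """Worst (i.e. lowest) UIC across a focus window - prime
        shift here. 'intervals' is (datetime, high_uic) pairs."""
        window = [uic for ts, uic in intervals
                  if start <= ts.time() <= end]
        return min(window) if window else None

A minimum of, say, 42 says rather more about your worst constraint than any average would.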

And do contribute to the folklore – perhaps by replying to my post on MXG-L. There haven’t been many visible R.8 customers with memory stories, so I’m hoping that means that going to R.8 was a non-event, memory-wise. It has been a year since R.8 GA’ed so I’d expect to see some customers using it.

And, finally, I think this topic has got to be worth another couple of foils in my “Memory Matters in 20xy” presentation. I think it’s already getting to the point where I should split it into two…

  1. System Memory Performance Management
  2. Subsystem and Application Memory Performance Management

System z Technical Conference, San Antonio TX

(Originally posted 2007-09-26.)

Last week I was at the System z Tech Conference (again getting my zeds mixed up with my zees). 🙂

I presented 4 times, one of which was a repeat. What’s especially nice is that I got 87 attendees for the “Memory Matters in 2008” presentation – spread across 2 sessions. This was despite the second time being the very last session of the conference… and 22 people still showed up to that! So thanks to everyone who listened to me, made comments and asked questions. I had a great time!

It was also nice to hear people (one of them high-profile in this industry) mention that they followed this blog. But that gives me a greater sense of responsibility to contribute more to it.

Now that z/OS R.9 is right around the corner I feel an entry on SMF using System Logger coming on. And I’d better add that to the “SMF Management” wiki. I think THIS might be the way to go with some of our SMF management requirements.

The overwhelming theme of the conference was how much confusion there STILL is surrounding zIIPs and zAAPs. And I’m not sure I’m not adding to it myself. 🙂 Certainly there were many sessions on specialty engines. And the shows of hands in sessions suggested that both zIIPs and zAAPs are gaining momentum, with lots of customers using them. Add to that the recent statements of direction regarding XML and zIIPs and zAAPs, and DB2 V9’s support of zIIP for DDF SQL Stored Procedures. (And I can’t speak to how the Linux / z/VM crowd feel about IFLs – as I didn’t attend ANY of their sessions – but I get the impression IFLs are doing well.)

And the week before last I spent two days talking with hardware and software developers in Poughkeepsie. So I’m pretty confident there’ll be stuff to talk about in this blog for many years to come. 🙂

Again, thanks to everyone who in one way or another contributed to my trip to Poughkeepsie and San Antonio being a success. Next stop Ehningen and Boeblingen. 🙂

SCRT Version 14.1.0 is Announced

(Originally posted 2007-07-13.)

I previously mentioned the change in zNALC LPAR setup… You can (with APAR OA20314) now specify LICENSE=ZNALC – an IEASYSxx system parameter. The SCRT co-requisite is Version 14.1.0.

By the way, from the SCRT report you submit at the beginning of August onwards, you have to use this new version (until a newer one is mandated) in order to remain eligible for Sub-Capacity Pricing.

There are a number of other changes, some of which are bug fixes and some of which are additions to the product list. The details are here. And there’s a link there to the newly-updated User Guide.

z/OS Performance Instrumentation Management Techniques wiki

(Originally posted 2007-07-12.)

I’ve just created a wiki to discuss primarily SMF – mainly from the management perspective, rather than the contents of each individual record.

This follows on from things I’ve mentioned in this blog before.

If you’d like to contribute to it (and it is DESPERATELY in need of contributions right now) get a developerWorks screenname and send it to me here. Then I’ll enable you to edit the wiki.

You don’t need a screenname to be able to view the wiki.

The wiki is here.

Why Mainframe Folk Should Care About Web 2.0

(Originally posted 2007-07-05.)

I presented a set of (someone else’s) foils on Web 2.0 to my team meeting last week. (Interestingly, being 6 months old they were already way out of date – what with Twitter and all.) Remember I’m in a mainframe crowd of effectively “gurus”. 🙂 So why should they be interested in new-fangled webby stuff? So I got to thinking… Dear reader, why should you care about Web 2.0?

The minimal answer is “because it’s going to happen anyway, whether you like it or not, and you and your organisation are going to be left in the dust if you don’t embrace it”. I think that’s a fair answer but really there is positive stuff in there for us.

But rather than quote from The Long Tail or The Wisdom of Crowds (read and currently reading, respectively) I think it better to point out some examples of Web 2.0 you may well already be using…

  • developerWorks blogs (like this one)
  • The Wikipedia wiki (where those two book references above came from)
  • The Flickr photo-sharing site
  • Twitter microblogging

What these sites have in common is that they get better the more people use them – both in adding content and in rating and ranking material (or editing it, in the case of wikis). As such they mark a shift away from static websites towards ones where users have more control and the sites themselves just become enablers. And that leads on to changes in the web that we need to take notice of.

The other element of note is the idea of a “mashup”. This is where content from one site is mashed together with that of another to create (usually) a third site. Good examples are the whole host of mashups built around Google Maps. Google have been smart here in publishing an API that web developers and mashup creators can use. The lesson is that if you build your website so that it can be mashed up with others then it will be used in such mashups and will attract many more visitors.

A good analogy might be an insurance company that makes it hard for an insurance-quoting website to garner quotes… That insurance company isn’t going to get as many quote requests as one that makes it easy.

Now, how does that affect the mainframe? Not directly, but it does drive up traffic and raise the reliance on good response times ever higher. So our old friends scalability and performance come into play – and we play well in those terms. And it keeps the focus on availability as well.

And how does it affect mainframers? My answer would be that we can really use a lot of these new technologies in our day jobs. And if we don’t we risk letting other platforms have all the fun. 🙂

So I’d encourage people to dive into Web 2.0. And that’s what I told my team last week.

Feedback from my UKCMG Mainframe Performance Instrumentation Birds Of A Feather

(Originally posted 2007-07-05.)

It was a very good session, even if it was attended by just a “hard core” of mainframe sites. I think everyone said at least something and several said rather more than that. Here are some things I’d like to note from it…

There was a general feeling that it’d be useful to have a name for a machine that could be entered on the HMC and flow through to SMF records, particularly Type 70 (CPU). So, for example, a site might like to name its machines “North Mainframe” and “South Mainframe” rather than just being identifiable by the hardware serial numbers 5112345 and 8356789. I think this is a really good idea – at least from MY perspective as someone who wanders onsite and would rather use the names YOU use for your machines, even if I do remember the serial numbers for at least one customer. 🙂 This idea, though, would require changes in at least three components. So I’m not overly optimistic. But I’ll make some enquiries and see what we can do.

We also think that the serial number should appear in OTHER SMF records (beyond 70-1 and 74-4), such as the other RMF ones and also Type 30. That would allow much easier matching up.

On the memory front I didn’t meet with that much interest – except that we think it important that the memory numbers become more accurate in Type 30 and Type 72. (Both of these are currently reported based on the notion of Service – which means that swappable workloads are under-reported in Type 72 and those that endure CPU queuing are under-reported in SMF 30.) We also would like to see SMF 70 report how big the machine’s HSA is and how much memory is purchased but not assigned to a partition.

We discussed instrumentation for VLF / LLA and for Catalog – which we think could be improved.

When talking about techniques for managing SMF data the SMFUTIL tool was mentioned. I’ll have to do some research into this, but I still think it worthwhile to post some examples of DFSORT being used to “slice and dice” SMF. One day – though there’s a taste below. The meeting also felt that some kind of “best practices” guidance would be useful. Maybe I should start a wiki on the subject – so that the collective wisdom of the mainframe performance community can be tapped.
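On z/OS the “slice” would naturally be a DFSORT INCLUDE on the record type byte. Off-platform the same idea looks something like this Python sketch – assuming an SMF dump transferred in binary with the RDWs intact, and ignoring spanned (VBS) records, which would need reassembly:

    import struct
    from collections import Counter

    def smf_records(path):
        """Yield (record_type, record) pairs from a binary SMF dump.
        The record type is the byte just past the 4-byte RDW."""
        with open(path, "rb") as f:
            while True:
                rdw = f.read(4)
                if len(rdw) < 4:
                    break
                (length,) = struct.unpack(">H", rdw[:2])  # includes the RDW
                body = f.read(length - 4)
                yield body[1], rdw + body

    # The "slice": how many RMF CPU Activity (Type 70) records are there?
    counts = Counter(rtype for rtype, _ in smf_records("smf.dump"))
    print(counts[70], "Type 70 records")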

All in all I think the session was a success – and I’d like to do it again next year. I’ll work on these items but, as I originally said, there are no guarantees.

So thanks to the participants. Plenty of food for thought.

Abstracts for System z Technical Conference, San Antonio, September 17-21

(Originally posted 2007-06-23.)

Here are my abstracts for the conference:

Session B11: DB2 Data Sharing Performance for Beginners

This presentation provides an introductory-level view of how to look at the DB2 Data Sharing performance numbers, from both a z/OS / RMF and a DB2 perspective.

Performance topics include: XCF, Coupling Facility, Data Sharing Structures, the application’s perspective, and Structure Duplexing.

Performance topics don’t include: other forms of Data Sharing (e.g. VSAM RLS), and overly detailed descriptions.

Session P22: Memory Matters in 2008

For z/OS LPARs memory management has changed radically over the years – from both the operating system perspective and that of applications. And the pendulum has swung back and forth between focusing on Real Memory and on Virtual Memory.

This presentation discusses managing both Real and Virtual Memory – from the perspectives of both the operating system and the exploiting products. The products include DB2, DFSORT, CICS, IMS, MQ and WebSphere. One topic of particular importance to installations upgrading z/OS is the Release 8 Real Storage Manager rewrite.

Session P23: Much Ado About CPU

zSeries and System z9 processors have in recent years introduced a number of capabilities of real value to mainframe customers. These capabilities have, however, required changes in the way we think about CPU management.

This presentation describes these capabilities and how to evolve your CPU management to take them into account. It is based on the author’s experience of evolving his reporting to support these changes.