DB2 9 and End Of Service For DB2 Version 7

(Originally posted 2007-02-08.)

Willie Favero is also blogging about this – and I stole the link from him…

Here’s the End of Service announcement for DB2 Version 7. It goes out of service on June 30, 2008.

I’ve just sat through four days of the T3 class for DB2 9 for z/OS – in Silicon Valley Lab. My “take home” message is that there’s lots of stuff in all the usual areas. It’s a truly exciting release.

Note: It’s “DB2 9” and not “DB2 Version 9” at this point.

So, with the End of Service announcement for Version 7, coupled with what I know of 9, Version 8 is a great place to be right now – in my opinion.

More on DB2 9 later.

z/OS Release 9 is Previewed

(Originally posted 2007-02-08.)

See the z/OS Release 9 preview statement if you’re interested in z/OS.

There are lots of interesting Performance items previewed, as well as other line items.

It’s expected to become available in September 2007, following the current pattern of a new release each year (preview in the so-called Spring :-) and release in the Autumn), so expect lots more information on it later this year.

I do wonder how many customers will find today’s LPAR limit of 32 processors (CPs plus zIIPs plus zAAPs) to be a constraint. z/OS R.9 is expected to support 54 in total (CPs + zIIPs + zAAPs).

z/OS Release 8 Real Storage Changes

(Originally posted 2007-01-31.)

I suggested (in z/OS Release 8 Real Storage Manager SMF Record Changes) that I might talk about the changes to RSM in z/OS Release 8. As the release has been out now for about 4 months some of you might actually be on the brink of putting it into production. 🙂

Seriously, you probably are planning for the day when someone cuts over to Release 8.

RSM was largely rewritten – for a very good reason: The average LPAR has an increasing amount of memory. I typically see Production z/OS LPARs in the region of 10 to 20 GB, and I know they’re going to grow. (The largest machine I’ve seen, by the way, has 96GB on it.) As memory grows – usually faster than CPU – the cost of managing it increases:

  • The CPU time increases.
  • The time spent holding RSM and SRM locks increases.

So Release 8 sets about reducing both of those costs.

There’s another factor here: It is relatively rare to see an LPAR doing significant paging. Indeed most LPARs in my recent experience have gigabytes of unused memory. So, there’s less point micromanaging memory. Why agonise over which are the oldest pages in a system when you’re not going to have to throw any of them away?

The new algorithm looks very much like the old Expanded Storage algorithm… RSM keeps a cursor into the PFT (Page Frame Table). When the Available Frame Queue (AFQ) runs low, RSM needs to replenish it – to satisfy requests for memory (perhaps to back some new virtual storage page). Replenishment is undertaken by scanning the PFT, moving the cursor. Pages with the reference bit not set are deemed old and can (usually) be stolen. Pages with the reference bit set have it reset, aging the page. So there is no UIC updating for individual pages: we simply sweep through memory looking for old pages. Fixed frames (such as perhaps page-fixed DB2 V8 buffer pool pages) do not participate in this.
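If you like to see such things in code, here’s a minimal sketch of that sweep – essentially a “clock” / second-chance algorithm. The frame structure and names are my own illustration, not the real RSM control blocks:

```python
from dataclasses import dataclass

@dataclass
class Frame:
    fixed: bool = False        # e.g. a page-fixed DB2 V8 buffer pool page
    referenced: bool = False   # the hardware reference bit

def replenish_afq(pft, cursor, frames_needed):
    """Scan the page frame table from the cursor, stealing 'old' frames."""
    stolen, scanned, n = [], 0, len(pft)
    while len(stolen) < frames_needed and scanned < 2 * n:
        frame = pft[cursor]
        cursor = (cursor + 1) % n
        scanned += 1
        if frame.fixed:
            continue                    # fixed frames don't participate
        if frame.referenced:
            frame.referenced = False    # reset the bit: the page 'ages'
        else:
            stolen.append(frame)        # untouched for a whole sweep: old
    return stolen, cursor

# Example: steal 2 frames from a toy 8-frame table
pft = [Frame(referenced=(i % 2 == 0)) for i in range(8)]
stolen, cursor = replenish_afq(pft, 0, 2)
print(len(stolen), cursor)   # 2 frames stolen; cursor left for next time
```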

This algorithm is much cheaper and allows LPARs to scale much better, memory-wise.

RSM keeps track of how long it takes to scan the entire table: the longer it takes, the less constrained memory is. And this is the new System UIC, which is used to drive algorithms and is eventually surfaced as the Average System UIC (in RMF and SMF Type 71 records).

A question arises: Given that sometimes z/OS needs to know more than just the System UIC, how is this done? The answer is that memory is divided into (currently) 16 “Segments”. The timestamp at entry to the segment for the current sweep is compared to the timestamp for the previous sweep. This gives useful profiling information – and 16 data points.
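As a toy illustration of that profiling idea (all the numbers are made up):

```python
NUM_SEGMENTS = 16

# Hypothetical sweep-entry timestamps (in seconds) for each segment,
# recorded on the previous sweep and on the current one.
prev_entry = [i * 10.0 for i in range(NUM_SEGMENTS)]
curr_entry = [160.0 + i * 25.0 for i in range(NUM_SEGMENTS)]

# Time between successive entries to each segment: the bigger the gap,
# the slower the sweep, i.e. the less constrained memory was just then.
profile = [curr - prev for prev, curr in zip(prev_entry, curr_entry)]
print(profile)   # 16 data points: [160.0, 175.0, 190.0, ...]
```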

I specifically asked the developers last summer “who stands to benefit the least from the rewrite?” One has to ask these questions – given that few algorithm changes are “all upside”. The answer was “memory-constrained systems”. So, if you have memory-constrained LPARs you might want to examine their storage allocations. If they’re Sysprog LPARs you mightn’t worry about it, of course. Depends on what you think of your sysprogs. 🙂

Going The Distance

(Originally posted 2007-01-26.)

I presented to the Large Systems GSE meeting in Dudley on Wednesday on “DB2 Data Sharing Performance For Beginners” and an interesting question came up…

“At what distance does performance begin to suffer when moving machines in a parallel sysplex apart?”

As it happens I am dealing with exactly that question in a customer right now. It seems it’s a popular thing to attempt. (I know: One user group question and one customer engagement do not a trend make.) 🙂

So I obviously need another foil or two in my presentation – as if it wasn’t long enough already. 🙂 Here are some early thoughts. (Actually you’re not the first set of guinea pigs – as I drafted something along these lines for my internal IBM blog.) 🙂

The “standard answer” is that performance begins to deteriorate above 10km. That’s fine – but I think I want a little more detail than that…

Disk Performance

The question of disk performance at a distance was raised in the GSE discussion. It’s my view that, although the speed of light does come into play here, it’s less of an issue by far than for coupling facility (CF) requests: An elongation of 30 microseconds is far more serious for a CF request than for a disk I/O. (And there are probably far more CF requests a second than there are disk I/Os.) Ignoring protocol differences – such as the number of exchanges per request – disk is probably OK for one or two more orders of magnitude of distance. But don’t ignore disk considerations.

Now let me talk about CF requests…

A synchronous request to a coupling facility accessed using Internal Coupling (IC) links can complete in, say, 30 microseconds. That’s because the link is rather more logical than physical. Sync requests to close-at-hand CFs using Integrated Cluster Bus (ICB) links take longer than that. Sync requests to CFs using ISC links are going to take longer still. ISC links allow much greater distances than ICB links (>> 7 metres). So, just to be using ISC links probably means a step down in request performance. And as the ISC link gets longer the speed of light comes into play. I also think it’s important to note that a CF request involves a number of signals up and down the link.

Coupling Facility Request Arithmetic

So here’s some easy (but probably inaccurate) maths:

Suppose each request requires 10 signals of some sort or other. And suppose the speed of light is 300,000 km/sec. Every kilometre of extra distance requires 10km of extra signal travel distance. Which takes about 33 microseconds (assuming signals propagate at the speed of light – which is the best case). So each request has an additional 33 microseconds of service time per kilometre of separation.
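To make the arithmetic concrete, here’s a small sketch. The 10-signal assumption comes from above; the per-transaction figures are purely illustrative:

```python
SPEED_OF_LIGHT_KM_PER_SEC = 300_000   # best case; in fibre it's nearer 200,000

def added_service_time_us(distance_km, signals_per_request=10):
    """Extra CF request service time, in microseconds, from distance alone."""
    extra_travel_km = distance_km * signals_per_request
    return extra_travel_km / SPEED_OF_LIGHT_KM_PER_SEC * 1_000_000

print(added_service_time_us(1))     # ~33 us per request at 1 km apart
print(added_service_time_us(10))    # ~333 us per request at 10 km apart

# A transaction making, say, 200 CF requests at 10 km separation could
# see roughly 200 x 333 us = ~67 ms added to its response time.
print(200 * added_service_time_us(10) / 1000, "ms")
```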

Further note that a synchronous request causes a z/OS processor engine to spin – so the time of a synchronous request also causes an equivalent CPU time “wastage”. z/OS, since Release 2, has used an adaptive algorithm, converting sync requests into async requests as appropriate. This helps minimise the coupling CPU cost of z/OS engines. And, as the distance increases, z/OS is more likely to convert your request to async.

Async requests don’t cause the z/OS processor to spin but they do take longer to complete. So longer distances may well have a knock-on effect on request response time.
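You could caricature the conversion decision like this. The real z/OS heuristic and its thresholds are internal to SRM, so the numbers below are pure invention:

```python
BREAKEVEN_US = 26   # invented figure: the sync service time beyond which
                    # spinning the engine costs more CPU than going async

def choose_mode(recent_avg_sync_service_time_us):
    """Drive a CF request sync (spin) or async, based on observed times."""
    if recent_avg_sync_service_time_us > BREAKEVEN_US:
        return "async"   # longer elapsed time, but the engine does useful work
    return "sync"        # cheapest when the CF is close at hand

print(choose_mode(15))    # 'sync'  - a close CF
print(choose_mode(350))   # 'async' - distance has forced the conversion
```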

(You can measure the times and the sync vs async request rates – at the structure / z/OS level – using the RMF Coupling Facility Report data (SMF 74-4).)

So the basic conclusion is that distance does matter, and probably significantly below 10km – at the request response time level.

Applications

Now let’s think about what that means for applications…

If the request is from a z/OS image to a proximate CF structure the response time is going to be lower than if it were to a remote CF. And the difference is obviously going to increase with distance. But the impact on an application (such as DB2, and its applications in turn) depends on the access rates and characteristics. An application that always uses a local CF structure will perform better than one that uses a mixture of local and remote accesses, which in turn will perform better than one with an “all remote” access pattern. And the fewer CF accesses per “transaction” the lower the impact.

But you might not be able to choose “all local” access patterns. And you might not have much choice about access intensity – but the latter is a major DB2 Data Sharing tuning theme. So don’t discount that possibility. In any (DB2) case you can use DB2 Accounting Trace to monitor and tune DB2 applications’ use of CF resources.

CF Structure Duplexing

Let me conclude by talking about structure duplexing…

First we need to review how CF Duplexing works. (I’ll use the DB2 Data Sharing example to illustrate it.)

System-Managed Duplexing

There are two copies of the structure – in separate coupling facilities, on separate machines.

For every request both structures have to perform the same amount of work, and the two CFs coordinate via a dedicated link. The request’s completion is signalled only when the request has been processed in both CFs.

One request is always to a remote CF. The other may or may not be. So in all cases a request is performed at effectively “remote speed”. Which, as I said, elongates with distance.

DB2 Locking (via the LOCK1 structure) and GRS Star are good examples of structures affected by this.

User-Managed Duplexing

There is only one exploiter: DB2 Group Buffer Pools (GBPs).

In the User-Managed case only the writes are processed by both copies of the structure… An async write to the secondary is followed by a sync write to the primary. Only when both have completed is the request signalled complete. So writes always go at remote speed.
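Here’s a conceptual sketch of that write flow. write_primary() and write_secondary() are hypothetical stand-ins for the actual CF operations, with illustrative latencies:

```python
import asyncio

async def write_primary(page):
    await asyncio.sleep(30e-6)    # ~30 us: a close-at-hand sync write

async def write_secondary(page):
    await asyncio.sleep(330e-6)   # the remote copy: distance inflates this

async def duplexed_gbp_write(page):
    """Start the secondary write asynchronously, drive the primary write
    synchronously, and signal completion only when BOTH are done."""
    secondary = asyncio.create_task(write_secondary(page))
    await write_primary(page)
    await secondary   # the slower (usually remote) write governs service time

asyncio.run(duplexed_gbp_write("page-1"))
```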

Reads are always from the primary. At an individual z/OS image level this could be remote or local depending on which machine the z/OS image is on and which machine the primary structure is on.

So the effects here are perhaps less severe. “Perhaps” because it all depends on access rates and patterns. But, for a read-only subsystem local to all the GBPs, all requests are going to be local. For another read-only subsystem accessing the GBPs remotely, the CF response times will be higher. (So perhaps balancing the GBPs across the two CFs might help keep the application response times consistent.)

The bottom line with duplexing is that it stands to increase CF response times and sensitivity to distance. But equally you can tune DB2 usage, perhaps reducing GBP and locking traffic.

I’ve deliberately written this in a “design in public” style as I’m seeking early customer experiences and perhaps corrections to my thinking. I suppose that’s one of the things blogs are good for.

z/OS Release 8 CPU SMF Record Changes

(Originally posted 2006-12-13.)

I’ve just been looking at some fascinating SMF 70 records. They come from a System z9 EC S54 machine and the LPAR they’re from is running z/OS Release 8. I say “fascinating” because:

  • I’ve not seen an S54’s data before. (This is the largest z9 machine – with all 4 processor books fully populated with engines.)
  • There are changes that I asked RMF (and PR/SM) Development to add into Release 8 that I’m delighted to see in the data. (It’s nice when a developer says “sure thing, Martin, it’ll be in Release 8” but it’s even nicer when you get to see Production SMF data with the changes in.)

Some of the changes described below are not, in fairness, at my behest.

Engine Counts

The final section of the SMF 70 record used to simply be a lookup table for “pool number” versus “characterised processor type name”. So, for example, zAAPs would be in Pool 3 – for z9. (For z990 they’re in Pool 2, alongside all the rest of the non-GCP processor types.) But modern machines often have enough logical processors (and LPARs) to cause some of the Logical Processor sections in the record to overflow to second (and perhaps third) records. This means working out the configuration of a machine from SMF 70 requires you to process several records as one. Certainly our code (and probably that of other programs) doesn’t process them as one. So in Release 8 there’s a very useful enhancement…

Each of the “lookup table” sections now has a count of the number of each engine type. So, for example, the ICF lookup table section has the number of ICF processors on the physical machine.
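Here’s a sketch of the difference this makes, with hypothetical record and field names standing in for the mapped SMF 70 sections:

```python
from collections import defaultdict

def engines_pre_r8(interval_records):
    """Before R8: tally Logical Processor sections across ALL the records
    for one interval, because they can overflow into second (and perhaps
    third) records."""
    counts = defaultdict(int)
    for rec in interval_records:                    # must merge every record
        for cpu in rec["logical_processor_sections"]:
            counts[cpu["pool_name"]] += 1
    return counts

def engines_r8(record):
    """With R8: each lookup-table section carries the count of its engine
    type on the physical machine, so any single record will do."""
    return {s["type_name"]: s["engine_count"]
            for s in record["lookup_sections"]}
```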

Hardware Model

I asked for this a while back and I thought I had it in z/OS Release 7. However I’ve seen a number of sets of R.7 data where this field isn’t populated in the record. But in this case the field is populated with “S54”. I assume this has little to do with z/OS Release 8 vs Release 7, but rather more to do with microcode levels on the processor. And the importance of this is that I can be sensitive to the number of books installed on the machine when recommending things like upgrades.

Machine Serial Number

This, which I asked for, has two purposes:

  • I can look up the Vital Product Data (VPD) for the machine in one of our internal databases. (Customer Engineers have access to a nice tool from Montpellier called VPDFWIN, which formats VPD into a useful display.) Amongst many other things it tells me the book count (see above), the number and size of memory cards on each book, and the I/O subsystem features on the machine. So, again, I would hope to do a better consulting job.
  • I can correlate the SMF 70 view of CPU with the SMF 74 Subtype 4 view of CPU. (SMF 74 Subtype 4 gives you a wealth of very useful information about Coupling Facilities.) So, for instance, I can tell which machine a CF is on. So I don’t have to ask you the stupid question. 🙂

    Now, I only have SMF 89 and 70 for this machine so far, so I can’t also test whether the machine serial number has shown up in the 74-4 record as well. I’m sure it has – as I asked for it there at the same time as I asked for it in the SMF 70 record. A sketch of the sort of correlation the serial number enables follows this list.
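Here’s that sketch – a minimal join on serial number, again with hypothetical record shapes rather than the real mapped fields:

```python
def cf_to_machine(smf70_records, smf744_records):
    """Join the two record types on machine serial number to discover
    which machine each coupling facility is on."""
    machines = {r["serial"]: r["hardware_model"] for r in smf70_records}
    return {r["cf_name"]: machines.get(r["cf_serial"], "unknown")
            for r in smf744_records}

# Example: one machine, one CF
print(cf_to_machine([{"serial": "0001234", "hardware_model": "S54"}],
                    [{"cf_name": "CF01", "cf_serial": "0001234"}]))
# {'CF01': 'S54'}
```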

I concede these are small “fit and finish” items but they do make the SMF 70 data just that little bit more useful and usable.

z/OS Release 8 Real Storage Manager SMF Record Changes

(Originally posted 2006-11-30.)

In z/OS Release 8 Real Storage Manager (RSM) implemented brand new algorithms. One of the more significant changes was the adoption of a “new style” UIC value. This behaves much more like the old Expanded Storage Migration Age. Accordingly the old UIC value (SMF71ACA) isn’t really appropriate to use anymore. Or at least it won’t behave the way it used to.

And, before I go any further, the SMF Type 71 record is the one that accompanies the RMF “Paging Activity” report. But most people process it into a database and graph its fields from there.

Now, the point of this blog entry is to alert readers who like to look at low-level SMF records that the Paging Data section in SMF 71 has been extended from 1040 bytes (ending at SMF71AFB) to 1120 bytes. The new fields can be summarised as:

  • Information on shared pages, typically in 8-byte floating point fields.
  • New numbers on UIC values, as 4-byte integers.

It’s the latter I want to focus on…

There are a whole host of these UIC fields, giving much more detail on how UIC varies through time. When the documentation refers to “Average current system UIC during the interval” (SMF71UAC), for example, it’s important to bear in mind that the interval consists of some number of samples. So the “Average” refers to averaging across the samples, and the “current” refers to the value at the end of one of the sample periods. It’s rather confusing terminology – but that’s the nature of the beast.
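A tiny worked example of the terminology, with made-up sample values:

```python
# "Current" is the UIC observed at the end of each sample period within
# the RMF interval; "average" is the mean of those per-sample values.
current_uic_samples = [1200, 1350, 1100, 1425]   # one value per sample

smf71uac = sum(current_uic_samples) / len(current_uic_samples)
print(smf71uac)   # 1268.75 - the "average current system UIC"
```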

I expect, in a subsequent blog entry, to talk more about the new RSM algorithms themselves.

CICS and SecondLife

(Originally posted 2006-11-22.)

In Mainframe v-Business Boas Betzler threw out a challenge: “Demonstrate SecondLife talking to CICS”. I’ve picked this up on behalf of IBM. Now, why on earth would you want to do that, Martin?

To get beyond the “because it’s fun” answer (which is also true) let me tell you my view on SecondLife…

First, it is NOT a game. Emphatically not. Though it is a lot of fun. By way of analogy, consider the graphics and sound cards that have crept into most PCs and are certainly built into most laptops. Those were originally for gamers. The analogy here is that things that look like fun now will probably become standard in a few years’ time…

Why shouldn’t we collaborate interactively in 3-D space in a way that’s far better than the current browser and application experiences can offer? Even with the collaborative aspects of things such as Web 2.0 and instant messengers like Sametime. Well maybe SecondLife isn’t there yet but it will be.

(And at this point I should reveal I’m “Timnar Mandelbrot” in SecondLife, in case you feel you want to befriend me.)

So, we have the “better UI” aspect. Maybe compelling, maybe not.

But we also have the “filthy lucre” element, and that could be more important:

I was talking to some friends in May in Washington about how people were making real money in SecondLife. The very next day I heard of an IBMer making real money selling photographs for people to use in SecondLife. That’s a simple example. A number of companies have set up shops in SecondLife: You pay for what they sell (whether for use “in game” or for delivery to your real doorstep) in Linden Dollars. These are directly exchangeable for US Dollars.

Which brings us almost to CICS. So, if a real company is trading in SecondLife, don’t you think they’d want to do it in a robust fashion? Well, this is where the ability to script objects in SecondLife, particularly linking to the outside world (via the llHTTPRequest() script function), comes in handy.

Back to Boas’ challenge…

Wouldn’t it be nice to connect SecondLife transactions, requests for information, etc. to real business systems? With all the robustness and qualities of service that the mainframe, z/OS, CICS, DB2, WebSphere, WebSphere MQ etc. bring. If real companies are trading for what is ultimately real money in SecondLife they need this stuff. So the link to CICS is actually quite important.

So, I’ve found a CICS system in Montpellier I can use. I’ve talked to an expert in the HTTP support in CICS – who is enthusiastic about this. I’ve talked to a number of people about SOAP (and it works well in CICS but rather less well in SecondLife). I’ve created a simple scriptable object on IBM’s private Hursley Island. And I think I’m all set.

So, I’m going to be putting all the pieces together over the coming weeks. Well enough to demonstrate that you can talk to CICS transactions from within SecondLife, using “Raw HTTP”. With that basic piece under my belt I’ll branch out and demonstrate some other z/OS-based products in the mix.

Yes, it should be fun. But it should also demonstrate that the mainframe has a role to play for organisations that are going to participate in SecondLife – whether to trade there or to provide a useful 3-D experience. Probably both in fact – if you recall how the web evolved from pretty browsing to business transactions.

In fact it went further than that with some of the Web 2.0 elements: providing interactive experiences and social networking. So next time (which may be the first time) someone invites you to a meeting or lecture or presentation in SecondLife, I’d take it if I were you.

Better Late Than Never – RMPTTOM

(Originally posted 2006-11-08.)

Well, I don’t know about that. Maybe this contribution to the debate won’t add much. But there must be at least one performance person out there who hasn’t heard of the discussion around RMPTTOM. But would I want such a person amongst my readership? 🙂

I’m not going to rehearse the whole story. It is pretty well understood that the old default was creaking somewhat. What I am pleased by is the way it happened…

Sitting on the sidelines of this one I saw:

  • A customer reported on MXG-L that they’d experimented with the RMPTTOM value and got good results.
  • Other customers experimented and also got good results.
  • Bernie Pierce of SRM / WLM Development wrote a very fine piece on the matter in MXG-L. And I agree with everything he said in it.
  • A WSC Flash appeared on the subject, distilling Bernie’s thoughts.
  • z/OS changed the default.

You might argue that IBM should’ve been proactive in this matter. Personally, as I wasn’t involved, I can only guess that SRM / WLM developers had a hunch that the default value of RMPTTOM would benefit from revisiting. And that a customer coming along and providing proof was the catalyst. That’s genuinely a pure guess. In any case we do have to rely on input from the field and customers – to some degree. That’s what makes things like conference conversations and newsgroups (such as MXG-L) so much fun. 🙂

Of course, if you still want the technical guts do post a comment to that effect. But wait a while as I bone up on the issue. 🙂

2 More DB2 Access Path APARs

(Originally posted 2006-05-22.)

Here are a couple more SQL processing APARs.

APAR PK22814 for Version 8 ESTIMATED SUBQUERY COST IS TOO HIGH WITH SORT MERGE JOIN.

The APAR text says: Correlated subquery cost can be overestimated for sort merge join, which may cause a well-performing sort merge join plan to be overlooked by DB2. The cost calculations for sort merge join with a correlated subquery are adjusted to produce a more accurate estimate, so good sort merge join plans can now be chosen for better query performance.

APAR PK23495 for Version 8 SOME PREDICATES ARE NOT BEING PROPERLY PUSHED DOWN INTO VIEWS.

From the APAR text: A slow access path is chosen for an SQL statement referencing a view or a table expression that contains an outer join, because the predicates are not being properly pushed down.

The bind-time code in DB2 is modified to correct the incomplete predicate push-down.

DB2 Virtual Storage – The Journey Continues

(Originally posted 2006-05-15.)

DB2 Version 8 introduced some very significant exploitation of 64-bit virtual storage. It would be naive to believe that was a total conversion to 64-bit and indeed installations’ mileage varies, depending on circumstances. Here are two APARs that show the journey continues…

PK21268 MOVE CURRENT PATH STORAGE OUT OF STACK AND INTO ABOVE BAR

From the APAR Comments: The working storage to hold CURRENT PATH information has been moved to above-the-bar storage. This will reduce below-the-bar storage consumption.

The word “stack” is significant as it is one area where the conversion to 64-bit was said to increase storage requirements below the bar. You can monitor stack storage using IFCID 225 DB2 Statistics Trace records.

PK21237 VIRTUAL STORAGE CONSTRAINT RELIEF IN BUFFER MANAGEMENT

This provides relief in three areas, according to the APAR description…

The buffer manager engines have been modified to use a single common above-the-bar storage pool, rather than having a separate pool for each engine.

Additionally, the default maximum for the number of engines has been reduced.

For both castout engines and P-lock engines, excess stack storage will now be released before the engine suspends to await more work.