Broker And SMF 30

(Originally posted 2014-06-03.)

Sitting in Dave Gorman’s Broker V9 presentation in Budapest it struck me it would be a useful exercise to apply the “Systems Investigation” techniques I write about to Broker running on z/OS. So let’s see how far we can get with SMF 30 Interval records, in the vein of Life And Times Of An Address Space. It’s a nice exercise[1] but I think it’s also directly useful for looking at Broker itself.

By the way the name IBM Integration Bus is in use as of V9, but I’ll persist with “Broker” in this post.

I have two sets of customer data with Broker in them: one where Broker is active and one where it’s up but not active.

What Is Broker?

Broker is a multiplatform product family that allows business information to flow between disparate applications across multiple hardware and software platforms. Rules can be applied to the data flowing through the message broker to route and transform the information. The product is an Enterprise Service Bus providing connectivity between applications and services in a Service Oriented Architecture.

The previous paragraph is mostly not my words. In my words I would say you get to connect disparate applications together using pipeline-like constructs called flows. These flows have nodes, akin to pipeline stages.

As well as running on other platforms, Broker runs on z/OS. It writes Statistics and Accounting data in its own SMF 117 record (but this post isn’t about that).

Am I Broker?[2]

An address space is Broker if one of the following sets of conditions is met:

  1. The program is BPXBATA8 and the Proc Step Name is one of “BROKER”, “EGENV” or “EGNOENV”.  
  2. One of the SMF 30 Usage Data Sections for the address space had a product name of WMB.

These conditions are corroborative but the second condition is possibly simpler to detect than the first.
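
To make that concrete, here’s a minimal sketch of the test in Python. It’s illustration rather than my reporting code, and the inputs (program, Proc Step Name and a list of Usage Data Section product names) are invented for the purpose:

BROKER_PROC_STEPS = {"BROKER", "EGENV", "EGNOENV"}

def is_broker(program, proc_step, usage_products):
    # Condition 1: BPXBATA8 with one of the known Broker Proc Step Names
    if program == "BPXBATA8" and proc_step in BROKER_PROC_STEPS:
        return True
    # Condition 2: a Usage Data Section naming the WMB product
    if any(p.strip() == "WMB" for p in usage_products):
        return True
    return False

print(is_broker("BPXBATA8", "BROKER", []))        # True  - a Control Address Space
print(is_broker("BPXBATA8", "EGNOENV", ["WMB"]))  # True  - an Execution Group
print(is_broker("IEFBR14", "STEP1", ["DB2"]))     # False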

All the address spaces for a Broker instance have the same job name (but, obviously, different job IDs).

The structure of a Broker instance is as shown here:

Broker Address Spaces

Broker Instance BRK1 has a Control Address Space and three others.

Execution Groups

Broker flows run inside Execution Groups, each of which is an address space. The step name is different for each Execution Group, being the last 8 characters of the Execution Group Name.[3]

Of the two sets of data, one has a handful of Execution Groups, each with a mnemonic name.[4] The other has no Execution Groups, so no flows can be deployed to it.

In the example diagram above, there are three Execution Groups: Tom, Dick and Harry. Each has its own flows.
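
As a trivial sketch of the last-8-characters rule (in Python; the longer Execution Group name is made up, and I’m not addressing case folding here):

def eg_step_name(eg_name):
    # Last 8 characters of the Execution Group name become the step name
    return eg_name[-8:]

for eg in ["Tom", "Dick", "Harry", "PaymentsGateway"]:
    print(eg, "->", eg_step_name(eg))   # e.g. "PaymentsGateway" -> "sGateway"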

There’s one further piece of information we can glean about execution groups:

If the Proc Step is EGENV the Execution Group has its own specific profile. If it’s EGNOENV it doesn’t. In the case of the customer with Execution Groups they are all EGNOENV.

Which Broker?

The Broker instance is given by the job name which, as I said, is the same for the Control Address Space and all the Execution Groups.

Which Version?

You can use the Usage Data Section to establish the Broker version, except I’m seeing “NOTUSAGE” in both sets of data I’ve seen – which doesn’t help distinguish Version 7 from 8 from 9. But I’ve only got two sets of data…

CPU

Drilling down into individual address spaces / Execution Groups pays dividends when it comes to CPU:

For the customer with a handful of Execution Groups only two use significant amounts of CPU. The total is about 3.5 engines’ worth, with one Execution Group using 60% of that and the other 40%.

There was also a tiny amount of zIIP CPU usage in the active case. As you can write nodes in Java, that’s not surprising. You can also access DB2 in a flow but whether it’s the right kind for DRDA zIIP Eligibility I don’t know.

Memory Usage

There’s good news here:

Because Broker is 64-Bit the vast majority of virtual storage is allocated above the bar and memory (and Aux / Flash) usage numbers are accurate. For 24-Bit and 31-Bit Virtual Storage I can only see Allocated but, as there’s not much of it, I can live with treating that as Used Real without too much overstatement.

The LE Heap is the main user of memory, and thank goodness it’s 64-Bit: I’m seeing values from a few hundred MB to several GB. In the customer with no Execution Groups much of this is paged out to Aux. I can tell this because, as I said, 64-Bit Virtual is reported in SMF 30 as either backed by real memory or Aux / Flash.

Who Do I Talk To?

As those of you who’ve seen me present Life And Times Of An Address Space know, I see two main ways of figuring out who an address space talks to, without going deeper than SMF 30:

  • Usage information in SMF 30.
  • XCF Member information in SMF 74–2.[5]

In the data I’ve seen Broker doesn’t directly use XCF signalling (and I think that’s generally true of Broker) so I don’t expect 74–2 data to show anything.

I do see Usage information for other products associated with the address spaces:

  • In one case I see DB2, WebSphere MQ and WebSphere Transformation Extender (WTX).
  • In the other case I just see DB2 and WebSphere MQ.

In both cases I see the DB2 and MQ versions and subsystem names. In the WTX case I again see “NOTUSAGE” but this is a single data point.

Workload Manager

I see, of course, WLM Workload, Service Class and Report Class in SMF 30. One of the features of Broker is you can classify each Execution Group (address space) separately to WLM. I’ve not seen it done but I’m certain that would be reflected in SMF 30.

I/O and Database

In the case of the customer with active flows I see quite a high EXCP rate (290 per second), with one Execution Group performing about 90% of this. I also see a small amount of Unix File System I/O, this time mainly in a different Execution Group.

I would expect I/O to vary depending on the nature of the flows.

I was not in a position to look at DB2 but some flows process SQL so I would expect DB2 Accounting Trace to be of some use here.

Handling Multiple Address Spaces With The Same Name

As I said, a complete Broker Instance comprises a set of address spaces, each with the same name. My code generally summarises all the address spaces with the same name into one row per reporting interval (or higher). That’s what the SLR Summary Table does.

In this case that level of summarisation is unhelpful. So I retained the Log Table, which keeps separate Job IDs separate, and wrote reporting to go against this Log Table – but only if the sum of In and Out address spaces with a given name is more than 1.

Doing it this way more or less doubles the size of my performance database. But for cases like this it’s worth it.

There are some other cases where this approach might well yield dividends. A good example might be DB2 Workload-Manager Stored Procedure address spaces (which I tend to term Server Address Spaces). Potentially these can be legion, with the same job name.
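
As a sketch of the idea – not the actual SLR-based implementation, and approximating the In/Out test with a simple count of distinct Job IDs in the interval – it might look something like this in Python:

from collections import defaultdict

def summarise(interval_rows):
    # interval_rows: one dict per address space per interval, with illustrative
    # fields 'jobname', 'jobid' and 'cpu'
    by_name = defaultdict(list)
    for row in interval_rows:
        by_name[row["jobname"]].append(row)

    report = []
    for jobname, rows in by_name.items():
        if len({r["jobid"] for r in rows}) > 1:
            # More than one address space with this name: keep Job ID detail
            report.extend(rows)
        else:
            # Just the one: a single summarised row will do
            report.append({"jobname": jobname, "jobid": "*",
                           "cpu": sum(r["cpu"] for r in rows)})
    return report

rows = [{"jobname": "BRK1", "jobid": "STC01234", "cpu": 2.1},
        {"jobname": "BRK1", "jobid": "STC01235", "cpu": 1.4},
        {"jobname": "DB1ADBM1", "jobid": "STC00042", "cpu": 0.3}]
for r in summarise(rows):
    print(r)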

Conclusion

I think you can do quite a bit with detecting and analysing Broker. To really go to town on it you do need SMF 117 (or the Distributed equivalent), of course. And I don’t yet know what DB2 Accounting Trace (SMF 101) would reveal.

I haven’t, in this post, written about a time-driven view using SMF 30. After all, these are Interval records. I’m about to teach my code to pump out some graphs that will help me do that. Stay tuned.

It’s been an interesting exercise which has stretched my code.[6] In parallel I’m applying the code and techniques to CICS, CTG, DB2, MQ, and IMS[7] groups of address spaces. I might write about some of those too. Again, stay tuned.


  1. Which should make it interesting and useful for z/OS customers who don’t have Broker on z/OS.  ↩

  2. As opposed to Broken. 🙂  ↩

  3. The Control Address Space name, in SDSF, is the same as the job name, being the Broker name. In SMF 30 Interval records, however, it just says “STARTING”.  ↩

  4. Actually they have two LPARs, each with a Broker instance on. The Execution Groups are the same in each, except one Execution Group where the two instances have slightly different spellings on the name. I’m not sure if this is deliberate or a mistake.  ↩

  5. If you process SMF 30 you probably process 74–2 so I don’t count that as deeper.  ↩

  6. Which is, in my book, always a good thing. 🙂  ↩

  7. I generated test cases around specific IBM products. I might well add to this list. And I just applied the code to a group of jobs beginning “NTA”, which explained a CPU spike early one morning on a customer system. (Even though I could have used step- and job-end SMF 30 (Subtypes 4 and 5) records, Interval records helped with this spike rather better.)  ↩

System z Technical University, Budapest 12-16 May 2014, Slides

(Originally posted 2014-05-21.)

In Budapest at the European System z Technical University I presented three topics:

The links take you to the Slideshare uploads of these presentations. The first two of these are updated for this conference and I’ve overwritten the previous versions – as the new versions subtract nothing.

I think this was a really good conference, with lots of interesting discussions, some catching up with friends, and making some new ones.

Comments and questions welcome, as always.

And Just Complain

(Originally posted 2014-05-18.)

“Mobile” appears to be “flavour of the month” right now, and this week at System z Technical University it has certainly been a topic in evidence, whether it’s discussions in the breaks, sessions on software pricing, or sessions on Mobile-enabling technology.

I don’t intend in this post to discuss any of these.

Instead I want to talk about the types of users Mobile brings, and the impact on such things as capacity planning. But, for once, I don’t want to talk at length about either topic.

User Characteristics

The title of this post[1] nods in the direction of the kind of users Mobile brings.

Compare Mobile users with traditional interactive users. I’m thinking in particular of CICS and TSO users. These traditional users have at least some understanding of computers, though I might be overstating this.

Mobile users, though, have no real understanding of how the service is provided and don’t really care (and nor should they.) So I think they can be characterised as much less patient and much less tolerant of service issues, and that’s fine.

Capacity Planning

In recent months most customer interactions have included at least some discussion about the onslaught of Mobile, even if the discussion didn’t start out that way: Customers are volunteering it, unprompted. In a word they’re worried.

A colleague pointed out that it isn’t really possible to do Capacity Planning for Mobile:

  • You can measure load and attempt to assess the footprint of a user – up to a point.
  • You can’t predict the demand.

So there are two things to do:

  • Understand what might limit scaling, whether it be some resource such as CPU or CICS Virtual Storage, or something logical like locking. Then build a plan to overcome those potential bottlenecks. Fortunately we have nice CPU, memory, disk etc scale-up capabilities – but not for free. And we have good facilities to deal with many kinds of logical constraints, too.

  • Try to get some interlock between the business units doing Mobile and the IT people who have to handle the workload. One example that came up a couple of times this week is of a bank’s customers who drive many more transactions for no more bank revenue: The customer still expects to get good service, or they’ll go elsewhere.[2] So the organisation needs to understand the cost implications.

Is This Just Mobile?

Actually I don’t think it is just Mobile, and that might be reassuring to know.[3]

Web users in general are in many ways similar, with the same impatience, unpredictability of load and incomprehension characteristics.

But, not counting Mobile users, the scale has been smaller with Web. I say “not counting” because many Mobile users are web users, using the same http(s) protocol.

Actually this raises the question “what is Mobile?” Some of the discussion this week has been around that very topic. Which leads to a plea…

A Plea

As a Systems or Performance / Capacity specialist, try to understand your installation’s Mobile architecture. And try to spot the roll-out and ramp-up.[4]

An informal sampling of customers this week suggests that could be quite hard to do. But it will, I think, make life easier in the long run.

And finally a thank you to my friend Theresa Tai for the pun word “mobilise”. She used it in her presentation on Monday to mean “make ready for Mobile”, but I like the other meaning: So let’s mobilise for Mobile. 🙂


  1. Fairly obviously a gratuitous Queen reference: To Radio Ga Ga. 🙂  ↩

  2. With customers like that maybe you want them to. 🙂  ↩

  3. Maybe only because we’ve seen it before.  ↩

  4. Part of this is about recognising componentry appearing and evolving. Part of it, though, is about defining metrics and actually using these to measure.  ↩

Hints Of Other Systems

(Originally posted 2014-05-17.)

You can blame the weather for this post. 🙂 I’m writing it on a flight above thick cloud[1] on my way to Munich and then to Budapest for this year’s European System z Technical University.

I like to see the complete picture when I’m examining systems: It makes getting it right so much easier. And there’s something rather satisfying about getting your arms all the way round something.

But I don’t always get “complete” data from a customer. So I work with what I can get and this post is about what I can infer about others systems whose data I don’t have.

When I talk of “not getting data from all systems” I should perhaps clarify: Most installations run RMF on most of their systems and the SMFID in the header of SMF records is that of the system RMF ran on. I do get information at some level about other systems from RMF SMF records, but it’s far from complete.

Partial Data

There are a number of good reasons why customers don’t send me data for all systems, including:

  • It can be a lot of data.
  • Coordinating across multiple systems can be difficult.
  • One system, or maybe two, shows the behaviour of all eight.
  • Only a subset of the systems are of interest.

The last of these is the most common, particularly with installations jamming all their Production systems into one Sysplex.[2]

For some situations I really do need to see all systems. A few examples that come to mind are:

  • When designing a software cost minimisation scheme I want to see all the systems’ use of CPU.
  • When understanding the dynamics of a coupling facility structure I want to (at very least) see the request rates from all systems using the structure.
  • I recently had a Group Capacity situation where I only had SMF 70–1 data from 1 of the 2 systems in the group: I couldn’t explain why it was hitting the cap.[3]

But generally I can tolerate seeing data from a subset, so I’m not insistent when I don’t need to be.

The question of the day is “how much can I glean about systems whose data isn’t present?” Because maybe I can get a good understanding of an installation anyway. So let’s see what we can do.

Spotting Other LPARs

You can see all the LPARs on a physical machine from SMF 70 Subtype 1 Logical Partition Data Section[4]. You get further detail on logical engines, memory allocated and CPU Utilisation in the 70–1 Logical Processor Data Section for these LPARs.[4]

Among other things the names and definitions of these LPARs can be fascinating.

You also get a small amount of data for deactivated LPARs, most particularly the name and Partition Number.[5] It’s relevant to know for example that one machine has an activated SYSB and another has a deactivated one.[6]

Spotting Other Systems

I can sometimes see the existence of other systems, not on the same footprint. Here are a couple of examples of how:

  • SMF 74–4 (Coupling Facility Activity) has a list of all the systems in the Parallel Sysplex[7]. But I don’t see from this data which footprint they are on, or anything else about them.
  • SMF 74–2 (XCF Activity) has information about XCF members (and their corresponding job name). So if this system uses XCF to communicate with members in other LPARs, you see those other members and those other systems.

    A nice example of this is DB2 Data Sharing where – through the three XCF groups involved – you see all the IRLMs. In one case I saw four IRLMs on four systems, despite only having RMF SMF from one of them.

    Another nice example is CICS regions that talk to ones on this system via XCF.

Spotting Coupling Facilities

RMF SMF 74–4 records are cut for all coupling facilities in the Parallel Sysplex, regardless of which footprint they are on.

This data nowadays includes the machine serial number and LPAR Number.

Sometimes I infer the existence of a whole machine – where none of the systems on it provided RMF data – from the existence of a coupling facility on it.

And What Of It?

Maybe not much to you if you work in a customer.[8] But to me this fills in handy gaps. And it’s nice to spot probably unintended clues.

(Completed on a bumpy ride from Munich to Budapest.) 🙂


  1. Rest assured that if there were no cloud below I’d be enjoying the view instead of writing. 🙂  ↩

  2. If you don’t know why then ask a grownup. 🙂  ↩

  3. While from 70–1 I know when an LPAR is affected by the group cap, for LPARs that I don’t have data for I don’t get each LPAR’s Rolling 4 Hour Average CPU Utilisation – because I don’t have the SMF records their RMF cut.  ↩

  4. Which shows up in the Partition Data postprocessor report.  ↩

  5. see LPARs – What’s In A Name?  ↩

  6. As you probably guessed, it’s likely to be a recovery LPAR in case, for example, the first machine dies.  ↩

  7. These are 8-character XCF System Names rather than 4-character SMFIDs but usually they are the same (or at least relatable).  ↩

  8. Actually I’m increasingly of the opinion this isn’t true: It’s probable as a customer you don’t know as much as you’d like to about what goes on in your installation.  ↩

Appening 4 – SwiftKey on iOS

(Originally posted 2014-05-03.)

Sometimes I’m in the mood to carefully peck at the text and sometimes I’m in the mood to just “splurge write”. And sometimes a bit of both.

This post is a case in point: I just want to get the words out as fast as I can.

Now, I do quite a bit of writing on iOS as it lets me write wherever and whenever I get the chance. I like its prediction and correction capabilities. But the app I want to talk about in this post takes that a good deal further.

It’s SwiftKey – available on iPhone and iPad alike.

You type and it presents three alternative words to choose from, as shown below.

Of course I chose the middle one.

It often predicts a word before you’ve finished typing it and sometimes you don’t even have to tap a word for it to be chosen.[1]

In my experience the accuracy of prediction is high, especially if you let it read your Evernote account to glean your writing style. It also learns from what you type in the app: So, in the example in the screenshot it has learnt that the word “choose” is often followed by “from”.[2]

I find whether I use a Bluetooth or an on-screen keyboard I can type much faster – which is a good thing as my brain often overruns my ability to type.

SwiftKey doesn’t understand (Multi)Markdown so it’s not much use for formatting. But recall one of the strengths of MultiMarkdown is the lack of formatting commands when writing paragraphs. Markup can often wait.

Unfortunately you can’t use SwiftKey as the standard text entry subsystem for apps in general. So I find myself cutting and pasting the text into other apps, such as Editorial. This is only a minor pain in fact: I’m still getting ideas down very fast. But it would be nice if Apple allowed custom data input mechanisms.

In fact you can use SwiftKey as a fast text entry mechanism for Evernote as it can save notes directly into Evernote. In practice I don’t do that: Most of my Evernote notes come from elsewhere (and have a good deal more structure.) Perhaps I’ll write about that one day.

So this is the fourth in a series about apps I use. The previous one was Appening 3 – Editorial on iOS. My reviews aren’t as comprehensive as many you’ll find on the web but they are more insights into how I use stuff than formal reviews. Of course this isn’t an official IBM endorsement of SwiftKey: I’m just telling you about a tool I use and what I think about it.


  1. This is the case if the middle choice (in white) is the one you want.  ↩

  2. SwiftKey doesn’t transfer its learning from one machine to another (for example via Dropbox) but I haven’t noticed this to be a problem – even though I use multiple iOS devices for authoring.  ↩

Once Upon A Restart

(Originally posted 2014-05-02.)

If you have a large mainframe estate it can be difficult to keep track of when the various moving parts start and stop. For example, if you’re a Performance person it’s quite likely nobody bothered to tell you when the systems were IPL’ed. You might well know what the regime for starting and stopping CICS is but I wouldn’t.

As you know I’m curious as to how customers run their installations and starting (and stopping) pieces of infrastructure interests me. I’m also impressed when a piece of infrastructure has been up for years – as sometimes happens. Up until now it’s been a matter of folklore such as “the installation that didn’t take an application down for 10 years”.[1]

But I’ve turned my attention to when z/OS is IPL’ed and when key address spaces start and stop. I’m sharing the technique in case it’s something you want to do.

I’m also interested in the sequence and timing between a z/OS system’s IPL and when important subsystems are up.[2]

I’m not going to pretend to be an expert in how systems are restarted or recovered but I am going to take an interest. Knowing what’s “normal” is, I think, useful.

Simple Instrumentation

You probably know that SMF 30 subtypes 4 and 5 describe steps and jobs, respectively. You probably also know SMF 30 subtypes 2 and 3 are interval records.

If you’re already collecting these you’re in good shape as Reader Start Time is in all of these. It’s all you need to figure out when stuff starts.[3]

I prefer the interval records as

  • Most customers send me SMF 30 interval records. (I get the others for batch studies.)

  • You can get the Reader Start Time from these even when the address space is still up. (When the Reader Start Time changes for an address space I know it’s restarted.)

Summarisation And Reporting[4]

For some address space types I report each job name separately. CICS regions are a good example of this. For others I pick the first one for a subsystem. DB2 and MQ subsystems are a good example of this.

To detect an IPL I choose the address space whose program is IEEMB860. In principle the job name could vary. And yes I know that “pressing the button” on IPL invokes NIP etc before this (the Master Scheduler) address space starts up.

I only print date, hour and minute for Reader Start Time. It goes to hundredths of seconds but I’m not interested in that level of detail.[5]

In my report I sequence by timestamp. That makes it easier to see when an IPL is followed by, say, a DB2 start and then some CICS regions. I could probably create a useful Gantt chart from this, but today I don’t. The technology’s there to make this easy to do.
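
Here’s a minimal sketch of the technique in Python. The field names and the sample records are made up for illustration; the real values come from the SMF 30 Interval records described above:

from datetime import datetime

def start_events(interval_records):
    # One event per distinct Reader Start Time per job name; a changed
    # Reader Start Time for a job name means that address space restarted.
    seen = {}
    events = []
    for rec in interval_records:
        job, rst, program = rec["jobname"], rec["reader_start"], rec["program"]
        if seen.get(job) != rst:
            kind = "IPL" if program == "IEEMB860" else "START"   # Master Scheduler marks the IPL
            events.append((rst, job, kind))
            seen[job] = rst
    return sorted(events)   # sequence by timestamp, as in the report described above

# Made-up sample records: an IPL followed by a DB2 DBM1 and a CICS region starting
records = [
    {"jobname": "*MASTER*", "program": "IEEMB860", "reader_start": datetime(2014, 4, 27, 6, 0)},
    {"jobname": "DB1ADBM1", "program": "DSNYASCP", "reader_start": datetime(2014, 4, 27, 6, 4)},
    {"jobname": "CICSPA01", "program": "DFHSIP", "reader_start": datetime(2014, 4, 27, 6, 9)},
]
for when, job, kind in start_events(records):
    print(when.strftime("%Y-%m-%d %H:%M"), kind, job)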

Conclusion

Looking at this data gives me a much better idea how installations manage the lifecycles of their address spaces. If I talk to you about this topic it’ll probably be from this data and I might well refer you to this blog post. This is also one of the topics in the 2014 revision of my “Life And Times Of An Address Space” presentation.

Two final points:

  • Reader Start Time doesn’t denote the time that a subsystem became available, so it’s not that good for application availability. You probably want to use the subsystem’s own instrumentation, such as logs, for that.[6]
  • One of the merits of the Reader Start Time technique is that it’s very “light touch”.

  1. I made that one up but it’s not unrepresentative.  ↩

  2. I guess the readers of the System z Mean Time to Recovery Best Practices Redbook would be interested also.  ↩

  3. Other start timestamps are available but this one does just fine.  ↩

  4. I expect to evolve my reporting. I usually do.  ↩

  5. People analysing IPLs probably are, or at least down to the second. And they’re probably interested in the differences between the various start timestamps. I could take an interest in the precise sequence in which “low Jobid” address spaces start up. Likewise the sequence in which e.g. clusters of CICS regions or DB2 address spaces start up. The data’s all there.  ↩

  6. Or use what I call the “roaring silence” technique. A good example would be when the SMF 101 (DB2 Accounting Trace) record cutting rate drops to zero for a few minutes. That might denote a restart, with the subsystem being “back in business” once records start to be cut again.  ↩

Appening 3 – Editorial on iOS

(Originally posted 2014-05-01.)

Over a year ago I wrote about a couple of iOS applications I was enjoying using.[1]

And now it’s time to write about a third, as it’s part of my authoring toolkit. In Recent Conference Presentations I showed off my then new writing rig: Byword (with MultiMarkDown) and my iPad Mini with a Logitech light keyboard cover. I said that would be my rig for a while.

So this post also serves to talk about my current writing setup. The hardware has changed a fair amount: I still use the iPad Mini in lots of places but now I’m using my iPhone for writing in tight spots and a new iPad Air for where I can spread out a bit more.

The software has changed as well: I use Byword on the Mac for final editing and use it less on iOS devices. That’s because I have a new writing tool on the iPads: Editorial, which also does MultiMarkDown. The “secret sauce” in Editorial is the ability to write what are called workflows. Some of these are drag-and-drop, essentially pipeline stages. But, and this is where my interest really takes off, you can write workflows in Python.[2]

Those of you who follow me on Twitter might’ve spotted me talking about building workflows with Editorial. The two most notable ones – both built in Python – are:

  • A workflow to check footnotes are properly defined and actually get referenced. (It actually sorts the footnotes by first reference – which is useful for authoring but redundant when it comes to converting the MultiMarkDown to HTML.)

  • A workflow to check I’ve created the images I reference – or at least that they exist on the iPad.

But here’s a simple workflow. It does a (probably useless) thing of inserting the name of the iOS device into your document’s text at the cursor. The workflow has only one “stage”, a Python script[3]:

This looks like “keyhole surgery” but clicking on the full editor yields

and then you can edit, with some very nice syntax assistance and module prompting, to your heart’s content.

When you run this the words “Written on FunfPad[4].” are inserted into the text. Actually, if you have a range of text selected it will all be overwritten with these words.
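
Since the screenshots don’t reproduce here, the script might look something like the following sketch. The editor-module calls (get_selection and replace_text) are assumptions from memory of Editorial’s documentation, as is using platform.node() for the device name – so check them before relying on this:

import platform
import editor   # Editorial's built-in module; API names here are from memory

device = platform.node() or "this device"   # assumed to give the device name
start, end = editor.get_selection()         # cursor position, or a selected range
editor.replace_text(start, end, "Written on %s." % device)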

Of particular note here is the editor module – which gives tight integration between Python and the editor. Others such as workflow round the integration out nicely.

There’s a nice little community for discussing Editorial (including with the author) at omz:software Forums.[5] I’m learning a lot from this community.

I’m also using Dropbox to keep posts and graphics I’m working on. This enables me to work on them across Linux, OSX and multiple iOS devices. (I have no reason or appetite to write on Windows.)

So, you can see my writing toolset is evolving, and no doubt it will continue to. By the way this is my personal view and experience, rather than an IBM endorsement. But I’m sure you realised that.


  1. See As It Appens, Appening 1 – Note & Share on iOS and Appening 2 – Broken Sword Director’s Cut on iOS  ↩

  2. Ole Zorn, the author of Editorial, released Pythonista some months before. Pythonista is, as the name suggests, a Python programming environment for iOS. I like it as well but if Editorial had come along first I probably wouldn’t’ve bothered with it.  ↩

  3. Sometimes you can avoid Python, sometimes you can build a workflow with just a single Python stage, and sometimes you need to combine Python stages with other stages.  ↩

  4. Get it? 🙂 Hint: My German-speaking friends will groan at this bad pun and at “DreiPad”. 🙂  ↩

  5. Currently the supported level of Python is Version 2. The community is, for example, debating the merits and method of getting to Version 3.  ↩

Setting MEMLIMIT

(Originally posted 2014-04-30.)

I’ve been meaning to write about MEMLIMIT and its importance for some time.[1]

But it’s a moving target. So either there’s no good time to talk about it or lots of good times. 🙂 So let me discuss it now and then again later as necessary.[2]

So timing a post on MEMLIMIT is like choosing a wave to catch: In this case the wave was a discussion in IBM-MAIN about where to find out the MEMLIMIT for an address space, plus the way the MEMLIMIT value came to be set for that address space. Or maybe it’s the announcement of MQ Version 8, which does move the folklore on a bit.

So let me start by describing what MEMLIMIT is and why it’s important. Then I’ll talk about a few examples. Finally I’ll talk about instrumentation.

Why Is MEMLIMIT Important?

Nobody cares how much virtual storage you use, so long as you don’t cause real world effects.[3]

Real world effects include overuse of real memory – as it’s not free – and overcommitting real memory and, still worse, paging space.

MEMLIMIT is the mechanism for limiting an address space’s use of virtual storage. It can be set on the JCL EXEC statement, via an IEFUSI installation exit or in SYS1.PARMLIB(SMFPRMxx).[4]

One way or another you have to set MEMLIMIT for each address space:

  • Set it too low and the address space might refuse to start. Here I’m thinking of specific products.

  • Set it too high (or to Unlimited) and you create an exposure: Someone could create a vast 64-Bit Memory Object and touch every page, potentially causing the system to die.[5]

A good but old document on MEMLIMIT is Limiting Storage usage above the bar in z/Architecture by Riaz Ahmad.

Some Key Products

Let’s talk about CICS, DB2 and MQ.

CICS

In a typical CICS environment there are numerous CICS address spaces. In CICS 4.2 the minimum value of MEMLIMIT is 4GB, else the CICS region won’t start. In CICS 5.1 the minimum is 6GB.

In reality CICS will use what it needs to. But a region might in fact need more virtual storage, so do keep track of its usage: SMF 30 can help.

DB2

There are generally few DB2 subsystems. The only really big address space is the DBM1 address space. Nowadays the MEMLIMIT is set in the JCL as 4 Terabytes. Don’t change it: DB2 can be trusted not to use a threatening amount of real memory, so long as you do due diligence on real memory.[6]

MQ[7]

Again there are relatively few MQ subsystems but usually the big address space is MSTR. In Version 7 the default MEMLIMIT was set at 2GB in the JCL.[8]

In Version 8 things can potentially change, and you might need to increase the MEMLIMIT value: While buffer pools are by default still 31 Bit you can choose to make them 64 Bit (and you can long-term page fix them). If you make them 64 Bit you will need to review MEMLIMIT:

I’d suggest adding the sizes of the 64 Bit buffer pools plus 10% [9] to 2GB and setting that as your new MEMLIMIT.
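
As arithmetic, that suggestion is just this (a back-of-envelope sketch with made-up buffer pool sizes, not an official MQ sizing formula):

def suggested_memlimit_gb(bufferpool_gbs):
    # bufferpool_gbs: sizes, in GB, of the buffer pools you've made 64-Bit
    # The 10% uplift is for the buffer pool control blocks (see footnote 9)
    return 2 + sum(bufferpool_gbs) * 1.10

# Example: three 64-Bit buffer pools of 1GB, 2GB and 4GB
print(round(suggested_memlimit_gb([1, 2, 4]), 1))   # 9.7 -> round up, e.g. MEMLIMIT of 10GB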

Actually the CHIN address space can be pretty big, particularly if you have a large number of external connections. But it remains 31 Bit.

Instrumentation

As you might expect, SMF 30 (all subtypes) has the eventual MEMLIMIT value (in MB) for the job step. It also has the method by which the MEMLIMIT value was established.

I report on both of these at the address space level. And looking at this data gives me a much better idea how installations manage MEMLIMIT in general. If I talk to you about your MEMLIMIT approach it’ll be from this data and I might well refer you to this blog post. And I’m certainly talking about this in the 2014 revision of “Life And Times Of An Address Space”.


  1. In fact it is in the latest version of my “Life And Times Of An Address Space” presentation (see Recent Conference Presentations.)  ↩

  2. It’ll become necessary to when, for example, a product raises the minimum MEMLIMIT value necessary for its address spaces to start.  ↩

  3. Not entirely true as presumably the work you’re running matters to somebody, but rhetorically close enough.  ↩

  4. The default is the SMFPRMxx value.  ↩

  5. Am I foolhardy in mentioning this? I would say I wasn’t as most installations have sensible limits in place to prevent this scenario. Please check yours and fix it if need be – and then my purpose in mentioning it will’ve been served.  ↩

  6. This is rapidly becoming understood by z/OS and DB2 Performance people. Perhaps I should write a separate post on this some time.  ↩

  7. My thanks to Matthew Leming of MQ on z/OS Development for the information herein.  ↩

  8. In fact there was a small amount of 64 Bit exploitation but 2GB is generally sufficient.  ↩

  9. The additional 10% is for the buffer pool control blocks, which are above the bar whether the buffer pools themselves are or aren't.  ↩

 

TSO Regular Expression Testing Tool

(Originally posted 2014-04-26.)

I’ll admit I’ve found regular expressions a bit of a struggle. I bet most people have. For me it’s a matter of lots of arcane symbols that don’t have any inherent meaning. Contrast with many programming languages, which do have some.

It’s also not the case I don’t understand the concepts.

Anyhow I’m edging towards the point where Production code will need to allow regexes. So I want to take a list of space-separated names and see which items match a given regular expression.

For example, a list of address space names.[1]

And so the FL (for “Filter List”) REXX EXEC was born.

The code is below. It uses grep to do the matching, invoking grep via BPXWUNIX.

You invoke it with

TSO FL <regex> <list>

if you’ve put it in a suitable CLIST library. Mine is in my ISPF one. I can also call it from ISPF Option 6 or from Batch.[2]

For example

TSO FL CICS$ CICSA CICSB PRODCICS MYCICS

will display the string

PRODCICS MYCICS

as these two items match the regular expression.

/* REXX */
parse arg mygrep mylist

/* Create a temporary stem set with data to pass to grep via BPXWUNIX */
wds=words(mylist)
do w=1 to words(mylist)
  tmpStem.w=word(mylist,w)
end 
tmpStem.0=wds

cmd='grep "'mygrep'"'

/* Do the actual grep */
call bpxwunix cmd,tmpStem.,filter.,stderr.

/* Turn returned stem set into space-separated list */
resultList=""
do f=1 to filter.0
  resultList=resultList filter.f
end

/* Print any error messages */
do e=1 to stderr.0
  say stderr.e
end

say strip(resultList)

exit

Obviously this could be modified to be a callable routine, or to use one. In this simple sample I thought it best to leave it as open code.

You could probably also find a way to pass parameters to grep like -i for case insensitivity.

One further refinement which is more of a stretch is handling list items with a space in them: You’d need to rewrite the bit that creates stem variables from the words in the list string. But for my purposes I’m looking at names which don’t have spaces in them.

Observant readers will spot this code is derived from Filtering REXX Query Results With BPXWUNIX but this version is easy to prototype with from the command line.

If you want to get started with regular expressions, have a play with it. Enjoy!


  1. See Towards A Pattern Explorer – Jobname Analysis where I’ve explored this before.  ↩

  2. I’ve no idea how to invoke it if the list parameter is long enough to be beyond what fits on one line – interactively. In Batch you can use + or - as a continuation character. Calling from REXX you can, of course, pass an arbitrarily long string. But for my testing a short list suffices.  ↩

zIIP And DB2 Version 10 DBM1

(Originally posted 2014-04-14.)

This post adds some additional DB2 Version 10 specifics to what I mentioned in New zIIP Capacity Planning Presentation. I said this would be a living presentation, and so it has proven to be. It’s had two outings so far and there are a couple more confirmed.

First the times I’ve given it:

  • On 9 April 2014 I was very pleased this was the very first presentation given at the GSE/UKCMG zCapacity Management and zPerformance Analysis Working Group in IBM South Bank.
  • This past week I tried it out as the jumping off point for a single-company discussion on zIIPs.

Now what’s ahead:

  • I’m using the material again in another single-company setting. I’m quite keen the material can be used this way as I think a lot of installations will want to think the subject through.
  • At the System z Technical University, 12 – 16 May in Budapest, I’m presenting to a (hopefully) much larger audience. Do come along if you can!

So, after these two outings, I’ve extended the presentation by a couple of slides – and it’s these new topics I want to discuss in this post. They are:

  • Subcapacity General Purpose Engines (GCPs)

and

  • DB2 DBM1 zIIP Eligibility levels in DB2 Version 10

Subcapacity General Purpose Engines

I’m seeing quite a few installations with “subcapacity general purpose engines” – such as zEC12 Model 6xx processors. As you probably know with these the effective capacity of the GCPs is less than that of a 7xx but the zIIP (and ICF and IFL) engine capacity is the same as that of a 7xx GCP engine.

For example, a z196–6xx has GCPs that are roughly half the speed of the zIIPs. And a zEC12–5xx would have GCPs a little more than 1/3 the speed of the zIIPs[1].

Having a faster zIIP than GCP can be good news as each zIIP could process CPU-intensive work faster and has more capacity. On the other hand when zIIP-eligible work runs on a GCP (as it might do in times of zIIP Pool stress) stuff (maybe “CPU Stringent” stuff) will run slower – introducing variability.[2]

It’s tempting to configure fewer zIIPs than you might if they’re, say, three times faster than the GCPs. I’d be cautious about that – because of the queuing effects[3] and the drop in performance that running zIIP-eligible work on a GCP might bring.
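
To make the trade-off concrete, here’s a back-of-envelope sketch – not a sizing formula, and the numbers are invented – of turning zIIP-eligible-on-GCP CPU into an estimate of zIIP engines, allowing for the speed difference and for not running the zIIP pool flat out:

def ziips_needed(eligible_gcp_seconds, interval_seconds, ziip_to_gcp_speed_ratio, target_busy=0.6):
    # eligible_gcp_seconds: zIIP-eligible CPU time that ran on (subcapacity) GCPs
    # ziip_to_gcp_speed_ratio: e.g. roughly 2 for a z196-6xx; derivable from the
    #   fields in footnote 1 (value / 256) - treat that derivation as an assumption to verify
    # target_busy: don't plan to run the zIIP pool flat out, because of queuing effects
    ziip_seconds = eligible_gcp_seconds / ziip_to_gcp_speed_ratio   # faster engine, less CPU time
    return (ziip_seconds / interval_seconds) / target_busy

# Example: 5,400 eligible GCP seconds in a 15-minute interval, zIIPs twice GCP speed
print(ziips_needed(5400, 900, 2.0))   # 5.0 engines at a 60% target busy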

DB2 Version 10 zIIP Eligibility

One discussion I had was about an imminent migration from DB2 Version 9 to Version 10. In this case the use of zIIP by DBM1 is entirely new. My off-the-cuff response was to estimate the whole of DBM1 going to zIIP upon migration. This is, as you would probably guess, an overestimate. But it’s good enough for checking you have enough zIIP capacity to take the additional demand.

But I went back to the data I have from two DB2 Version 10 customers:

  • Client A has no zIIPs.
  • Client B has zIIPs.

After a head scratching moment I was pleased that I had both cases: The data appears to behave differently in the two cases, but in reality it’s consistent.

Here’s the unifying piece of information that got me beyond head scratching:

Field SMF30CPT – which contains TCB and similar – includes zIIP-eligible work that runs on a GCP. It doesn’t contain zIIP-eligible work that actually does run on a zIIP.

Client A

The following is a graphic I already had in the presentation. (You might want to pop it into a separate tab in your browser.)

As I said they have no zIIPs so all zIIP-eligible work is included in the TCB number in the table.

If you look at the “DSNRDBM1” row in the table you see the TCB is 1.0% of an engine and the zIIP-on-GCP number is 0.84% of an engine. All the Dependent Enclave CPU is zIIP-eligible[4].

So somewhere between 80 and 85% of all the DBM1 CPU is zIIP-eligible – dividing 0.84 by 1.0 or so.

By the way, SRB is tiny so I’ve not displayed it.

Client B

Remember Client B does have a zIIP.

Again, any zIIP-on-GCP CPU would appear in the TCB number – but here it’s tiny. All the zIIP-eligible work does indeed run on a zIIP, so isn’t included in the TCB value.
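
Putting the two cases together, the arithmetic is roughly as follows – a simplified sketch that uses the Client A numbers above and made-up numbers for the Client B style:

def ziip_eligible_fraction(tcb, srb, ziip_on_gcp, ziip):
    # tcb (SMF30CPT-style) includes ziip_on_gcp but excludes time that ran on a zIIP
    eligible = ziip_on_gcp + ziip
    total = tcb + srb + ziip
    return eligible / total

# Client A style: no zIIPs, so eligible work all shows up as zIIP-on-GCP
print(round(ziip_eligible_fraction(tcb=1.0, srb=0.0, ziip_on_gcp=0.84, ziip=0.0), 2))  # 0.84

# Client B style (made-up numbers): the eligible work actually runs on a zIIP
print(round(ziip_eligible_fraction(tcb=0.3, srb=0.0, ziip_on_gcp=0.0, ziip=0.8), 2))   # 0.73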

As a one-off [5] I’ve created the following graph that explores zIIP-eligibility by time of day. I’ve added it to the presentation.

It shows zIIP-eligibility is generally in the region of 70 – 80% of the total, consistent with Client A. The reason for creating it was to see how much the zIIP-eligibility varies. For this customer there is some variation and indeed a few sekips[6]. These, if one were to take the trouble to explain them, would probably turn out to be periods of low Prefetch and low Deferred Write. One might hazard that a Direct and Read-Only situation could lead to a much lower level of zIIP-eligibility.

So, I would think it reasonably important to measure what proportion of DBM1’s CPU is zIIP-eligible but expect it to be in the region of 3/4. If you’re going to Version 10 from Version 9 I would provision enough zIIP CPU to support 100% but expect rather less.


I’ve enhanced the presentation with the above and that’s what I intend to give in Budapest and use in this next customer discussion. It might evolve a bit in the meantime – and that’s OK. The next version going up on Slideshare will probably be after Budapest, so mid May.

And I already know I need to do some work on the changes in DB2 Version 11, which promise more DB2 zIIP eligibility: I haven’t done the research yet.


  1. At the z/OS System level you can see the speed difference using field SMF70NRM, at the Service Class Period level using field R723NFFS and at the address space level using field SMF30SNF. In all these you need to divide by 256.  ↩

  2. PM30468 changed the zIIP-eligibility behaviour of DDF work in a way that has similar consequences (but less severe than it happening to a DBM1 address space): Instead of a proportion of a thread being eligible, a portion of the threads are entirely zIIP-eligible and the rest not at all. I would expect some variability of outcome (perhaps masked by other response time components) with Subcapacity GCPs.  ↩

  3. Standard Queuing Theory has a particularly unforgiving curve for a single zIIP. Two way, though much better, still isn’t pleasant.  ↩

  4. This, being from a genuine report, shows different levels of precision for some fields. One day I’ll get round to fixing it.  ↩

  5. I’m not yet convinced I need to create this graph as a matter of course.  ↩

  6. Well, what would you call the opposite of a spike? An antispike? 🙂  ↩