Anatomy Of A Great App

This post follows on from Anatomy Of A Great iOS App.

That post was only written in 2019 but such a lot has changed in the Apple ecosystem that I think it worth revisiting. A hint at what’s changed is that the title of this post doesn’t contain the word “iOS” anymore.

(I can’t insert the word “Apple” into the title as the vast majority of the relevant apps aren’t made by Apple.)

As with that post, I don’t consider this one to be a complete treatment of what the ideal app would do. Just some pointers to the most important things. (And important is highly subjective, particularly as I consider myself a power user.)

To reprise a list from that post, with some updates – and with some obvious personal biases:

  • Automation is still important to me.
  • I have most of the Apple ecosystem – and now I have 3 HomePod speakers, scattered about the house.
  • I really want good quality apps – and I am willing and able to pay for them. (and, as in the case of OmniFocus 4, risk all by beta’ing them.) 🙂

Other themes are emerging:

  • Apps should be cross platform – where possible.
  • Mac Apps should support Apple Silicon.
  • Terms and conditions should be helpful.

All of the above are about user experience and value. So let’s take them one at a time.

Cross Platform

The tools and techniques increasingly support writing for all platforms with a single code base. Maybe with a small amount of platform-specific code.

From a vendor point of view this increases their market. You might say the Mac market, for instance, is small compared to the iPhone or Windows market. But only a small portion of iPhone users are into paying for apps, at least productivity apps. So the Mac market is a substantial proportion of that sub-market – and so probably worth catering for.

From a user point of view there are benefits, too: Portability of data and application experience are important. For example, I’m composing this blog post on an iPhone (in interstitial moments), on my iPad with a Magic Keyboard, and on my Mac. The app I’m using is the very excellent Drafts – which has common automation across all platforms. (I might’ve started this post by dictating to my Apple Watch – using Drafts – but I didn’t.)

My task manager, OmniFocus, has similar cross-platform portability of data, automation, and (in the latest beta) experience. That again makes it attractive to me.

Both the mind mapping tools I use – MindNode and iThoughts – are cross platform.

Note: I don’t want all the platforms to merge – as there are use cases and capabilities unique to each, such as Apple Pencil. Or Apple Watch.

Apple Silicon

It’s important to note that Apple Silicon has the same programming model – at the machine code / assembler level – as iPhones and iPads have always had. Indeed the iPad I’m typing this on has the same M1 processor as the first Apple Silicon Macs. (It’s all based on ARM – which I enjoyed programming in assembler for in the late 1980s.)

Building for Apple Silicon yields tremendous speed and energy efficiency advantages – which the consumer would greatly appreciate. It also makes it easier to build cross-platform apps.

While applications that are built for Intel can run using Rosetta 2, that really isn’t going to delight users. Apps really should be native for the best performance – and why would you want anything else?

Terms And Conditions Apply

As with z/OS software, the model for paying for software has evolved.

There are a couple of things I rather like:

  • Family Sharing
  • Universal Licencing

By the way, it seems to me to be perfectly OK to use free apps in perpetuity – but the developer generally has to be paid somehow. So expect to see adverts. I view free versions as tasters for paid versions, rather than tolerating adverts.

Family sharing allows members of your family (as defined to Apple) to share apps, iCloud storage, etc. That includes in-app purchases. But the app has to enable it. It’s an economic decision but it does make an app much more attractive – if you have a “family” that actually wants to use it. (I have a family but the “actually wants to use it” bit is lacking.)

Universal Licencing is more of a developer-by-developer (or app-by-app) thing to enable. It’s attractive to me to have a licence that covers an app from Watch OS, through iOS and iPad OS, all the way to Mac OS. It means I can experiment with where to run something.

I would couple both the above licencing schemes to Subscriptions – where you pay monthly or annually for the licence. Some people dislike subscriptions but I’m OK with them – as I know the developer needs to be paid somehow. The link is that I won’t rely on apps that either aren’t being actively developed or, more seriously, whose developer isn’t sustainable. So to recommend an app to a family member it has to meet that criterion. Likewise to bother using it on all the relevant platforms.

Conclusion

One controversial point is whether apps have to be Native. For instance, many people don’t like apps built with Electron (a cross-platform framework that doesn’t work on iOS or iPad OS or (probably) Android). To me, how good an app is matters more than what it’s built with – though the two are related. And “good” includes such things as not being a memory hog, supporting Shortcuts and / or AppleScript, and supporting standard key combinations.

Mentioning Shortcuts in the previous paragraph, I would note the arrival of Shortcuts on Mac in Monterey. I’ve used this – as highlighted in Instant Presentations?. While functional, the app itself is awkward to use on Mac – so I recommend composing on iOS or iPad OS to the extent possible. With iCloud sync’ing the resulting shortcut should transfer to Mac. But even on iOS and iPad OS the Shortcuts experience is currently (November 2021) buggy. I expect it to improve.

One final thought: Running through this post and the previously-referenced one is a theme: The thoughtful adoption of new features. Examples include:

  • Shortcuts – if automation is relevant.
  • SharePlay – if the idea of sharing over FaceTime is meaningful.

Not to mention Safari Extensions and Widgets.

The words “relevant” and “meaningful” are the operative ones here. It’s not up to me as a user to assess relevance or meaningfulness – but it is up to users to use their ingenuity when adopting apps.

And when I say “thoughtful adoption” that applies to users as well. There are many new capabilities in modern releases of iOS, iPad OS, and Mac OS. I would single out two very recent ones:

  • Live text – which recognises text in photographs and lets you do something useful with it. I’ve found this to work very well.
  • Quick Notes – though the target being Apple Notes is less useful to me. (I’m approximating it with automation for capture to Drafts.)

So I encourage users to explore new operating system releases, rather than just bemoaning the need to upgrade.

Instant Presentations?

For normal people, making a presentation is as simple as opening PowerPoint and starting to type.

But I’m not normal. 🙂

I would like to start with a mind map and end up with a presentation – with Markdown as my preferred intermediary.

It doesn’t matter what the presentation is. Mine tend to be in one of the following categories:

  • Something incredibly informal, perhaps to share with colleagues.
  • A conference presentation.
  • Workshop materials.

And sometimes the material isn’t going to be a presentation. Or at any rate not just a presentation. Hence Markdown being my preferred medium – as it can be readily converted to eg HTML.

And I want a low friction way of creating a presentation the way I like to create it.

The Workflow I Was Aiming For

The ideal steps go something like this:

  1. Have an idea.
  2. Create a mind map with the idea in it.
  3. Work on the mind map until it’s time to flesh out some slides.
  4. Generate the skeleton Markdown for the slides.
  5. Flesh out the slides.
  6. Create the presentation – as readable by PowerPoint.

Steps 5 and 6 are, of course, iterative.

How I Built The Tooling For The Workflow

Steps 1 – 3 are all in my head or using MindNode, a very nice multi-platform mind mapping tool.

Steps 5 and 6 are:

  • Use a text editor to work on the Markdown (with some extensions)
  • Use my mdpre and md2pptx open source tools – via the make utility – to generate pure Markdown and convert it to PowerPoint .pptx format.

Decent text editors and make enable those last two steps to be quick and frictionless.

Step 4 is the interesting one. Let’s look at it in more detail – including how it’s done:

  • I wrote a Shortcuts shortcut – that could run on iOS, iPad OS, or Mac OS (as of Mac OS 12 Monterey). It exports a mind map you select to Markdown, does a small amount of editing, and copies it to the clipboard. MindNode has a built in action to export to a number of formats, including Markdown. Which is why I’m favouring MindNode for this task.
  • I wrote a Keyboard Maestro macro that invokes the shortcut.
  • The same macro writes the Markdown out – to a directory and file you nominate.
  • It also creates a make file that invokes mdpre and then md2pptx.
  • It also copies some files – boilerplate Markdown and CSS – into place.

So, the whole thing is as automatic as it could be – with the user specifying only what they need to. And it runs very fast – and much more reliably than a human doing it.
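To give a flavour of step 4, here’s a rough Python sketch that builds the same sort of skeleton as the example further down. It’s an illustration of the idea only – the real workflow uses the Shortcuts shortcut and Keyboard Maestro macro described above, and the outline and file names here are made up.

# A toy version of step 4: turn a simple outline into skeleton Markdown
# ready for mdpre and md2pptx. The outline and file names are invented
# for illustration; the real input comes from a MindNode export.
outline = {
    "Presentation title": {
        "Section 1": ["Slide A", "Slide B"],
        "Section 2": ["Slide C"],
    }
}

lines = ["=include metadata.md", "=include standard.css", ""]
for title, sections in outline.items():
    lines.append(f"# {title}")
    for section, slides in sections.items():
        lines.append(f"\n## {section}")
        lines.extend(f"\n### {slide}" for slide in slides)

# Write the skeleton out, ready for fleshing out (step 5)
with open("skeleton.md", "w") as f:
    f.write("\n".join(lines) + "\n")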

Here is what a presentation outline looks like in MindNode.

I’ve deliberately used text that describes what the generated parts will be.

And here’s the Markdown the whole process generates.

=include metadata.md
=include standard.css

# Presentation title

## Section 1

### Slide A
* Bullet 1
* Bullet 2

### Slide B

## Section 2

### Slide C

As you can see, it’s plain text – which is what Markdown is. So you could use any text editor you like to work on this. And you can apply Git version control to it – which is often what I do for more formal presentations.

Actually the =include lines aren’t standard Markdown; They are what mdpre will recognise as instructions to include files. In this case both metadata.md and standard.css embed other files the same way.
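If you’re wondering what an include preprocessor amounts to, here’s a minimal Python sketch of the idea. To be clear, this is not mdpre’s actual code – just an illustration of what “recognise =include lines and splice the files in” means.

import re
import sys
from pathlib import Path

def expand_includes(path):
    # Recursively expand "=include file" lines - the idea behind the
    # preprocessing step, not mdpre's real implementation.
    out = []
    for line in Path(path).read_text().splitlines():
        match = re.match(r"=include\s+(\S+)", line)
        if match:
            out.extend(expand_includes(match.group(1)))
        else:
            out.append(line)
    return out

if __name__ == "__main__":
    print("\n".join(expand_includes(sys.argv[1])))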

Conclusion

One final thought: You might think that a mind map is overkill for a presentation but consider what a minimal mind map is: It’s just a single word, or maybe it’s just the title. MindNode, for one, makes it very easy to create just such a mind map. It really is very minimal indeed. And I would hope that any formal presentation would go through a structural process like mind mapping.

So, “Instant Presentation”? Well, not quite – but very close. And close enough to make it easy and quick to kick off another presentation – however formal or informal.

What’s The Use?

It’s “Sun Up” on Conference Season – and I have a new presentation.

It’s called “What’s The Use?” And it’s a collaboration with Scott Ballentine of z/OS Development.

It’s very much a “field guy meets product developer” sort of thing. It emerged from a conversation on IBM’s internal Slack system.

The idea is very simple: If a product developer codes the IFAUSAGE macro right, and if a customer processes the resulting SMF 89-1 and SMF 30 records right, good things can happen.

Two “ifs” in that:

  • Scott describes how developers could and should code the macro.
  • I give some examples of how customers might use the data to their advantage.

Of course, when we say “developer” we don’t necessarily mean an IBM developer (such as Scott) – as other software vendors could and should get this right.

And when we say “customer” it could be consultants (such as me) or outsourcers, as well as traditional customers.

So what’s the big deal?

Looking at it from the customer point of view, there are a number of things this data can yield:

  • CPU when using the product. MQ does this, for example.
  • Names of things. Db2 and MQ both do this.
  • Connectivity. Connectors to both Db2 and MQ do this. And – to a lesser extent – IMS does this.

I’ve listed IBM products – which are the ones I’m most familiar with. One thing Scott brings to the party is how the IFAUSAGE macro works and can be used. Handily, he talks through the lifecycle of using the macro and we both talk through how that turns into stuff in SMF 89-1 and SMF 30 records. The point of the lifecycle is that any vendor could use this information to be helpful to their customers.

We’d like developers to get creative in how they use IFAUSAGE – whether they use it as a basis for billing or not. (At least one famous one doesn’t.) So, a plea: If you are a developer of software that has something approaching a subsystem, consider encoding the subsystem name in Product Qualifier in IFAUSAGE. Likewise for any connector.

We now have a third public booking for this presentation (plus a private one). So I guess the presentation “has legs” – and we’ll continue to put in the effort to evolve it.

Talking of which, the presentation got its first outing yesterday and attracted numerous questions. One of them prompted Scott and me to discuss expanding the scope a little to cover SMF 89-2. Would that be worthwhile? (My inclination is that it would – and I already process 89-2 in my REXX so could furnish an example of what you might get.)

The abstract is in an earlier blog post: Three Billboards?.

One final note: We think this presentation could be useful enough that we’d be prepared to give it to smaller audiences – such as individual software developers.

Periodicity

When I examine Workload Manager for a customer a key aspect is goal setting. This has a number of aspects:

  1. How work gets classified – to which Service Class and Report Class by what rules.
  2. What the goals are for each Service Class Period.
  3. What the period boundaries should be.

This post focuses on Aspect 3: Period Boundaries.

What Is A Service Class Period?

When a transaction executes it accumulates service. Generally this is CPU time, especially with modern Service Definition Coefficients.

For transactions that support it you can define multiple Service Class Periods. Each period – except the last – has a duration.

Some transaction types, most notably CICS, only have a single period. For them the discussion of period durations is moot.

The z/OS component that monitors service consumption is System Resources Manager (SRM). SRM predates Workload Manager (WLM) by decades. (It’s important not to see WLM as replacing SRM but rather as supervising it. WLM replaces human-written controls for SRM.) Periodically SRM checks work’s consumption of resources. If the transaction has exceeded the relevant period duration the transaction moves to the next period.

It isn’t the case that exceeding the current period’s duration directly triggers the period switch – there is some slight latency to detection – so it would be normal for a transaction to (generally slightly) exceed the duration before it moves on.

The purpose of multiple periods is, of course, to give good service to light consumers of service and to progressively slow down heavier consumers.

Note: A common mistake is to think that transactions fall through into later periods because of their elapsed time. They don’t; It’s about service. Granted, a long running transaction might be long running because of the CPU it’s burning. But that’s not the same thing as saying it’s the elapsed time that drove it to later periods.
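To make the mechanism concrete, here’s a toy Python sketch of how accumulated service maps to a period. It isn’t SRM’s algorithm – just the arithmetic: each period except the last has a duration in service units, and the period boundaries are the cumulative sums of those durations.

def period_for(total_service, durations):
    # durations: service units allowed in each period except the last.
    # Work moves on once it has consumed a period's duration, so the
    # period boundaries are cumulative sums of the durations.
    boundary = 0
    for period, duration in enumerate(durations, start=1):
        boundary += duration
        if total_service < boundary:
            return period
    return len(durations) + 1

# Example: two durations means a three-period service class.
# (The numbers are invented for illustration.)
durations = [500, 5000]
print(period_for(100, durations))     # 1 - a light transaction
print(period_for(3000, durations))    # 2
print(period_for(162000, durations))  # 3 - a very heavy one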

Two Examples Of A New Graph

Here are two example graphs from the same customer. They are new in our code base, though Service Class Period ending proportions are something we’ve talked to customers about for many years. I’m pleased we have these as I think they will tell some interesting new stories. You’ll get hints of what I think those stories might be, based on the two examples from my “guinea pig” customer.

Each graph plots transaction ending rates for each period of the Service Class across a day. In the heading is information about period boundaries and how many service units the transactions ending there consumed on average. I feel the usefulness of the latter will emerge with more experience – and I might write about it then. (And graph headings are one place my code has a high tendency to evolve, based on experience with customers.)

Though the two examples are DDF I don’t intend to talk much about Db2 DDF Analysis Tool – except to say, used right, it would bring some clarity to the two examples.

DDFMED – A Conventional-Looking Service Class

This Service Class looks like many Production transaction service classes – with the classic “double hump” shape. I consider that an interesting – if extremely common – architectural fact. There’s something about this workload that looks reminiscent of, say, CICS transactions.

Quite a high proportion of the transactions end in Period 2 and a fair proportion in Period 3. Those in Period 3 are, on average, very heavy indeed – consuming an average of 162K service units. (This being DDF, the transaction ends when the work is committed – which might not be the end of the transaction from the client’s point of view.)

It seems to me the period boundaries are reasonable in this case, but see “Conclusion” below.

DDFLOW

This Service Class looks quite different:

  • The transaction rate is more or less constant – with two spikes, twelve hours apart. I consider both the constant transaction rate and the twelve-hourly spikes to be interesting architectural facts.
  • Almost all transactions end in Period 1. In fact well within Period 1. The very few Period 3 transactions are extremely long.

Despite the name “DDF Low” I think we have something very regular and well controlled here. I say “despite” as, generally, less well understood / sponsored work tends to be thought of as “low”.

Conclusion

I will comment that, when it comes to goal setting, business considerations play a big part. For example, some of the effects we might see at the technical level could be precisely what is needed. Or precisely what is not needed. So I tend not to walk in with recommendations for things like transaction goals – but I might walk out with them. Contrast this with what I call my “Model Policy” – which I discussed in Analysing A WLM Policy – Part 1 in 2013. Core bits of that are as close to non-negotiable as I get.

However, it is – as I think this post shows – very useful in discussions of period durations to know the proportions of transactions for a Service Class that end in each period. If everything falls through into Period 2, for example, Period 1’s duration is probably too short. And not just the proportions but the transaction rates across, say, the day.

One other thing, which I’ll leave as a question: What happens if you slow down a transaction that, say, holds lots of locks?

Four Coupling Facilities

This isn’t following on from Three Billboards? 🙂 but rather Shared Coupling Facility CPU And DYNDISP from 2016. I’m not sure it adds much that’s new but this set of customer data was an opportunity too good to miss…

… It enables me to graph most of the Coupling Facility LPAR types you’re likely to see.

I won’t repeat the contents of the 2016 post but I will repeat one thing: There are two views of Coupling Facility CPU:

  • SMF 70 Partition
  • SMF 74–4 Coupling Facility

That 2016 post talked at length about the latter. This post is more about the former.

Different Coupling Facility LPAR Types

There are different kinds of coupling facility LPAR, and this customer has several of them:

  • Dedicated
  • Shared – without Dynamic Dispatch (DYNDISP=NO)
  • Shared – with Dynamic Dispatch (DYNDISP=YES)
  • Shared – with DYNDISP=THIN

The latter two are similar but, in essence, Thin Interrupts (DYNDISP=THIN) shortens the time a CF spends polling for work, releasing the physical CPU sooner. This is good for other ICF LPARs, but maybe not so good for the LPAR with DYNDISP=THIN.

While this customer’s data doesn’t exemplify all four types it is a useful set of data for illustrating some dynamics.

About This Customer’s Mainframe Estate

I’m only going to describe the relevant bits of the customer’s mainframe estate – and I’m going to remove the names.

There are four machines, each running a mixture of LPARs in sysplexes and monoplexes. The sysplex we were most interested in had four LPARs, one on each machine. Also four coupling facilities, again one on each machine. There were no external coupling facilities.

Those of you who know a bit about resilience are probably wondering about duplexing coupling facility structures but this post isn’t about that.

I don’t think it makes any difference but these are a mix of z13 and z14 machines.

We had SMF 70-1 and SMF 74-4 from these z/OS LPARs and the four coupling facilities, but little from the others.

Here are the four machines’ ICF processor pools, across the course of a day.

The top two look significantly different to the bottom two, don’t they?

Machines A And B

These two machines have multiple ICF LPARs, each with some kind of DYNDISP turned on. We can see that because they don’t use the whole of their PR/SM shares – as their utilisation from the PR/SM point of view is varying.

Each machine has two shared ICF processors.

We have SMF 74-4 for the blue LPARs. So we can see they are using DYNDISP=THIN. We can’t see this for the other LPARs as we don’t have SMF 74-4 for them. (SMF 70-1 doesn’t have a DYNDISP indicator.)

The blue LPARs are also much busier than the other LPARs in their pool. While one might consider dedicating one of the two ICF processors we wouldn’t ideally define an ICF LPAR with a single logical processor.

Machine C

Machine C looks different, doesn’t it?

Again we have 2 physical processors in the ICF pool.

Here we have four ICF LPARs, each with DYNDISP=NO. We can establish this from three facts:

  • From SMF 70-1 we know that none of the LPARs has any dedicated ICF engines. We know this two ways:
    • As this graph shows, none of them has the share of a single ICF engine.
    • We have Dedicated and Shared processor information explicitly in SMF 70-1.
  • These LPARs use their whole share – judging by the constant CPU use in SMF 70-1.
  • From SMF 74-4 we know the blue LPAR in particular has DYNDISP=NO. (We don’t have SMF 74-4 for the others.)

Machine D

This looks similar to Machine C but it isn’t quite.

Yet again the ICF pool has 2 processors.

  • The Red LPAR has dedicated processors – from both SMF 70-1 and SMF 74-4. DYNDISP doesn’t even come into it for this LPAR.
  • The other LPARs have DYNDISP=NO, judging by their (SMF 70-1) behaviour.

A minor footnote: As I articulated in A Picture Of Dedication (in 2015) I sort the LPARs so the Dedicated ones appear at the bottom of the stack. (Even below *PHYSICAL – which is a very thin blue veneer here but actually red in the case of the other three machines.)

But Wait, There’s More

When I started writing this post I thought it was just going to be about showing you four pretty pictures. But, partially because some blog posts get written over an extended period, something came up meanwhile that I think is worth sharing with you. Besides, electronic words are unlimited – even if your patience isn’t.

Having shown you some graphs that depict most of the ICF LPAR CPU situations I experimented with DYNDISP detection for ICF LPARs we don’t have SMF 74-4 for.

Done right, this could make the story of an ICF pool with shared logical engines much nearer to complete.

The current algorithm – now in our Production code – assumes DYNDISP if the ICF LPAR uses less than 95% of its share over the focus shift (set of hours). Otherwise it’s Dedicated or DYNDISP=NO. I still can’t tell whether it’s DYNDISP=THIN or YES without SMF 74-4.
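Expressed as a sketch in Python – an illustration of the heuristic, not the production code itself:

def classify_icf_lpar(used_pct_of_share, has_dedicated_engines, threshold=95.0):
    # used_pct_of_share: SMF 70-1 CPU used over the focus shift, as a
    # percentage of the LPAR's PR/SM share of the ICF pool.
    if has_dedicated_engines:
        return "Dedicated"
    if used_pct_of_share < threshold:
        # Some form of Dynamic Dispatch - but you can't tell
        # DYNDISP=YES from DYNDISP=THIN without SMF 74-4.
        return "DYNDISP=YES or THIN"
    return "DYNDISP=NO"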

While ideally I still want data from all z/OS LPARs and coupling facilities in a customer’s estate, this technique fills in some gaps quite nicely.

Well, it works for this customer, anyway… 🙂

Three Billboards?

You could consider presentations as advertising – either for what you’ve done or for what something can do. Generally my presentations are in the category of “Public Service Announcement”. Which rather combines the two:

  • What I’ve found I can do – that you might want to replicate.
  • What I’ve found a product can do – that you might want to take advantage of.

Sometimes it’s a “Public Health Warning” as in “you’d better be careful…”

Anyhow, enough of trying to justify the title of this post. 🙂

(I will say Three Billboards Outside Ebbing, Missouri is an excellent movie. I first saw it on an aeroplane on the usual tiny screen and it worked well even there.)

So, I have three presentations, two of which are brand new and the other significantly updated this year:

  • What’s The Use? With Scott Ballentine
  • zIIP Capacity And Performance
  • Two Useful Open Source Tools

I’ll post the abstracts below.

The first two are for IBM Technical University Virtual Edition (registration here). This is run by IBM and is premier technical training, including over 600 sessions on IBM IT Infrastructure.

As I’m a big fan of user groups I’m delighted to say the first and third are for GSE UK Virtual Conference 2021. In both cases these are “live” – which, frankly, I find more energising.

Scott and I recorded last week – as the Tech U sessions will be pre-recorded with live Question And Answer sessions at the end. So these are “in the can” – which is a big relief.

The third I’ve just started writing but I have a nice structure to it – so I’m sure it’ll get done, too.

So here are the abstracts…

What’s The Use – With Scott Ballentine

For many customers, collecting the SMF type 89 subtype 1 Usage Data records is an important and necessary part of Software Licencing, as these records are used by SCRT to generate sub-capacity pricing reports.

But Usage Data has many more uses than that – whether in SMF 89 or SMF 30.

Customers can get lots of value out of this data – if they understand what the data means, and how it is produced.

Vendors can delight their customers – if they produce the right usage data, knowing how it can be used.

This presentation describes how Usage Data is produced, how a vendor can add value with it, and how customers can best take advantage of it.

zIIP Capacity And Performance

zIIP Capacity Planning tends to be neglected – in favour of General-Purpose Engines (GCPs). With Db2 allowing you to offload critical CPU to zIIPs, and the advent of zCX and z15 Recovery Boosts, it’s time to take zIIP capacity and performance seriously.

You will learn how to do zIIP capacity planning and performance tuning properly – with instrumentation and guidelines.

Two Useful Open Source Tools

You will learn how to use two open source tools many installations will find extremely useful, covering System and Db2 Performance:

Db2 DDF Analysis Tool – which uses Db2 SMF 101 Accounting Trace to enable you to manage DDF work and its impact on your system.

WLM Service Definition Formatter – which uses your WLM XML file to create some nice HTML, enabling you to see how the various parts of your Service Definition fit together.

Time For Action On Virtual Storage?

I wrote How I Look At Virtual Storage in 2014.

But since then my stance has shifted somewhat. A recent study I was part of made me realise my tone on z/OS virtual storage should probably be a little more strident. In that post I was somewhat even-handed about the matter, just describing my method.

This study also involved the CICS Performance team. They pointed out that some of this customer’s CICS regions had substantial use of 24-Bit virtual storage. If those CICS regions were to be able to scale (individually) a limiting factor might be 24-Bit virtual storage.

Obviously working on CICS regions’ use of virtual storage is the correct way forward under these circumstances.

But it got me thinking, and doing a little investigating.

Why Virtual Storage Matters

Almost 35 years ago a sales rep in my branch suggested selling a customer more virtual storage. 🙂 (Yes, he really did.)

Joking apart, if you want more virtual storage you can’t buy it; You have to work for it.

But why would you want more 24-Bit virtual storage anyway?

  • Older applications might well be 24-Bit and converting them to 31- or even 64-Bit might be difficult or not economically viable.
  • Some things, such as access method buffers, might well be in 24-Bit virtual storage. (To be fair, high level languages’ support for buffers above the 16MB line is good.)

I raise the subject of access method buffers because one of the ways of tuning I/O performance is to increase the number of access method buffers. To optimally buffer a single QSAM data set, for example, takes a substantial proportion of an address space’s 24-Bit region – unless the buffers have moved above the 16MB line. So one is sparing of buffers under these circumstances, perhaps sacrificing some performance. (You would choose high I/O data sets to buffer, obviously.)

My own code – at least the bits I have the excuse of having inherited – is probably predominantly 24-Bit. I might fix that one day, though it doesn’t seem like a good use of time. (It would probably be coupled with removing SIIS (Store Into Instruction Stream) horrors.)

What’s New?

When my CICS colleagues mentioned managing the use of 24-Bit virtual storage within the CICS regions a thought occurred to me: I could examine whether the 24-Bit region size could be increased. Both are useful approaches – reducing the demand and increasing the supply.

The same is probably true of 31-Bit, of course. For 64-Bit I’m more concerned with getting MEMLIMIT set right.

I think I’ve observed something of a trend: It seems to me many customers have the opportunity to increase their 24-Bit region – probably by 1MB but sometimes by 2MB. This would be a handy 10 – 20% increase. I doubt many customers have examined this question in a while – though most have SMF 78-2 enabled all the time. It’s rare to get a set of customer data without it in – and we always pump out the report outlined in How I Look At Virtual Storage.

Examining And Adjusting Virtual Storage

In what follows I’m describing 24-Bit. Similar actions apply for 31-Bit.

  1. Check that your SQA definition is small enough that there is little or no SQA free. SQA can overflow into CSA but not vice versa – so overspecifying SQA is a waste of virtual storage.
  2. Check how much CSA is free.
  3. Adjust SQA and CSA so that there is generally 1MB CSA free but little if any SQA free, but so that you are raising the private boundary by an integer number of MB – probably 1MB, possibly 2MB.
  4. Monitor the resulting 24-bit virtual storage picture.

You need to do the analysis with data over a reasonable amount of time. I’d say at least a week. You want to know if SQA and CSA usage are volatile. In most customer situations I’ve seen they are relatively static, but you never know.

One of the key points is that the boundary between common and private is a segment boundary i.e. 1MB. There is therefore no point trying to claw back, say, half a megabyte. (The “generally keep 1MB free” above is not related to this; It’s just a reasonable buffer.)
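As a back-of-the-envelope check, the arithmetic looks something like the following Python sketch. The numbers are invented, and any real change should be based on RMF data over at least a week, as above.

import math

MB = 1024 * 1024
SEGMENT = MB  # the 24-bit common/private boundary moves in 1MB segments

def reclaimable_24bit(csa_free, sqa_free, csa_buffer=MB):
    # How much 24-bit common could be given back to private, keeping
    # roughly 1MB of CSA free as a buffer and assuming free SQA can be
    # trimmed away (SQA overflows into CSA anyway). All values in bytes.
    spare = max(csa_free + sqa_free - csa_buffer, 0)
    return math.floor(spare / SEGMENT) * SEGMENT  # whole segments only

# Example: 2.4MB CSA free and 0.3MB SQA free suggests 1MB could move
# from common to private.
print(reclaimable_24bit(int(2.4 * MB), int(0.3 * MB)) // MB, "MB")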

For 24-Bit I said “adjust SQA and CSA so that there is generally 1MB free”. For 31-Bit I’ve generally said “keep 100MB free”. I think that’s fair but 200MB might be better. Most middleware and applications that use Below The Bar virtual storage have substantial amounts of 31-Bit. If their use is volatile – again when viewed over a reasonable period of time – then that might guide you towards 200MB free or more. The “100MB” is not structural; That’s 100 segments and is just a reasonable rule of thumb. At least for 31-Bit the 1MB segment boundary doesn’t seem nearly so coarse grained as it does for 24 Bit, so adjustment needn’t be in 100MB increments.

Talking of 31-Bit virtual storage, one of the biggest users of ECSA has been IMS. Until a few years ago the biggest users of 31-Bit private were Db2 and MQ. Db2 is now almost all 64-Bit and MQ has done lots of 64-Bit work.

Conclusion

I would urge customers to examine their virtual storage position. In particular, 24-Bit might well contain good news. 31-Bit might also, but there I’m more inclined to check there’s enough left. It’s not difficult to examine this, even if all you do is run off an RMF Virtual Storage Activity report.

I, for one, will make a point of examining the virtual storage reports we produce. Generally I do, anyway. Else I wouldn’t be seeing this trend.

Finally, a sobering thought: If, as has been said, customers are using 1 bit of addressability a year, we are 21 years into a 33-year 64-Bit lifespan. I’m pre-announcing nothing by saying this. And the “1 bit a year” thing might not actually hold. But it does make you think, doesn’t it?

Another Slice Of PI

This post follows on from my 2019 post A Slice Of PI. Again, not Raspberry Pi but Performance Index PI. 🙂

I’ve become sensitised to how common a particular Workload Manager (WLM) problem is since I created some handy graphs.

Some Graphs Recent Customers Have Seen

When I examine a customer’s WLM Service Definition part of what I do is examining SYSTEM Workload, and then successive importances.

(SYSTEM Workload is mainly SYSTEM and SYSSTC Service Classes, of course. And if these are showing a drop off in velocity you can imagine what is happening to less important work. This is when I consider Delay For CPU samples – to figure out how much stress the system is under. But I digress.)

By “successive importances” I mean:

  1. Graph importance 1 Performance Index (PI) by Service Class by hour.
  2. Graph importance 2 Performance Index (PI) by Service Class by hour.

And so on.

Sometimes there is little work at eg Importance 1, so I might stop after Importance 2 or I might not. (Service definitions vary in what is at each importance level – and I graph CPU for each importance level to understand this.)

The above approach gives me some orientation. And it has shown up a phenomenon that is more common than I had supposed: Percentile goals where the service class period ends up with a PI of 0.5.

The Top And Bottom Buckets

Recall the following from A Slice Of PI: For a Percentile Goal when an individual transaction ends its response time relative to the goal is used to increment the transaction count in one of 14 buckets. There are two buckets of especial interest:

  • Bucket 1 – where transactions whose response times are no more than 50% of the goal time are counted.
  • Bucket 14 – where transactions whose response times are at least 400% of the goal time are counted.

These buckets feed into the Performance Index (PI) calculation. Let’s deal with Bucket 14 first:

Imagine a goal where the goal is “85% in 15ms”. If fewer than 85% of transactions end with response times short enough to land them in Buckets 1 to 13 the PI is 4. We don’t have much clue as to quite how long the more than 15% in Bucket 14 took, but we know they all took at least 4 × 15 = 60ms. So we don’t know how much to adjust the goal by. (Of course, we might want to tune the transactions or the environment instead – and we don’t know how much to speed things up by.)

A note on terminology: I’m going to use the term goal percent for the “85%” in this example. I’m going to use the term goal time for the “15ms” part.
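For the record, here’s roughly how the PI falls out of those buckets – a Python sketch using the bucket boundaries as I understand them, not RMF’s actual code:

# Upper boundaries of buckets 1-13 as a fraction of the goal time;
# bucket 14 is "more than 400% of goal".
BOUNDARIES = [0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 2.0, 4.0]

def percentile_goal_pi(bucket_counts, goal_percent):
    # bucket_counts: ended-transaction counts for buckets 1 to 14.
    # The PI is, roughly, the boundary of the first bucket by which the
    # goal percentage of transactions has ended - floored at 0.5 and
    # capped at 4.
    total = sum(bucket_counts)
    if total == 0:
        return 0.5
    cumulative = 0
    for count, boundary in zip(bucket_counts, BOUNDARIES):
        cumulative += count
        if 100.0 * cumulative / total >= goal_percent:
            return max(boundary, 0.5)
    return 4.0  # the goal percent isn't reached until bucket 14

# "85% in 15ms" with everything ending in Bucket 1 gives a PI of 0.5:
print(percentile_goal_pi([1000] + [0] * 13, 85))
# Only 80% end within 4x the goal time, so the PI is 4:
print(percentile_goal_pi([0] * 12 + [800, 200], 85))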

The Bottom Bucket

Bucket 14 was the “warm up act” for Bucket 1.

With our “85% in 15ms” goal a transaction ending in Bucket 1 has a response time of no more than 0.5 × 15 = 7.5ms. Again, we don’t know how close to the bucket boundary the transaction response times tend to be. Here are two possibilities (and there are, of course, myriad others):

This first graph shows a transaction distribution where the transactions have response times tending towards way shorter than 50% of the goal time.

This second graph shows a transaction distribution where the transactions have response times tending towards just short of the 50%-of-goal-time mark.

Both could have a PI of 0.5.

So what should we adjust the goal time to? I think we should be prepared to be iterative about this:

  • Set the goal time to somewhat less than the current goal time. Maybe as aggressively as 50% of the current goal time – as that is safe.
  • Remeasure and adjust the goal time – whether up or down.
  • Iterate

I think we have to get used to iteration: Sometimes, from the performance data, the specialist doesn’t know the final value but does know the direction of travel. This is one of those cases, as is Velocity.

In most cases we should be done in a very few iterations. So don’t be unimpressed by a specialist having the maturity to recommend iteration.

An alternative might be to tighten the goal percent, say to 95%. This has a couple of issues:

  1. It tells us nothing about the distribution.
  2. It is more or less problematic, depending on the homogeneity of the work. At the extreme, if all transactions have the same measured response time adjusting the percent is a blunt instrument.

Problem 2 is, of course, more serious than Problem 1.

Of Really Short Response Times

Really short response times are interesting: Things being really fast is generally a good thing, but not always.

Consider the case of JDBC work where Autocommit is in effect. Here every single trivial SQL statement leads to a Db2 Commit and hence a transaction ending. This can lead to extremely short response times. More to the point, it’s probably very inefficient.

(You can observe this sort of thing with Db2 DDF Analysis Tool and Db2 Accounting Trace (SMF 101).)

But transactions have become faster. It’s not uncommon for goals such as our example (85% in 15 ms) to be lax.

Faster, that is, than the expectations of WLM goal setters. In the DDF case it’s quite likely such people won’t have been told what to expect – either in terms of performance or how to classify the work. Or, fundamentally, what the work is.

I see a lot of DDF Service Classes with 15ms for the goal time. For many years you couldn’t set it lower than that. Relatively recently, though, the lower limit was changed to 1ms. I’m not suggesting a transaction response time of 1ms is that common, but single digit is increasingly so.

So this 1ms limit stands to fix a lot of goal setting problems – both for percentile and average response time goal types.

The analogy that comes to mind is 64-Bit: We don’t – for now – need all 64 bits of addressability. But we certainly needed many more than 31. We might not need 1ms right now but we needed considerably less than 15ms. (I would guess some technological improvement meant this became feasible; I really should ask someone.)

Why Is A PI Of 0.5 A Problem?

You would think that doing better than goal would be a good thing, right?

Well, not necessarily:

  1. Not what was contracted for
  2. Not protective

Between the two of them there is a Service Delivery hazard: When things slow down an overly lax goal won’t stop this work from slowing down. And this is where Service Level Expectation – what we’ve delivered so far – clashes with Service Level Agreement – what the users are entitled to.

I often like, when we have a little time in a customer workshop, to ask where the goal came from. Perhaps from a contract, whether internal or external. Or maybe it’s the originally measured response time. Or maybe it’s just a “finger in the air”. This all feeds into how appropriate the goal now is, and what to do about it.

Walking This Back A Tiny Bit

It’s not hopeless to figure out what’s going on if everything ends in Bucket 1, with a PI of 0.5. You can – again from Workload Activity Report (SMF 72-3) data – calculate the average response time. If the average is well below the “50% of goal time” mark that tells you something. Equally “not much below” tells you something else.

So, I think you should calculate the average, even though it’s not that useful for goal setting.
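One crude way to encode that check – a sketch only, with made-up numbers and an arbitrary threshold:

def bottom_bucket_hint(avg_response_ms, goal_time_ms):
    # If everything lands in Bucket 1 (PI 0.5) the SMF 72-3 average
    # response time hints at how far inside the bucket the work really is.
    ratio = avg_response_ms / (0.5 * goal_time_ms)
    if ratio < 0.5:
        return "well inside Bucket 1 - the goal time looks very lax"
    return "not far inside Bucket 1 - tighten the goal time cautiously"

print(bottom_bucket_hint(2.0, 15))  # 2ms average against a 15ms goal time
print(bottom_bucket_hint(6.5, 15))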

Conclusion

You probably know I’m going to say this next sentence before I say it: Revisit your WLM Service Definitions – both goal values and classification rules – on a regular basis.

And, now that I have PI graphs for each importance level, I’m seeing a lot of “PI = 0.5” situations. Hence my concentrating on the bottom bucket.

One final thing: With Db2 DDF Analysis Tool I am able to do my own bucketing of DDF response times. I can use whatever boundaries for buckets I want, and as many as I want. This helps me – when working with customers – to do better than the “14 buckets”. Or it would if I weren’t lazy and only had 10 buckets. 🙂 I can do the same with CPU time, and often do.

Hopefully this post has given you food for thought about your percentile goals.

One Service Definition To Rule Them All?

This is one of those posts where I have to be careful about what I say – to not throw a customer under the bus with over-specificity. So, bear with me on that score.

Actually it’s two customers, not one. But they have the same challenge: Managing Workload Manager (WLM) service definitions for multiple sysplexes.

These two customers have come to different conclusions, for now. I say “for now” because, in at least one case, the whole design is being reconsidered.

Some Basics

Even for those of us who know them well, the basics are worth restating:

  • A sysplex can have one and only one service definition.
  • A service definition cannot be shared between sysplexes, but it can be copied between them. These are separate installations and activations.
  • In a sysplex different systems might not run work in all the service classes.
  • The ISPF WLM application has been used since the beginning to edit service definitions.
  • You can print the service definition from the ISPF WLM application.
  • Relatively recently the z/OSMF WLM application became available – as a more modern (and more strategic) way of editing service definitions.
  • You can export and import the service definition as XML – in both the ISPF and z/OSMF applications.

What I’ve Been Doing Recently With WLM Service Definitions

For many years I relied entirely on RMF Workload Activity SMF records (SMF 72-3) when tuning WLM implementations. This simply wasn’t (good) enough, so a fair number of years ago I started taking WLM service definitions in XML form from customers when working with them.

Why wasn’t RMF SMF good enough? Let’s review what it buys you:

  • It tells you how service class periods behave, relative to their goals and with varying workload levels.
  • It allows you to examine report class data, which often is a good substitute for SMF 30 address space information.

So I’ve done a lot of good work with SMF 72-3, much of which has surfaced as blog posts and in conference presentations. So have many others.

But it doesn’t tell you what work is in each service class, nor how it got there. Likewise with report classes. Admittedly, SMF 30 will tell you an address space’s service and report classes, but it won’t tell you why. Likewise, SMF 101 for DDF work will give you a service class (field QWACWLME) but it won’t tell you why, and it won’t tell you anything about report classes. And SMF won’t tell you about history, always a fascinating topic.

To understand the structure of the service definition you need the XML (or the print, but I don’t like the print version). So that’s what I’ve been looking at, increasingly keenly.

A Tale Of Two Customers

  • Customer A has multiple sysplexes – on two machines. They sent me multiple XML service definitions – as they have one per sysplex.
  • Customer B has multiple sysplexes – on more than two machines. They sent me a single XML service definition – so they are propagating a single service definition to multiple sysplexes.

As it happens, in both customers, there are application similarities between some of their sysplexes. In Customer B there are different application styles between the LPARs in a single sysplex – and this fact will become important later in this discussion.

So, which is right? The rest of this post is dedicated to discussing the pros and cons of each approach.

Multiple Service Definitions Versus A Single Propagated One

The first thing to say is if you have an approach that works now think carefully before changing it. It’ll be a lot of work. “A lot of work” is code for “opportunity for error” rather than “I’m too lazy”.

If you have one service definition for each sysplex you might have to make the same change multiple times. But this would have to be an “egregious” change – where it’s applicable to all the sysplexes. An example of this would be where you’d misclassified IRLM address spaces and now needed to move them all into SYSSTC (where they belong). There’s nothing controversial or specific about this one, and not much “it depends” to be had.

For changes that are only relevant or appropriate to one sysplex this is fine. For example, changing the duration of DDFHI Period 1 for a Production sysplex.

But if you had a DDFHI in each of several sysplexes this could become confusing – if each ended up with a different specification. Confusing but manageable.

If you had one service definition for a group of sysplexes you’d make a change in one sysplex and somehow propagate that change around. You’d use literally the same XML file (assuming you go the XML route), export it from the first sysplex, and import it into the rest. The mechanism, though, is less important than the implications.

Any change has to be “one size fits all”. Fine for the IRLM example above, but maybe not so good for eg a CICS Region Service Class velocity change. In the “not so good” case you’d have to evaluate the change for all the sysplexes and only if it were good (enough) for all of them make the change.

But, you know, “one size fits all” is a problem even within a sysplex: Customer B has different applications – at quite a coarse-grained level – in different LPARs within the same sysplex:

  • The common componentry in the LPARs – e.g. Db2 or MQ – probably warrants the same goals.
  • The application-specific, or transaction management, componentry – e.g. Broker, CICS, IMS, Batch – probably doesn’t.

There are some ways of being “differential” within a service definition. They include:

  • Treating one transaction manager differently from another.
  • Classification rules that are based on system.
  • Classification rules that are based on sysplex.

That latter – classification rules that are based on sysplex – is something I addressed very recently in my sd2html WLM Service Definition XML Formatter open source project. If you run it there’s a “Service Definition Statistics” table near the top of the HTML it produces. If there are classification rules based on sysplex name the specific sysplexes are listed towards the bottom of the table. This will save me time when looking at a customer’s service definition. You might use it, for example, to check whether such rules are still in place – when you thought they’d been removed. The specific use of the “Sysplex Name” qualifier will appear in either the Classification Groups or Classification Rules table, depending.

(I could use a good XML comparer, by the way. Because today I can’t easily tell the differences between Service Definition XML files – and I don’t think I should be teaching sd2html to do it for me.)
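For what it’s worth, a crude starting point might look like the Python sketch below – a plain line-by-line diff, whereas a proper comparer would understand the WLM structure. The file names are simply whatever you exported.

import difflib
import sys

def diff_service_definitions(old_xml, new_xml):
    # Naive line-by-line diff of two WLM service definition XML exports.
    # A real comparer would parse the XML and compare element by element.
    with open(old_xml) as f:
        old_lines = f.readlines()
    with open(new_xml) as f:
        new_lines = f.readlines()
    return difflib.unified_diff(old_lines, new_lines,
                                fromfile=old_xml, tofile=new_xml)

if __name__ == "__main__":
    sys.stdout.writelines(diff_service_definitions(sys.argv[1], sys.argv[2]))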

Conclusion

There is no conclusion, or at least no general firm conclusion; You really have to think it through, based on your own circumstances. Most notably two things:

  • Whether “one size fits all” works for you.
  • Whether operational procedures in WLM service definition management are effective.

But, still, it’s an interesting question. And it’s not hypothetical:

  • Plenty of customers have separate Sysprog, Development, Pre-Production, and Production sysplexes. In fact I’d encourage that.
  • Quite a few customers have separate Production sysplexes, especially outsourcers.

And this is why I find “Architecture” such fun. And why WLM is anything but simple – if you do it right.

Mainframe Performance Topics Podcast Episode 29 “Hello Trello”

This was a fun episode to make, not least because it featured as a guest Miroslava Barahona Rossi. She’s one of my mentees from Brasil and she’s very good.

We made it over a period of a few weeks, fitting around unusually busy “day job” schedules for Marna and I.

And this one is one of the longest we’ve done.

Anyhow, we hope you enjoy it. And that “commute length” can be meaningful to more of you very soon.

Episode 29 “Hello Trello” long show notes.

This episode is about how to be a better z/OS installation specialist, z/OS capture ratios, and a discussion on using Trello. We have a special guest joining us for the performance topic, Miroslava Barahona Rossi.

Follow up to Waze topic in Episode 8 in 2016

  • Apple Maps adds Accident, Hazard, and Speed Check reporting to iOS – using the iPhone, CarPlay, and Siri.

What’s New

  • Check out the LinkedIn article on the IBM server changing for FTPS users for software electronic delivery on April 30, 2021, from using TLS 1.0 and 1.1 to using TLS 1.2, with a dependency on AT-TLS.

  • If you are using HTTPS, you are not affected! This is recommended.

Mainframe – Being a Better Installation Specialist

  • This section was modeled after Martin’s “How to be a better Performance Specialist” presentation. It’s a personal view of some ideas to apply to the discipline.

  • “Better” here means lessons learned, not competing with people. “Installation Specialist” might just as well have used “System Programmer” or another term.

  • There’s definitely a reason to be optimistic about being a person that does that Installation Specialist or System Programmer type of work:

  • There will always be a need for someone to know how systems are put together, how they are configured, how they are upgraded, and how they are serviced…how to just get things to run.

    • And other people that will want them to run faster and have all resources they need for whatever is thrown at them. Performance Specialists and Installation Specialists complement each other.
  • Consider the new function adoption needs too. There are so many “new” functions that are just waiting to be used.

    • We could put new functions into two categories:

      1. Make life easier: Health Checker, z/OSMF, SMP/E for service retrieval

      2. Enable other kinds of workload: Python, Jupyter notebooks, Docker containers

    • A good Installation Specialist would shine in identifying and enabling new function.

      • Now more than ever, with Continuous Delivery across the entire z/OS stack.

      • Beyond “change the process of upgrading”, into “driving new function usage”.

      • Although we know that some customers are just trying to stay current. Merely upgrading could bring new function activation out of the box, like System Recovery Boost (SRB) on z15.

    • A really good installation specialist would take information they have and use it in different ways.

      • Looking at the SMP/E CSIs with a simple program in C, for example, to find fixes installed between two dates.

      • Associating that with an incident that just happened, by narrowing it down to a specific element.

      • Using z/OSMF to do a cross-GLOBAL zone query capability, for seeing if a PE was present within the entire enterprise quickly.

      • Knowing what the right tool is for the right need.

    • Knowing parmlib really well. Removing unused members and statements. Getting away from hard-coding defaults – which can be hard, but sometimes can be easier (because some components tell you if you are using defaults).

      • Using the DISPLAY command to immediately find necessary information.

      • Knowing the z/OS UNIX health check that can compare active values with the hardened parmlib member in use.

    • Researching End Of Service got immensely easier with z/OSMF.

    • Looking into the history of the systems, with evidence of merging two shops.

      • Could leave around two “sets” of parmlibs, proclibs. Which might be hard to track, depending on what you have such as ISPF statistics or comments. Change management systems can help.

      • Might see LPAR names and CICS region naming conventions

    • Might modern tools such as z/OSMF provide an opportunity to rework the way things are done?

      • Yes, but often more function might be needed to completely replace an existing tool…or not.
    • You can’t be better by only doing things yourself; no one can know everything. You’ve got to work with others who are specialists in their own area.

      • Performance folks often have to work with System Programmers, for instance. Storage, Security, Networking, Applications – the list goes on.

      • Examples are with zBNA, memory and zIIP controls for zCX, and estate planning.

    • Use your co-workers to learn from. And teach them what you know too. And in forums like IBM-MAIN, conferences, user groups.

    • Last but not least, learn how to teach yourself. Know where to find answers (and that doesn’t mean asking people!). Learn how to try out something on a sandbox.

Performance – Capture Ratio

  • Our special guest is Miroslava Barahona Rossi, a technical expert who works with large Brasilian customers.

  • Capture Ratio – a ratio of workload CPU to system-level CPU as a percentage. Or, excluding operating system work from productive work.

    • RMF SMF 70-1 versus SMF 72-3
      • 70-1 is CPU at system and machine level
      • 72-3 is workload / service class / report class level
      • Not in an RMF report
  • Why isn’t the capture ratio 100%?

    • There wouldn’t be a fair way to attribute some kinds of CPU. For example I/O Interrupt handling.
  • Why do we care about capture ratio?

    • Commercial considerations when billing for uncaptured cycles. You might worry something is wrong if the capture ratio is low.

    • Might be an opportunity for tuning if below, say, 80%

  • What is a reasonable value?

    • Usually 80 – 90. Seeing more like 85 – 95 these days. It has been improved because more of I/O-related CPU is captured.

    • People worry about low capture ratios.

    • Also work is less I/O intensive, for example, because we buffer better

  • zIIP generally higher than GCP

  • Do we calculate blended GCP and zIIP? Yes, but also zIIP separately from GCP.

  • Why might a capture ratio be low?

    • Common: Low utilisation, Paging, High I/O rate.

    • Less common: Inefficient ACS routines, Fragmented storage pools, Account code verification, Affinity processing, Long internal queues, SLIP processing, GTF

  • Experiment correlating capture ratio with myriad things

    • One customer set of data with z13, where capture ratio varied significantly.

      • In a spreadsheet, calculated the correlation between capture ratio and various other metrics. Used the =CORREL(range, range) Excel function.

      • Good correlation is > 85%

      • Eliminate potential causes, one by one:

        • Paging and SIIS: poor correlation

        • Low utilisation: strong correlation

      • It has nothing much to do with machine generation. The same customer – from z9 to z13 – always had a low capture ratio.

        • It got a little bit better with newer z/OS releases

        • Workload mix? Batch versus transactional

      • All the other potential causes eliminated

      • Turned out to be LPAR complexity

        • 10 LPARs on 3 engines. Logical: Physical was 7:1, which was rather extreme.

        • Nothing much can be done about it – could merge LPARs. Architectural decision.

    • Lesson: worth doing this with your own data. Experiment with excluding data and various potential correlations. (There’s a sketch of the idea in Python after these show notes.)

  • Correlation is not causation. Look for the real mechanism, and eliminate causes one by one. Probably start with paging and low utilisation

  • Other kinds of Capture Ratio

    • Coupling Facility CPU: Always 100%; at low traffic, CPU per request is inflated.

    • For CICS, SMF 30 versus SMF 110 Monitor Trace: Difference is management of the region on behalf of the transactions.

    • Think of a CICS region as running a small operating system. Not a scalable thing to record SMF 110 so generally this capture ratio is not tracked.

  • Summary

    • Don’t be upset if you get a capture ratio substantially lower than 100%. That’s normal.

    • Understand your normal. Be aware of the relationship of your normal to everybody else’s. But, be careful when making that comparison as it is very workload dependent.

    • Understand your data and causes. See if you can find a way of improving it. Keep track of the capture ratio over the months and years.
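As promised in the notes above, here’s a sketch of the correlation experiment in Python (pandas) rather than Excel. The column names are invented – in practice you’d build the frame from your own SMF 70-1 and 72-3 reduction.

import pandas as pd

# One row per RMF interval; the column names are hypothetical.
df = pd.read_csv("intervals.csv")

# Capture ratio: workload-level (SMF 72-3) CPU as a percentage of
# system-level (SMF 70-1) CPU.
df["capture_ratio"] = 100.0 * df["smf72_cpu_seconds"] / df["smf70_cpu_seconds"]

# The pandas equivalent of Excel's =CORREL(range, range): correlate the
# capture ratio with each candidate explanation, one by one.
for candidate in ["cpu_busy_pct", "paging_rate", "io_rate", "logical_to_physical"]:
    r = df["capture_ratio"].corr(df[candidate])
    print(f"{candidate}: r = {r:.2f}")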

Topics – Hello Trello

  • Trello is based on the Kanban idea: boards, lists, and cards. Cards contain the data in paragraphs, checklist, pictures, etc.

  • Can move cards between lists, by dragging.

  • Templates are handy, and are used in creating our podcast.

  • Power-ups add function. A popular one is Butler. Paying for them might be a key consideration.

  • Multiplatform plus web. Provided by Atlassian, which you might know from Confluence wiki software.

    • Atlassian also make Jira, which is an Agile Project Management Tool.
  • Why are we talking about Trello?

    • We moved to it for high level podcast planning

      • One list per episode. List’s picture is what becomes the cover art.

      • Each card is a topic, except first one in the list is our checklist.

    • Template used for hatching new future episodes.

      • But we’re still outlining the topic with iThoughts
    • We move cards around sometimes between episodes.

      • Recently more than ever, as we switched Episode 28 and 29, due to the z/OS V2.5 Preview Announce.

      • Right now planning two episodes at once.

  • Marna uses it to organise daily work, with personal workload management and calendaring. But it is not a meetings calendar.

    • Probably, with Jira, it will be more useful than ever. We’ll see.
  • Martin uses it for team engagements, with four lists: Potential, Active, Dormant, Completed.

    • Engagement moves between lists as it progresses

    • Debate between one board per engagement and one for everything. Went with one board for everything because otherwise micromanagement sets in.

    • Github projects, which are somewhat dormant now, because of…

    • Interoperability

      • Github issues vs OmniFocus tasks vs Trello Cards. There is a Trello power-up for Github, for issues, branches, commits, and pull requests mapped into cards.

      • However, it is quite fragile, as we are not sure changes in state are reliably reflected.

    • Three-legged stool is Martin’s problem, as he uses three tools to keep work in sync. Fragility in automation would be anybody’s problem.

      • iOS Shortcuts is a well built out model. For example, it can create Trello cards and retrieve lists and cards. (See the sketch after these show notes.)

        • Might be a way to keep the 3 above in sync
    • IFTTT is used by Marna for automation, and Martin uses automation that sends a Notification when someone updates one of the team’s cards.

      • Martin uses Pushcut on iOS – as we mentioned in Episode 27 Topics

      • Trello provides an IFTTT Trigger, or Pushcut provides a Notification service, which can kick off a shortcut also.

      • Encountered some issues: each list needed its own IFTTT applet. Can’t duplicate applets in IFTTT so it’s a pain to track multiple Trello lists, even within a single board.

    • Automation might be a better alternative to Power-ups, as you can build them yourself.

  • Reflections:

    • Marna likes Trello. She uses it to be more productive, but would like a couple more functions, which might be added as it becomes more popular.

    • Martin likes Trello too, but with reservations.

      • Dragging cards around seems a little silly. There are more compact and richer ways of representing such things.

      • A bit of a waste of screen real estate, as cards aren’t that compact. Especially as lists are all in one row. It would be nice to be able to fill more of the screen – with two rows or a more flexible layout.
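As mentioned in the automation discussion above, the Trello REST API is simple enough to drive directly. Here’s a small Python sketch of creating a card – the list ID, key, and token are placeholders you’d get from Trello itself.

import requests

def create_trello_card(list_id, name, desc, api_key, api_token):
    # Create a card on a Trello list via the REST API. The credentials
    # and list ID are placeholders.
    response = requests.post(
        "https://api.trello.com/1/cards",
        params={"idList": list_id, "name": name, "desc": desc,
                "key": api_key, "token": api_token},
    )
    response.raise_for_status()
    return response.json()["id"]

# e.g. create_trello_card("<list id>", "Episode 30 planning",
#                         "Topics to hatch", "<key>", "<token>")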

In closing

  • GSE UK Virtual Conference will be 2 – 12 November 2021.

On the blog

  • So It Goes