Mainframe Performance Topics Podcast Episode 24 "Our Wurst Episode"

(Originally posted 2019-06-18.)

You’ll have to pardon the pun in our latest podcast episode’s title.

We were also somewhat delayed in getting this out – due to our busy schedules and a few technical gremlins. Hopefully it’s worth the wait.

It’s also quite a long episode so if you listen to it on your commute you’ll have to ask your chauffeur to drive a little more slowly. 🙂

Episode 24 “Our Wurst Episode”

Here are the show notes for Episode 24 “Our Wurst Episode”. The show is called this because we both attended the IBM TechU in Berlin, Germany, and our Topics topic is our trip report.

Feedback

  • We have some feedback (again) based on our use of stereo. We now have glorious mono, based on those comments!

Follow up

What’s New

  • APAR OA55959: NEW FUNCTION – PDUU Support for HTTPS
    • AMAPDUPL: Problem Documentation Upload Utility
    • This is how you get a dump to IBM; the dump can be compressed, optionally encrypted, and sectioned into smaller data sets
    • HTTPS is important in this because a dump can contain sensitive information and FTP is not an acceptable solution for many customers
    • FTPS had issues with e.g. firewalls
    • Doesn’t look like this option has been incorporated into z/OSMF Incident Log at this time
  • Tailored Fit Pricing for IBM Z
    • Enterprise Capacity Solution
    • Enterprise Consumption Solution
    • Both different from traditional rolling four hour average model
    • For Tailored Fit Pricing, all machines must be IBM z14 Models M01-M05 or ZR1, and at IBM z/OS V2.2, or higher
    • More information here.
    • In Episode 19 Performance topic we talked about Licence-Related Instrumentation.
  • Ask MPT
    • Danny Naicker asks, “In z/OS 2.4 CSA subpool key 8–15, is it usable for user defined applications?”
    • Answer: Prior to z/OS V2.4 User Key Common Storage was available, but it was turned off by default. The downside was no control over who could use it.
      • In base V2.4 that specific capability has gone (the old system-wide switch).
      • Question probably originates from need to still use User-key CSA because of legacy stuff
      • This is where RUCSA (Restricted Use Common Service Area), a new function, comes into play. Allows you to identify applications by using a security definition.
      • Usage of RUCSA prior to V2.4 will need APAR OA56180
      • RUCSA will be offered in V2.4.
    • Thank you to Danny for a good question!

Mainframe Topic: CICS ServerPac in z/OSMF

  • IBM’s first delivery on the new installation strategy will be with CICS and associated SREL products. This is the first of many (really, all).
  • Choice on new installation strategy or old during ShopZ ordering. Choice is:
    • Old is ISPF CustomPac dialogs, or
    • New is z/OSMF Software Management and Workflows.
  • We encourage making the z/OSMF choice, as that is consistent between IBM and other vendors, and is intended to be easier.
  • Infrastructure already available in continuous delivery PTFs, and rolled back to z/OS V2.2. This makes the driving system have the proper infrastructure so anybody can package and deliver that way.
  • More details on the z/OS installation strategy:
    • Software vendors will package similarly, in a z/OSMF Portable Software Instance,
    • Clients will be able to acquire and deploy and configure using z/OSMF.
    • z/OSMF Software Management is used for the acquisition and deployment. (“Deployment” is the new term for “installation”!)
    • z/OSMF Workflows is used for configuration. You would see the old ServerPac batch jobs as steps in a Workflow.
  • All software that you ordered as a ServerPac, and installed either way, will give you the same (or hopefully better) equivalent installation.
  • There is an IBM Statement Of Direction that this installation choice is coming, but we do not have an exact date yet.
  • For other software ISVs, they can exploit the new z/OS installation strategy whenever they are ready.
  • Prepare now by becoming familiar with z/OSMF Software Management and Workflows

Performance Topic: DB2 And I/O Priority Queuing

  • Follow on from Screencast / Blog post topic: Screencast 12 – Get WLM Set Up Right For DB2.
  • Recent talk has been about whether to turn off I/O Priority Queuing in WLM.
  • Service classes with DB2 subsystems in are heavily I/O Sample oriented, which is unusual among service classes in a system.
  • Means access to CPU is not properly managed, as CPU & zIIP samples few, relative to I/O samples. Reminder: Most of DBM1 is now zIIP-eligible.
  • Can achieve goal even with lots of delay for zIIP or CPU, but that’s definitely not what you want.
  • To see if it is properly managed:
    • See if there are lots of CPU / zIIP Delay samples in RMF Workload Activity.
    • In Db2 might well see Prefetch etc engines exhausted, which could cause unwanted Sync I/Os and bad SQL performance.
      • The effect is just like if there is a real zIIP shortage.
    • Instrumentation for DB2 of relevance is Statistics Trace.
  • You don’t want to just turn off WLM I/O Priority Queuing, as it’s sysplex-wide, it might affect other work that needs it, and Db2 might actually need it.
    • As the name suggests, it gives finer control over I/O priority.
    • So, it’s a case of proceeding with caution.
  • First you need a reasonably achievable goal for the service class. Make sure you’re more or less achieving the existing goal.
  • Second, calculate what the velocity achieved would be without I/O priority queuing.
    • Can take out the Using and Delay for I/O sample counts to do this.
  • If you don’t do the analysis and act on it a shift to not using I/O Priority Queuing could have unpredictable results.
  • You would know that turning off I/O Priority Queuing was helpful by seeing evidence that WLM is managing access to CPU for Db2 better, without hurting other stuff we care about. This evidence would come from RMF Workload Activity Report data.
    • On the Db2 side maybe Statistics Trace says Prefetch etc doesn’t get turned off. Or response times get better.
  • You should evaluate or adjust the goal attainment, but that is BAU. Changing WLM always needs some care.
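
The "take out the Using and Delay for I/O sample counts" step can be sketched as below. This is only an illustration, assuming the usual execution velocity formula of Using samples divided by Using plus Delay samples, times 100. The function name, its parameters and the sample counts are hypothetical, not values from RMF or from any real system.

```python
def velocity(using_cpu, using_io, delay_cpu, delay_io, delay_other,
             io_priority_queuing=True):
    """Execution velocity (%) for one service class period from sample counts.

    Assumption: velocity = 100 * Using / (Using + Delay), where the I/O
    Using and Delay samples only count when I/O Priority Queuing is on.
    "CPU" here stands for general-purpose CPU and zIIP samples together.
    """
    using = using_cpu
    delay = delay_cpu + delay_other
    if io_priority_queuing:
        using += using_io
        delay += delay_io
    total = using + delay
    return 100.0 * using / total if total else 0.0


# Hypothetical, heavily I/O-sample-oriented service class.
with_io = velocity(using_cpu=50, using_io=800, delay_cpu=100, delay_io=50, delay_other=0)
without_io = velocity(using_cpu=50, using_io=800, delay_cpu=100, delay_io=50, delay_other=0,
                      io_priority_queuing=False)
print(f"With I/O samples: {with_io:.1f}%  Without: {without_io:.1f}%")
```

With these made-up counts the velocity looks healthy while I/O samples dominate, and much worse once they are removed, which is the kind of gap the analysis is meant to expose before changing anything.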

Topics: Berlin Trip Report May 20–24

  • We both attended IBM Z TechU in Berlin, and got to see each other.
  • Marna had about six sessions.
    • The SMP/E Rookies session had fabulous attendance – 44. Some were more experienced, but most were not.
    • z/OSMF had good attendance too, about 82. More are interested in this topic, especially if you compare to just a couple of years ago.
    • Best attended was the z/OS V2.4 Preview, with about 150 people. There was excellent interest in what is coming in the new release.
  • Marna got to do a couple of things outside the conference:
    • Visiting the Reichstag was fabulous, but make sure to get a reservation.
    • Der Dom was also educational, with a walk to the top!
  • Marna did her own poster to help with z/OSMF configuration, and several people came by to chat.
  • Both Marna and Martin shared a poster about this podcast. We helped with getting one person a podcast app (on each platform), and a subscription to this podcast.
  • Martin had five sessions.
    • Two were with Anna Shugol, Engine-ering, and zHyperLink.
    • One was co-written with Anna, “2 / 4 LPARs”
    • Two were solo efforts: Parallel Sysplex Performance Topics, and Even More Fun With DDF
  • Martin also took a little time out of the conference
    • Each day took a session out to walk in the city.
    • It was interesting to wander round former East Berlin.
  • The next European IBM Z TechU is in Amsterdam May 25–29, 2020.

Customer requirements

  • RFE 131187
    • zOSMF RESTFILES PUT to remove Windows Carriage Return characters
    • Windows files contain a carriage return and line feed and the carriage return character x’0D’ is not being removed. The resulting zOS datasets therefore have a blank line after every data line that shouldn’t be there.

Future conferences where we’ll be

  • Both Marna and Martin in SHARE, Pittsburgh, August 5–9, 2019

On the blog

Contacting Us

You can reach Marna on Twitter as mwalle and by email.

You can reach Martin on Twitter as martinpacker and by email.

Or you can leave a comment below. So it goes…

Elementary My Dear Sherlock

(Originally posted 2019-06-08.)

Students of English literature won’t be alone in recognising the allusion in this post’s title. But this post isn’t about literature.[1]

Sherlocking, as described here is a phenomenon where a developer ships something – typically an app – but then Apple comes along and announces its own version of it.

In very recent memory, examples might be:

  • Enhancements to Apple’s Reminders apps on iOS and Mac OS versus Omnifocus and other “to do list” apps – in iOS 13 and Mac OS 10.15 Catalina.
  • NFC tags kicking off Apple Shortcuts automation versus LaunchCenter Pro’s support of NFC tags (described in Automation On Tap) – in iOS 13.
  • Apple Watch Calculator and PCalc – in Watch OS 6.

But what of it?

At first sight, having your app Sherlocked must be disheartening. But that’s not the end of the story.

What’s at risk for the vendor that gets Sherlocked is subscriptions and future sales. In other words, revenue. For some apps – particularly those that don’t use a subscription model – the sales pattern might be that most sales happen early in the app’s life. So before Sherlocking happens.

But it’s not that simple. Yes, Sherlocking represents a threat but it also represents an opportunity…

… A platform vendor legitimizes a marketplace by sensitising users to the value of a function. For example the new Watch OS Calculator app will introduce users to the idea of a calculator on their wrist.[2] But if PCalc were a better calculator than the Apple one it could still sell well. Fortunately it is. 😁

A built-in app gives one you have to buy a run for its money and free beats fee for many customers – if the basic function is good enough.

But the built-in (Sherlocking) app is usually relatively basic so a purchased app wins by differentiation from the basic. So the message is Sherlocked apps must up their game.

(For me, while I wouldn’t want to wear the “Power User” badge the basic function is rarely good enough. For example I use the excellent Overcast podcasting app instead of the Apple one and it’s inconceivable I wouldn’t use it and hardly conceivable I wouldn’t pay for the premium version.)

Having said that, first party apps have some additional opportunities for integration so could have some advantages. It’s difficult to compete against that. There are private APIs that only Apple can use – but those tend to open up over time.

Basic infrastructure – for example the Reminders database – done right allows third parties to use the infrastructure. Reminders data is shareable across platforms and with other third party apps designed to take advantage of the database.

A good example of a third party app using the Reminders database is Goodtask. Goodtask illustrates one downside of using the built in database, however: The developers had to use a lot of ingenuity to get round the limitations of the Reminders database: As they articulate here they use the Notes field for a task and had to invent their own metadata format.

Unfortunately Mail on iOS doesn’t have this open database and nor does Music. So email apps have to use their own databases – which is a real shame. It means, for instance, you can’t operate on the same email account with a mix of built-in and third party app functions. With Reminders and Goodtask you can.

Here’s an example of why I would want Mail to be open: My favourite email client – on Mac and iOS – is Airmail. I favour it because it has more automation hooks than the default Mail app. But I can’t use it with my work email account because it doesn’t share the default Mail app’s database.

Automation is one area where Apple – as the platform vendor – has an advantage. Those of us who are into automation are still waiting for better automation than x-callback-url[3] – as Shortcuts doesn’t provide a general automation mechanism for third-party apps. A more general mechanism would certainly help. But I digress.

In summary, Sherlocking does represent a threat but also an opportunity. The way to make it the latter is for Sherlocked app developers to continually innovate – to differentiate their apps from Apple’s. Thankfully the best developers are fleet of foot; I don’t envy their position but the best will survive.

If they do innovate at speed the net is the consumer benefits.


  1. Thankfully, given my background in Science and Technology. 🙂 But I’m not a complete philistine; Bits of me are missing. 🙂  ↩

  2. Not an entirely new idea, of course.  ↩

  3. I like x-callback-url so much I can often be seen sporting the t-shirt. 🙂  ↩

A Slice Of PI

(Originally posted 2019-06-01.)

A rather obscure pun, but I hope it’ll make sense. Not “Pi” by the way but “PI”. Though this post contains arithmetic, it’s not a mathematics post.

To be honest, I never knew how we calculated Performance Index (or PI) for Percentile goals before. But now I do, so I’m sharing it with you. Plus a couple of observations, too.

To be even more honest, when I say “I never knew how we calculated Performance Index” I should say “I never knew how we should calculate Performance Index” – as I’ve just corrected it.

(This post follows on (after 6 years) from WLM Response Time Distribution Reporting With RMF. More on that later.)

Before we go any further I have to give a little background information.

What Are Percentile Goals?

When Workload Manager (WLM) manages transactions explicitly two kinds of goals become available:

  • Average Response Time
  • Percentile

By the way, for WLM to manage transactions at all requires cooperation / exploitation by middleware or at any rate a work manager. Examples include:

  • CICS transactions require CICS support
  • DDF transactions require Db2 support
  • TSO uses the traditional transaction ending mechanism
  • Likewise Batch job steps

Average response time goals say something like “the average response time for this service class period should be 0.5 seconds”.

A Percentile response time goal might be “90% of all transactions in this service class period should finish in 300 milliseconds”.

What Is Performance Index – Or PI?

Performance Index (PI) is a measure of goal attainment.

  • A PI of 1.0 is where the goal is just met.
  • Higher than 1.0 and it is missed, the further away from 1.0 the worse the miss.
  • Lower than 1.0 and it is met, the lower the greater ease in meeting the goal.

The point about PI is that it is a metric for goal attainment that is neutral with regard to workload type.

PI is, of course, used to drive WLM’s algorithms. But I regard it as just the first metric. Others, such as WLM’s ability to help a service class period, are important too.

How Do We Calculate PI For Percentile Goals?

The calculation for goal attainment for Average response time goals is straightforward: Sum up the response times for each transaction and divide by the number of transactions ending. Dividing that average by the goal response time gives the PI.

The calculation for Percentile goals is more complex.

For any kind of transaction-based goal, at transaction ending WLM uses the transaction’s response time to assign it to one of 14 buckets. So WLM is counting transaction endings in these buckets.

The buckets have the following boundaries:

Bucket   Minimum % Of Goal   Maximum % Of Goal   PI Value
1        0                   50                  0.5
2        50                  60                  0.6
3        60                  70                  0.7
4        70                  80                  0.8
5        80                  90                  0.9
6        90                  100                 1.0
7        100                 110                 1.1
8        110                 120                 1.2
9        120                 130                 1.3
10       130                 140                 1.4
11       140                 150                 1.5
12       150                 200                 2.0
13       200                 400                 4.0
14       400                                     4.0

The bolded values are of special significance, as we shall see.

Suppose we have a goal of “85% to complete within 0.2 seconds”. WLM knows how many transactions completed in each bucket and how many overall.

Suppose 1000 transactions completed. 85% of 1000 is 850 transactions.

Starting with Bucket 1, WLM tallies up the transaction endings until it meets 850 transactions. The upper limit of the bucket in which that happens is what determines the PI.

Suppose Buckets 1 to 3 tally up to 800 transactions and Bucket 4 contains 100 transactions. So Buckets 1 to 3 don’t meet 850 but Buckets 1 to 4 do.

Bucket 4’s upper limit is 80% of goal. So the PI is 80%/100 or 0.8.

Suppose instead it took Buckets 1 to 7 to reach or exceed 850. Bucket 7’s upper limit is 110%, so the PI would be 110%/100 or 1.1.

The code I inherited[1] didn’t do this calculation. But now it does.

Actually the calculation is not quite that simple: If we’ve tallied up Buckets 1 to 13 and still haven’t reached that 850 number we set the PI to 4.0 (which makes sense).
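
The tallying described above can be sketched in a few lines. This is a minimal illustration in Python, not the code referred to in this post. The names BUCKET_UPPER_PCT and percentile_pi are made up for the example, and returning None when there are no transaction endings is an assumption of the sketch.

```python
# Upper limit of Buckets 1 to 13, as a percentage of the goal.
# Bucket 14 has no upper limit; reaching it yields a PI of 4.0.
BUCKET_UPPER_PCT = [50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 200, 400]


def percentile_pi(bucket_counts, goal_percent):
    """Return the PI for a percentile goal from 14 bucket counts.

    bucket_counts: transaction endings in Buckets 1 to 14.
    goal_percent:  the percentile in the goal, e.g. 85 for "85% within ...".
    """
    if len(bucket_counts) != 14:
        raise ValueError("expected counts for 14 buckets")
    total = sum(bucket_counts)
    if total == 0:
        return None  # assumption: no endings means no meaningful PI
    running = 0
    for upper_pct, count in zip(BUCKET_UPPER_PCT, bucket_counts):
        running += count
        # Integer comparison avoids rounding trouble with percentages.
        if running * 100 >= goal_percent * total:
            return upper_pct / 100
    return 4.0  # target not reached within Buckets 1 to 13


# The worked example: Buckets 1 to 3 hold 800 endings, Bucket 4 holds 100.
counts = [500, 200, 100, 100, 50, 50] + [0] * 8
print(percentile_pi(counts, 85))
```

For the worked example this gives 0.8, because the 850th ending falls in Bucket 4.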

Some Observations

From the above description of how PI is calculated for percentile goals, we can observe a few things:

  • PI can never be greater than 4.0, no matter how widely the goal is missed.

  • PI can never be less than 0.5 – as Bucket 1’s maximum is 50% of goal.

  • A PI of 0.5 is special in that it means enough transactions ended in Bucket 1, and some could be much shorter than 50% of goal. To get further definition we’d have to calculate the average response time.

    Now I calculate PI right I’m seeing 0.5 quite a bit.

  • If there are no transaction endings, automatically the percentile goal is reached in Bucket 1, as all zero transactions ended there. So PI is meaningless with no transaction endings. So I force the PI to 0, but attempt to flag why in my reporting.

  • Because we’re using buckets there are only a finite number of values PI can take. Averaging over multiple RMF intervals will, of course, yield more.

Revisiting That Old Blog Post

This seems as good a place as any to follow up on WLM Response Time Distribution Reporting With RMF.

I made some refinements to the graph I showed there:

  • In the post I alluded to “near misses” versus “missed by miles” – and the counterpart on the “hits” side. I did indeed define “near” as +20% (and –20%) so Buckets 7–8 (and Buckets 5–6).
  • I added a green datum line for the % value in the goal.
  • I also added transaction rate. My code attempts to scale so that transaction rate doesn’t look ridiculous on a 0 to 100 scale.
  • I considered changing to a red-through-to-green spectrum but I think that’s less consumable. Besides, I like red/blue better.

Here is a modern case of a CICS transaction service class.

Here are some observations:

  • The datum is 95% because the goal is “95% in 1 second, Importance 1”.
  • A goal with “1 second” in it suggests pretty heavy CICS transactions. I’m not surprised there aren’t that many transaction endings.
  • When the transaction rate is significant the red pokes down below the green datum. Not just the pale red (“Just Outside”) but the darker red (“Well Outside”). You could say the blue shrinks away from it, if you prefer.
  • Not shown here but the PI is typically around 1.3.

By the way, WLM doesn’t have complete control over the response time achieved for a transaction. And that’s particularly relevant here.

This transaction goal service class is served by two region goal service classes. Both of these show almost no “Delay For X” samples. What they do have is lots of “Using I/O” and “Using CPU” samples.

So, to improve transaction response time it’s probably necessary to try:

  • Cutting the transaction CPU path length.
  • Reducing the I/O time by (and this is only an example) buffering the data better.

Neither of these are things WLM can do[2].


  1. And I hope I don’t sound defensive when I say that.  ↩

  2. This installation does not have Db2 so the “WLM-Managed Db2 Buffering” function doesn’t apply.  ↩

What's In A Name? – Revisited Again

(Originally posted 2019-05-27.)

It seems to be in the nature of my code development work that I revisit things over and over again. You could call it “agile”, you could call it “fragile”. 🙂 I prefer to think of it as being inspired by each fresh set of customer data.

And so it is with data set names. In What’s In A Name? – Revisited I talked about gleaning information from a data set name and using bolding to aid comprehension. This is an update to that, based on some new code.

But first a confession: I reopened this code because there was a bug in it. But while I was there I was inspired to capture another hill.[1] The very same dataset that caused my REXX to terminate with an error caused a nice piece of inspiration.

The Bad

The bug was in not recognising a Generation Data Group (GDG) data set could have “19” in its low-level qualifier. That led to misinterpreting the generation number as part of a date. (GDS low level qualifiers are of the form GnnnnVmm where the first variable part is the generation number and the second the version number – so “G1719V00” means “generation 1719, version 0”.)
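
Recognising the GnnnnVmm form before looking for dates is the heart of the fix. A minimal sketch in Python (not the REXX mentioned above) might look like this; the names GDS_LLQ and parse_gds_qualifier, and the data set name in the usage line, are hypothetical.

```python
import re

# GnnnnVmm: four-digit generation number, two-digit version number.
GDS_LLQ = re.compile(r"G(\d{4})V(\d{2})")


def parse_gds_qualifier(qualifier):
    """Return (generation, version) if the qualifier has the GnnnnVmm form, else None."""
    match = GDS_LLQ.fullmatch(qualifier)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


# Hypothetical data set name; only the low-level qualifier is examined.
low_level = "PROD.PAYROLL.BKUP.G1719V00".split(".")[-1]
print(parse_gds_qualifier(low_level))
```

Checking for this pattern first means the "19" in "G1719V00" is never offered to the date-detection logic.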

That was an easy one to fix but on to the nicer part.

The Good

The data set name had “BKUP” as the last but one qualifier. This to my eyes signifies the data set is a backup for something. So I added a test to detect either “BKUP” or “BACK” in a qualifier and bold it if present.

I’ve made the code general enough so I can add further mnemonics – such as “OFFSITE” or “UNLOAD”. (In fact – since writing this post on the plane to Berlin I added full-qualifier matching for “OUT” and “NEW” as one of our job dossiers had data sets with these qualifiers.)
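
To illustrate the two kinds of matching just described, here is a small Python sketch. It is not the actual code: the function name and the example data set names are invented, and it returns the matching qualifiers rather than bolding them.

```python
# Matched anywhere inside a qualifier.
SUBSTRING_MNEMONICS = ("BKUP", "BACK")
# Matched only when they are the whole qualifier.
FULL_QUALIFIER_MNEMONICS = ("OUT", "NEW")


def significant_qualifiers(dsname):
    """Return the qualifiers of a data set name that contain a known mnemonic."""
    hits = []
    for qualifier in dsname.upper().split("."):
        if qualifier in FULL_QUALIFIER_MNEMONICS or any(
            mnemonic in qualifier for mnemonic in SUBSTRING_MNEMONICS
        ):
            hits.append(qualifier)
    return hits


print(significant_qualifiers("PROD.PAYROLL.BKUP.G1719V00"))
print(significant_qualifiers("APP.OUTPUT.DATA"))
```

Keeping the mnemonics in two tuples makes it cheap to add further ones later, and full-qualifier matching stops "OUT" from firing on a qualifier such as "OUTPUT".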

When I examine the data set names for one step in particular, every data set with “BKUP” as the low-level qualifier has a corresponding data set with an identical name, apart from the “BKUP” low-level qualifier being missing. The “BKUP” data set is an output data set (and I know about it because of its SMF Type 15 record). The matching data set is an input data set (and I know about it because of its SMF Type 14 record).

So I think we know what’s going on here[2]. 🙂

As I concluded in What’s In A Name? – Revisited there’s value in decoding data set names. And the more common gleanings I can do the better. This is just a nice little further step in that direction.

But I remain conscious of the possibility one could go too far, or get it wrong:

  • Not every word in the English language that appears inside a data set name is significant.
  • Not every significant word in a data set name is even English.

But we roll on. And no doubt the next study will bring fresh ideas for code. And that’s just the way I like it.


  1. Isn’t that always the way?  ↩

  2. To spell it out, some sort of backup processing.  ↩

Engineering – Part Two – Non-Integer Weights Are A Thing

(Originally posted 2019-05-26.)

Maybe you’ve never thought much about this but aren’t weights supposed to be integer?

Well, they are at the LPAR level. But what about at the engine level?

Let me take you through a recent customer example. The names are changed but the numbers are real, as the one graphic in this post will show. The customer has a z14 ZR1 with 3 general-purpose processors (GCPs). The weights add up to 1000. Nice and tidy. There are two LPARs:

  • PROD – with 3 logical engines and a weight of 965.
  • TEST with 2 logical engines and a weight of 35.

Both LPARs are in HiperDispatch mode – which means the logical engines are vertically polarised.

To proceed any further we need to work out what a full engine’s worth of weight is: It’s 1000 / 3 = 333.3 recurring. Clearly not an integer. How do you assign vertical weights given that?

Let’s take the easy case first:

TEST has a weight of 35. Much less than one engine’s worth of weight. It has two logical processors so we would expect:

  • A Vertical Medium (VM) with a weight of 35.
  • A Vertical Low (VL) with a weight of 0.

So, in this case, both the engines have integer weights. So far so good.

Now let’s take the case of PROD. Here’s what I expect:

  • Two Vertical Highs (VHs) each with a weight of 333.3 recurring. Total 666.6 recurring.
  • A Vertical Medium (VM) with weight 965 – 666.6 recurring or 298.3 recurring. (It’s the presence of the non-integer VHs that forces the VM to be non-integer.)
  • No Vertical Lows (VLs).

When I say “expect” I really mean “what I’ve come to expect”. And I say that because I’ve seen it in reports produced by my code – and ended up wondering if my code was wrong. With the “Engine-ering” initiative, and in general because of HiperDispatch, it’s become more important to understand what’s going on at the logical engine level.

Non-integer weights began to worry me. So I started to investigate. Here’s the process, in strict step order:

  1. My REXX code correctly queries my database at a summary table level and reports what it sees.
  2. My database code correctly summarises the log level table.
  3. My log level table correctly maps the record.

Let’s take a closer look at the record, which is what I did to establish Point 3.

When I look at individual records at the bits-and-bytes level I generally use RMF’s ERBSCAN and ERBSHOW execs:

  1. If you type ERBSCAN against an SMF data set in ISPF 3.4 you get a list of records, each of which has a record number associated with it. Among other things the ERBSCAN list shows SMFID, timestamp and record type and subtype.
  2. If you type ERBSHOW nnn where nnn is the number of an RMF record you get a formatted hex display of the record.

I emphasise RMF because ERBSHOW does a good job on RMF records, but not so useful a job for most other record types. (SMF 99-14 is one where I’ve seen it do a good job, but I digress.)

Anyway, back to the point. Here’s part of an ERBSHOW for an SMF 70-1 record. It shows five Logical Processor Data Sections – the first 3 for PROD and the last 2 for TEST.

The highlighted field is SMF70POW – the engine’s vertical weight. Here’s the full description of the 4-byte binary field:

Polarisation weight for the logical CPU when HiperDispatch mode is active. See bit 2 of SMF70PFL. Multiplied by a factor of 4096 for more granularity. The value may be the same or different for all shared CPUs of type SMF70CIX. This is an accumulated value. Divide by the number of Diagnose samples (SMF70DSA) to get average weight value for the interval.

So the samples are multiplied by 4096. Now 4096 is 1000 hexadecimal. So an integer would end with three hex zeroes, wouldn’t it? The first three clearly don’t.

But let’s take the simpler – TEST – case first.

  • SMF70DSA is 90 decimal.
  • Section 4 has hex 00C4E000. Dividing by hex 1000 and converting to decimal we get 3150. Divide that by 90 and we get 35. So this is the VM mentioned above.
  • Section 5 has zero so that is a vertical weight of 0. So this is the VL mentioned above.

Now let’s look at PROD.

  • Each of the first two logical engines has SMF70POW of hex 0752FFE2. Clearly dividing by 1000 hex doesn’t yield an integer – so I (and my code) divide by SMF70DSA first. I get hex 0014D555 or decimal 1365333. Divide this by 4096 and I get 333.3 recurring.
  • The third engine has SMF70POW of hex 068E203C. Divide by SMF70DSA and convert to decimal and I get 1221974 decimal. (Already this is less than 1365333.) Divide by 4096 and I get 298.3 recurring.
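
The arithmetic above is easy to reproduce. Here is a minimal Python sketch that takes the SMF70POW hex values quoted in this post and the SMF70DSA sample count of 90; the function name is made up for the illustration.

```python
def average_vertical_weight(smf70pow_hex, smf70dsa):
    """Average vertical weight for the interval.

    smf70pow_hex: the 4-byte SMF70POW field as a hex string (accumulated, scaled by 4096).
    smf70dsa:     the number of Diagnose samples (SMF70DSA).
    """
    accumulated = int(smf70pow_hex, 16)
    return accumulated / smf70dsa / 4096


for name, value in [("PROD VH", "0752FFE2"), ("PROD VM", "068E203C"),
                    ("TEST VM", "00C4E000"), ("TEST VL", "00000000")]:
    print(f"{name}: {average_vertical_weight(value, 90):.1f}")
```

Dividing by the sample count before dividing by 4096, or the other way round, gives the same result in floating point terms; the order only matters if you want to eyeball the intermediate hex values as done above.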

So my code is vindicated. Phew!

My suspicion is that vertical weights are held (not just sampled) multiplied by 4096.

But in any case the message is if the data looks odd then dig into it. In my case I blamed my own tools first but my tools are vindicated. But my expectation was wrong or, more charitably, blurry.

And, the more I think about it, the more the actual engine-level weights make sense. They have to add up to the LPAR weight. And the existence of Vertical Highs forces the above arithmetic on us.

But half the point of this post is to show how I debug numbers (and names) in my reporting that don’t meet my expectation. And ERBSCAN / ERBSHOW is a pair of friends you might like to get to know.

Engineering – Part One – A Happy Medium?

(Originally posted 2019-05-25.)

In Engineering – Part Zero I talked about the presentation that Anna Shugol and I have put together. That post described the general sweep of what we’re doing.

This post, however, is more specific. It’s about Vertical Medium logical processors.

To keep it (relatively) simple I’m describing a single processor pool. For example, the zIIP Pool. Everything here can be generalized, though it’s best to treat each processor pool separately.

Also note I use the term “engine” quite a lot. It’s synonymous with processor.

What Is A Vertical Medium?

Before HiperDispatch an LPAR’s weight was distributed evenly across all its online logical processors. So, for a 2-processor LPAR with weights sufficient for 1.2 processors, each logical processor would have 0.6 engines’ worth of weight.

Now let’s turn to HiperDispatch (which is all there is nowadays)[1].

The concept of A Processor’s Worth Of Weight is an important one, especially when we’re talking about HiperDispatch. Let’s take a simple example:

Suppose a machine has 10 physical processors and the LPARs’ weights add up to 1000[2]. In this case an engine’s worth of weight is 100.

In that scenario, suppose an LPAR has weight 300 and 4 logical processors. Straightforwardly, the logical processors are:

  • 3 logical engines, each with a full engine’s worth of weight. These are called Vertical Highs (VH for short). These use up all the LPAR’s weight.
  • 1 logical engine, with zero weight. This is called a Vertical Low (or VL).

There are a few “corner cases” with Vertical Mediums, but let me give you a simple case. Suppose the LPAR, still with 4 logical processors, has weight 270. Now we get:

  • 2 VH logical engines, each with a full engine’s worth of weight. This leaves 70 to distribute.
  • 1 logical engine, with a weight of 70. This is not a full engine’s weight. So this kind of logical processor is called a Vertical Medium (or VM).
  • 1 VL logical engine, with zero weight.

Note that the VM in this case has 70% of an engine’s worth of weight.
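
The two simple cases above can be reproduced with a short sketch. This is deliberately naive Python: it hands out full engines’ worth of weight as VHs, gives any remainder to a single VM, and leaves the rest as VLs. It ignores the corner cases and is not a description of what PR/SM actually does; the function name and parameters are invented for the illustration.

```python
from fractions import Fraction


def vertical_weights(lpar_weight, total_weight, physical_engines, logical_engines):
    """Return a per-logical-engine list of weights for the simple cases only."""
    lpar_weight = Fraction(lpar_weight)
    engine_weight = Fraction(total_weight, physical_engines)  # one engine's worth
    # Vertical Highs: as many whole engines' worth as the weight (and the
    # number of logical engines) allows.
    n_vh = min(int(lpar_weight // engine_weight), logical_engines)
    weights = [engine_weight] * n_vh
    # Vertical Medium: whatever weight is left over, if there is an engine for it.
    remainder = lpar_weight - n_vh * engine_weight
    if len(weights) < logical_engines and remainder > 0:
        weights.append(remainder)
    # Vertical Lows: any remaining logical engines, with zero weight.
    weights += [Fraction(0)] * (logical_engines - len(weights))
    return weights


print([float(w) for w in vertical_weights(300, 1000, 10, 4)])
print([float(w) for w in vertical_weights(270, 1000, 10, 4)])
```

Using Fraction keeps an engine’s worth of weight exact even when the total weight does not divide evenly by the number of physical engines.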

How Do Vertical Mediums Behave?

There are two parts to HiperDispatch:

  • Vertical CPU Management
  • Dispatcher Affinity

Vertical CPU Management

Let’s take the three types of vertically polarized engines:

  • With a VH the picture is clear: The logical processor is tied to a specific physical processor. It is, in effect, quasi-dedicated. The benefit of this is good cache reuse – as no other logical engine can be dispatched on the physical engine. Conversely, the logical engine won’t move to a different physical engine (leaving its cache entries behind).

  • With a VM there is a fair attempt to dispatch a logical engine consistently on the same physical engine. But it’s less clear cut that this will always succeed than in the VH case. Remember a VM will probably be competing with other LPARs for the physical engine. So it could very well lose cache effectiveness.

  • With a VL, the logical engine could be dispatched anywhere. Here the likelihood of high cache effectiveness is reduced.

The cache effects of the three cases are quite different: It would be reasonable to suppose that a VH would have better cacheing than a VM, which in turn would do better than a VL. I say “reasonable to suppose” as the picture is dynamic and might not always turn out that way.

But you can see that LPAR design – in terms of weights and online processors – is key to cache effectiveness.

We prefer not to run work on VLs – so the notion of parking applies to VLs. This means not directing work to a parked VL. VLs can be parked and unparked to handle varying workload and system conditions.

Dispatcher Affinity

With Dispatcher Affinity, work is dynamically subdivided into queues for affinity nodes. An affinity node comprises a few logical engines of a given type. Occasionally work is rebalanced.

You could, for queuing purposes, view an LPAR as a collection of smaller units – affinity nodes – though it’s not as simple as that. But that could introduce imbalance, a good motivation for the rebalancing of work I just mentioned.

What Dispatcher Affinity means is that work isn’t necessarily spread across all logical processors.

How Do They Really Behave?

With VMs I have three interesting cases, two of which I have data for. They got me thinking.

  • Client A has an LPAR with 4 logical zIIPs. One is a VH, one is a VM with weight equivalent to 95% of an engine, and two are VLs. Here it was notable that there was reluctance to send work to the VLs – as one might expect. The surprise was that the VM was consistently loaded about 50% as much as the VH. For some reason there’s reluctance to send work there as well, but not as bad as to the VLs. The net effect – and why I care – is that the VH was loaded more heavily than we would recommend, because of this skew.
  • Client B has two LPARs on a 3-way GCP-only machine. One has two VHs and one VM with almost a whole engine’s worth of weight. In this case the load was pretty even across the 3 logical engines, according to RMF.
  • Client C – for whom I don’t have data – are concerned because it is inevitable they’ll end up with 1 almost-VH logical engine.

So there’s some variability in behaviour. But that’s consistent with every customer environment being different.

Conclusion – Or Should We Avoid Vertical Mediums?

First, in many cases, there’s an inevitability about VMs, particularly for small LPARs or where there are more LPARs than physical engines. I’ll leave it as an exercise for the reader to figure out why every LPAR has to have at least one VH or VM in every pool in which it participates.

I don’t believe it makes any difference in logical placement terms whether a VM has 60% of an engine’s worth of weight or 95%. But I do think a 60% VM is more likely to lose the physical in favour of another LPAR’s logical engine than a 95% VM.

I do think it’s best to take care with the weights to ensure you don’t just miss a logical engine being a VH.
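To make the weight arithmetic concrete, here is a minimal Python sketch. It assumes the usual share calculation (an LPAR's weight as a fraction of the pool's total weight, multiplied by the number of physical engines in the pool) and then simply splits the result into whole engines and a remainder. The function names and the example figures are hypothetical, and the real PR/SM assignment of VHs, VMs and VLs has more rules than this.

```python
def entitlement_in_engines(lpar_weight, total_weight, physical_engines):
    """Return the LPAR's share of the pool, expressed in engines."""
    # Multiply before dividing so whole-number results stay exact.
    return lpar_weight * physical_engines / total_weight


def whole_and_remainder(entitlement):
    """Split an entitlement into whole engines and the leftover fraction."""
    whole = int(entitlement)
    return whole, entitlement - whole


# Hypothetical pool: 10 physical zIIPs, weights totalling 1000.
for weight in (195, 200):
    share = entitlement_in_engines(weight, 1000, 10)
    whole, rest = whole_and_remainder(share)
    print(f"weight {weight}: {share:.2f} engines = {whole} whole + {rest:.2f}")
```

In this simplified view a weight of 195 leaves the second engine just short of a whole engine's worth, while 200 gets it there. That is the kind of near miss worth checking for.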

This thinking about Vertical Mediums suggests to me it’s useful to measure utilisation at the engine level – to check for skew. After all you wouldn’t want to have Delay For zIIP just because of skew – when the pool isn’t that busy.
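As an illustration of checking for skew, here is a small Python sketch. It assumes you already have a busy percentage per logical engine for an interval. The ratio of busiest engine to pool mean is just one possible measure, not an established metric, and the sample figures are made up to resemble Client A.

```python
def engine_skew(busy_by_engine):
    """Return (mean, busiest, ratio of busiest to mean) for per-engine busy %."""
    values = list(busy_by_engine.values())
    mean = sum(values) / len(values)
    busiest = max(values)
    ratio = busiest / mean if mean else 0.0
    return mean, busiest, ratio


# Made-up figures shaped like Client A: one VH, one VM, two VLs.
sample = {
    "zIIP 0 (VH)": 60.0,
    "zIIP 1 (VM)": 30.0,
    "zIIP 2 (VL)": 5.0,
    "zIIP 3 (VL)": 5.0,
}
mean, busiest, ratio = engine_skew(sample)
print(f"pool mean {mean:.1f}%, busiest engine {busiest:.1f}%, skew ratio {ratio:.2f}")
```

Here the pool looks lightly loaded on average, but the busiest engine is running at well over twice the mean.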

But, of course, LPAR Design is a complex topic. So I would expect to be writing about it some more.


  1. Except under z/VM with HiperDispatch enabled I’m told you would want to turn it off for a z/OS guest. 

  2. Often I see “close but no cigar” weight totals, such as 997 or 1001. I have some sympathy with this as events such as LPAR moves and activations can lead to this. Nonetheless it’s a good idea to have the total be something sensible.

Engineering – Part Zero

(Originally posted 2019-03-22.)

I’m writing this on a plane, heading to Copenhagen. Planes, like weekends, give me time to think. Or something. 🙂

Ardent followers of this blog will probably wonder why there have been few “original content” posts to this blog[1] recently.

Well, I’ve been working on an exciting project with my friend and colleague Anna Shugol. Now is the time to begin to reveal what we’ve been working on. We call this project “Engine-ering”[2].

The idea is simple: There is real merit in examining CPU at the individual processor level, for example the individual zIIP. As one colloquial term for processor is “engine” it’s easy to end up with a title such as “Engine-ering” and the hashtag #EngineeringWorks is way too tempting not to deploy.

The project has three parts:

  • Writing some analysis code.
  • Deploying the code into real customer situations.
  • Writing a presentation.

These three are intertwined, of course. As we go on we will:

  • Write more code.
  • Gain more experience with it in customer situations.
  • Evolve our presentation.

You’d expect nothing less from us.

Traditional CPU Analysis

Traditionally, CPU has been looked at from a number of perspectives:

  • Machine and LPAR – with SMF 70-1.
  • Workload and service class – with SMF 72-3.
  • Address space – with SMF 30-2/3, also 4/5.
  • DB2 transaction – with SMF 101 – and its analogues for other middleware.
  • Coupling Facility – with SMF 74-4.

All of these have tremendous merit – and I’ve worked with them extensively over the years.

z/OS Engine Level

Our idea is that there is merit in diving below the LPAR level, even below the processor pool level. So we would want to, for example, examine the zIIP picture for an LPAR. But we wouldn’t want to just look at it in aggregate. We want to see individual processors. There are at least a couple of reasons:

  • Skew between engines could be important.
  • Behaviours, such as HiperDispatch parking, get thrown into sharp relief.

RMF

RMF (SMF 70-1) reports individual engines at two levels:

  • This z/OS image.
  • All the LPARs on this machine.

The trick is marrying these two perspectives together. Fortunately, a few years ago, I realised I could use the partition number of the reporting system and match it to the partition number of one of the LPARs. That does the trick.
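The matching step can be sketched in a few lines of Python. This assumes the SMF 70-1 data has already been parsed into dictionaries. The field names, LPAR names and partition numbers here are purely illustrative and are not the real record field names.

```python
def find_own_lpar(reporting_partition_number, lpar_sections):
    """Return the LPAR section whose partition number matches the reporting system."""
    for section in lpar_sections:
        if section["partition_number"] == reporting_partition_number:
            return section
    return None


# Hypothetical, already-parsed data; the field names are illustrative only.
lpars = [
    {"name": "PRODA", "partition_number": 1},
    {"name": "PRODB", "partition_number": 2},
    {"name": "TEST1", "partition_number": 5},
]
own = find_own_lpar(2, lpars)
print(own["name"] if own else "no match")
```

Once the reporting system's own LPAR is identified, the z/OS view of an engine and the PR/SM view of the same engine can be reported side by side.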

In the past week I wrote some code to pump out engine level statistics for the reporting LPAR:

  • Vertical weights
  • Engine-level CPU utilization
  • Parked (or unparked) time

The first two are from the PR/SM view. The third is from the z/OS view. Which makes sense.

In any case I have some pretty graphs. And I got to swear at Excel a lot.[3]

SMF 113 Hardware Counters

This one is more Anna’s province than mine. But, processing SMF 113-1 records at the individual engine level, we now can see individual engine behaviours in the following areas:

  • We can see instructions executed, cycles used to execute them, and hence compute Cycles Per Instruction (CPI).

    At the individual engine level there is some very interesting structure, especially between Vertical Low processors (with zero vertical weight) and Vertical Highs (VHs) and Mediums (VMs).

    Actually there is a lot of difference sometimes between individual VH and VM engines.

  • We can see the impact of Level 1 Cache misses – in terms of penalty cycles per instruction – for Data Cache and Instruction Cache individually. This begins to explain the CPI behaviours we see.

    Pro Tip: Understanding the cache hierarchy in a processor really helps, and it’s different from generation to generation.
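The CPI calculation itself is simple division, as this Python sketch shows. It assumes the cycle and instruction counts for each engine have already been extracted from the SMF 113 data. The counter values below are made up, and the dictionary layout is hypothetical.

```python
def cycles_per_instruction(cycles, instructions):
    """CPI for one engine over one interval; None if no instructions ran."""
    if instructions == 0:
        return None
    return cycles / instructions


# Made-up counter values for three engines of differing polarity.
engines = {
    "VH": {"cycles": 9_000_000, "instructions": 3_000_000},
    "VM": {"cycles": 8_000_000, "instructions": 2_000_000},
    "VL": {"cycles": 0, "instructions": 0},
}
for name, counters in engines.items():
    cpi = cycles_per_instruction(counters["cycles"], counters["instructions"])
    print(name, "idle" if cpi is None else f"CPI {cpi:.2f}")
```

Guarding against a zero instruction count matters at the engine level, as a parked engine might execute nothing in an interval.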

Those of you who know SMF 113 know there are many more counters. We intend to extend our code to look at those soon.

SMF 99-12 And -14

Another area we intend to extend our code to analyse is SMF 99 subtypes 12 and 14. This data will tell us how logical engines relate to physical engines, right down to which drawer they’re in, which cluster (or node for z13), even which chip. All of this can help with understanding the “why” of what SMF 113 is telling us.

Coupling Facility

You can play a similar RMF-level game for coupling facilities. Normally, you wouldn’t expect much skew between CF engines. But in Getting Nosy With Coupling Facility Engines I showed this wasn’t always the case.

I would say that, while the “don’t run your coupling facility CPU more than 50% busy” rule is sensible you might want to adjust it for any skew your coupling facilities are exhibiting.
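One possible way to adjust for skew, sketched below in Python, is to apply the 50% figure to each engine rather than only to the average. That is my assumption about how you might do it, not an established rule, and the engine names and busy figures are made up.

```python
def cf_engines_over_threshold(busy_by_engine, threshold=50.0):
    """Return the average busy % and the engines busier than the threshold."""
    average = sum(busy_by_engine.values()) / len(busy_by_engine)
    over = [name for name, busy in busy_by_engine.items() if busy > threshold]
    return average, over


# Made-up figures: the average looks fine but one engine does not.
sample = {"CP 0": 62.0, "CP 1": 38.0, "CP 2": 35.0}
average, over = cf_engines_over_threshold(sample)
print(f"average {average:.1f}% busy; engines over 50%: {over}")
```

In this example the coupling facility passes the rule on average while one engine is well past it.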

Outro

We presented this material the other day to the zCMPA working group of GSE UK. This was to a small number of sophisticated customers, most of whom I’ve known for many years. It’s become a bit of a tradition to present an “alpha” version of the presentation.[4]

This post roughly follows the structure of the presentation. In this presentation we have some very pretty graphs. 🙂

Anna coined the term “research project”. I like it a lot.[5] In any case, the code is a permanent part of our kitbag. If you send me data, expect me to ask for this new stuff and to use it in conversations with you. I think you’ll enjoy it.

We think the presentation went very well, with some nice discussion from the participants. Partly because of that, but not really, we intend to keep capturing hills with the code, gaining experience with customers, and evolving the presentation. Every so often I’ll highlight bits of it here. Stay tuned!


  1. I don’t count podcast show notes as “original content”, by the way. But rather a personal note on each episode. 

  2. You wouldn’t believe what various forms of autocorrect do to the string “Engine-ering”. Three examples are “Engine-Ewing” and, hilariously, “Engine-erring” and “Engine-earring”. 🙂 

  3. The only way, in my experience, not to swear at Excel a lot is to automate the things you find fiddly about it. I’ve done some of that, too. 

  4. Last year Anna and I presented an alpha version of “Two LPARs Good, Four LPARs Better?” to the same group. It was much better to actually have her in the room with us this time. 🙂

  5. Much better than my “you’re all being experimented on”. 🙂 

Mainframe Performance Topics Podcast Episode 23 "The Preview That We Do"

(Originally posted 2019-02-27.)

This episode is hot on the heels of the previous one.

Marna set us the ambitious goal of getting it out on the day of the Preview Announcement of z/OS 2.4 – February 26th. And we succeeded. Phew!

I’m really excited about the Docker / Container Extensions (zCX) line item and I’m sure we’ll return to it – both as Mainframe and as Performance topics. Obviously, this being a Preview, that will have to wait a while.

So, I finally caved and mixed this Mono. I had no idea how I was going to do that. I hope y’all think it turned out OK.

I’m aiming to return to regular blogging soon. Right now there are things that I want to talk about but now is not quite the right time.

In the meantime, I hope you enjoy the show, and here are the notes.

Episode 23 “The Preview That We Do”

Here are the show notes for Episode 23 “The Preview That We Do”. The show is called this because we talk about the newly previewed z/OS release, V2.4, in the Mainframe section. This is our 24th episode too! How convenient! This episode is somewhat shorter than others because we wanted to slot it in for a particular date (the z/OS V2.4 Preview date) and we’d just done Episode 22.

Mainframe: z/OS V2.4 Preview

  1. “z/OS Container Extensions” aka zCX

    • Intended to enable users to deploy and execute Linux on IBM Z Docker containers on z/OS. Not just run but also to enable application developers to develop and package popular open source containers.
    • It is clear that Docker is becoming prevalent with users. Now, z/OS could leverage industry standard skills, quickly on z/OS.
    • One could pull IBM Z Linux containers from Dockerhub. Latest count was 1724 in 14 categories.
    • Martin is interested in the instrumentation, and in the SMF records. Configuration we’ll cover in a future podcast.
    • The planned prereqs for zCX are: z14 GA2 or higher, and will require a HW feature code
    • zCX is planned to be zIIP eligible.
  2. z/OS Upgrade Workflow, no book

    • ”Upgrade” is the new term instead of ”Migration”
    • No z/OS Migration book, use the workflow instead. That requires you to become familiar with z/OSMF and workflows in particular.
    • Not everybody is familiar with z/OSMF, so we’ll export a workflow file and put it on Knowledge Center so you can view, search, print. However, the Workflow should give you a better experience.
  3. More in Pervasive Encryption

    • Additional z/OS data set types: PDSE and JES2 encryption of JES-managed data sets on SPOOL.
    • Without application changes, of course, and simplifies the task of compliance
  4. zfs enhancements

    • Better app availability

      • Allows app running in a sysplex and sharing rw mounted file system to no longer be affected by an unplanned outage.
      • Should no longer see an I/O error in this situation, which might have caused an application restart.
      • New mount option, and can be specified individually or globally, and changed dynamically. New option will be ignored if specified and in a single system environment.
    • BPXWMIGF

      • Facility BPXWMIGF enhancements planned to migrate data from one zfs to another zfs, without an unmount.
      • Previously, facility was only for hfs to zfs.
      • New function helps with moving from one volume to another volume.
  5. MCS logon passphrases

    • Through the security policy profile specification, provides more consistent, secure system environment to meet security requirements.

Biggest question one may have: what level of HW will z/OS V2.4 IPL on? z/OS V2.4 will run on zEC12/BC12 and higher.

Performance: Coupling Facility Structure Duplexing

  • Two types of CF structure duplexing:

    1. User-Managed: Only DB2 Group Buffer Pools (GBP)
    2. System-Managed: e.g. DB2 IRLM LOCK1 Structure
    • The structure types for system-managed duplexing are all types: list, list serialized, lock, and cache.
    • User-Managed obviously only Cache.
    • Some structures are not duplexed, e.g. XCF.
  • Structure performance matters

    • User-Managed not an issue.
    • System-Managed matters.
  • Asynchronous CF Structure Duplexing Announced October 2016

    • Just for lock structures, specifically DB2 IRLM LOCK1. This changes the rules, and requires co-operation from e.g. DB2.
    • Functional dependencies:

      • z13™, IBM® z13s with CFLEVEL 21 with service level 02.16 or later
      • z/OS® V2.2 with PTFs for APARs OA47796 (XES) and OA49148 (RMF)
      • DB2® V12 with PTFs for APAR PI66689
      • IRLM 2.3 with PTFs for APAR PI68378
    • Important considerations as to whether Async CF Duplexing is good all the time:

      • People make architectural decisions and this should not be a leap in the dark.
      • Ideally should be established with a little testing, with testing as close to production behaviors as possible.
      • Generally it’s good for you.
    • Configuration: Format couple data set, put into service, and then REALLOCATE. Again speaks to planning and testing.

  • The main event for this item is SMF.

    • SMF 74-4 Coupling Facility Activity data, primarily interested in structure-level, especially for structure duplexing of any kind. Though CF to CF pathing information also available.
    • Information at the structure-level

      • Size and space utilization, request rate and performance for both copies in the duplexing case, and bit settings for Primary and Secondary.
      • Still use old method of comparing traffic: Rates and Sync vs. Async. It doesn’t much matter for System-Managed.
    • New Async CF Duplexing instrumentation

      • APAR OA49148
      • Asynchronous CF Duplexing Summary section. Martin has a prototype in REXX to format it that gives timings of components. It is not the same as “effective request time”. Nor are raw signal service times.
      • “Effective request time” relates to effect on application, in the SMF 101 DB2 Accounting Trace.
      • Gives sequence numbers which are important for synchronization. If the sequence numbers are too far apart might indicate a problem.
  • Early days of Async CF Duplexing despite having been announced in 2016. Martin has been using a customer’s test data, and would like to build experience. Only a portion of this new SMF 74-4 data is surfaced in RMF Postprocessor reports.

  • z/OSMF Sysplex Management can help visualize and control the Sysplex resources. This function to help with control is in PI99307: SYSPLEX MANAGEMENT APPLICATION ENHANCEMENTS TO MODIFY SYSPLEX RESOURCES.

Topics: Smart home thermostats

  • Marna just installed two Nest thermostats, one in each zone (of a three-zone house). Is sharing data with Nest, and presumably whoever owns Nest currently (Google).
  • Marna’s house is oil heating, and AC with electricity. She installed them because of her electric company incentives.
  • The electric company can control the thermostat in the summer (air-conditioning) a certain number of days, for one hour, by up to 5°F. Since it is winter, she hasn’t seen this happen yet of course.
  • Instrumentation benefit is having an app in which she can look at what is happening at home, when away, and control it too.
  • There are excellent graphs on what has been used (hours of heating, cooling) in the app.
  • Also, there is geofencing via your phone, where the thermostat knows you are at home (or coming home) and can set the temperature to what is desired. Marna has that location setting turned on for two phones. Nest actually has been learning the habits of what she likes for temperature and can predict what to set.
  • Marna’s electricity usage hasn’t been shown to be reduced yet, but then again, it is not yet summer.
  • The app also compares her usages to the neighbors (whoever they might be). House size and people at home affect usage, so it’s unclear how that plays into these usage reports.

    • It is fun to gamify with neighbors!
  • Martin doesn’t have a smart home thermostat, but does have a remote oil tank sensor to determine how much oil is left. This sensor feeds back into a device in the house, and connects to an app on his phone.

    • It costs 5 GBP a month, but he is unsure yet if it is worth it.

Places we expect to be speaking at

  • Marna will be at SHARE Phoenix March 11-15
  • Martin will be March 12 GSE UK zCMPA Working Group – in London, with a new alpha presentation!

Contacting Us

You can reach Marna on Twitter as mwalle and by email.

You can reach Martin on Twitter as martinpacker and by email.

Or you can leave a comment below. So it goes…

Mainframe Performance Topics Podcast Episode 22 "Great App-spectations"

(Originally posted 2019-02-20.)

We just posted Episode 22 of Mainframe, Performance, Topics.

It features two of the longest topics we’ve ever recorded – and we think both of these topics warrant the time. If you want to fit this into your daily commute feel free to drive round the block a few times. 🙂

The Topics topic was particularly interesting in its gestation: It started out as a narrowish blog post of mine and then expanded into something much more general. While the iOS-specific bits might not be of especial interest to some of you, two things:

  • If you’ve not looked critically at how your apps behave and what they are capable of you might find it enlightening.

  • The z/OS-specific part of it should be of interest to all mainframers.

And we aimed that topic at both users and developers.

I would also highlight our new “Ask MPT” spot. We really do encourage questions. To quote David Brin “certainty is the lot of those who do not ask questions”. He makes certainty sound bad, doesn’t he?

And we had fun making this episode, so I guess we’ll do a few more… 🙂

Episode 22 “Great App-spectations”

Here are the show notes for Episode 22 “Great App-spectations”. The show is called this because we talk about app expectations in our Topics topic.

We’ll use British spellings on these show notes, to be an equal opportunity documentation provider.

Where We’ve Been Lately

Marna has been to Istanbul, Turkey for the Tech U, February 6-8, 2019. Martin was nearly nowhere.

What’s New

  • z/OS V2.3 Enhancements RFA.
    • z/OSMF Workflow is enhanced with the PTF for APAR PH03053 to support the array type of variable, which could contain a set of values. Good things will come from this.

“Ask MPT” New spot!

  • Every podcast seems to have one: So we’ve decided to do a “Things people asked us” spot. Please submit questions!
    • Q: How can you tell who used Dynamic Linklist (with LNKAUTH=LNKLST) to implicitly APF authorise a data set?
    • A: LNKAUTH is specified in IEASYSxx (or on sysparm), or left to default. So when changing the linklist (and using this setting), you can see how APF authorisation is changing.
      • Looking up SMF records, we see that SMF type 90 subtype 29 is for a Linklist change (SET). Note also subtype 31 for LPA (SET or CSVDYLPA) and subtype 37 for APF (SET or CSVAPF).
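If you were filtering SMF 90 records for these events, the subtypes mentioned in the answer could be held in a small lookup table. This Python sketch covers only the three subtypes named above. The function and table names are hypothetical, and reading the records themselves is outside its scope.

```python
# Subtype meanings as listed in the answer above.
SMF90_SUBTYPES = {
    29: "Linklist change (SET)",
    31: "LPA change (SET or CSVDYLPA)",
    37: "APF change (SET or CSVAPF)",
}


def describe_smf90(subtype):
    """Return a description for the SMF 90 subtypes mentioned here."""
    return SMF90_SUBTYPES.get(subtype, "not covered in these notes")


for subtype in (29, 31, 37, 1):
    print(subtype, describe_smf90(subtype))
```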

Mainframe: PI99365 Two enhancements in z/OSMF Operator Consoles

  1. Support for “sticking” WTOR and held messages on the top of the console area
  2. Visible EMCS console name
  • View WTOR and HOLD messages in a separate window

    • Tiny icon of a little display monitor next to the “bars” of messages, in the upper left to toggle this. Now there are two icons there. So it’s a separately scrollable area within the console messages area with the most important stuff
    • Can delete a HOLD message manually from that window
      • To manually delete a message in this section, just click on the message and it gets put into a box with an “X” next to it. Just click on the X.
      • Also, z/OSMF automatically cleans up the messages. Real time messages are stored in z/OSMF (both on UI side and back end), and when messages exceed 10,000, then the oldest 5,000 are cleaned up.
    • On a busy system, this window is a little small and it’s sometimes hard to navigate. Removing messages helps with the clutter.
      • Hint: minimize the “bars” so you can see more in the WTOR and HOLD message window.
    • This line item is about making important console messages more recognisable
  • Visible console name part

    • Really handy places: on the tab for the console, and on Overview
    • Nicely helps with debug to see if your Operparm was set up correctly for the EMCS you are using

    • Overall: These two function areas help you manage your z/OSMF operator consoles better.
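The automatic clean-up described above (the oldest 5,000 messages removed once the count exceeds 10,000) can be illustrated with a short Python sketch. This only models the behaviour as stated in these notes. It is not how z/OSMF implements it, and the names are hypothetical.

```python
MAX_MESSAGES = 10_000   # threshold quoted in the notes
CLEANUP_COUNT = 5_000   # how many of the oldest messages are removed


def add_message(messages, text):
    """Append a message; trim the oldest ones once the threshold is exceeded."""
    messages.append(text)
    if len(messages) > MAX_MESSAGES:
        del messages[:CLEANUP_COUNT]


# Simulate 10,001 arrivals: the 10,001st triggers a clean-up.
store = []
for n in range(10_001):
    add_message(store, f"message {n}")
print(len(store), store[0])
```

The practical point is that on a busy system older messages will disappear from the window without anyone deleting them.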

Performance: Paging Subsystem Design in an age of Virtual Flash

  • Question from customer about need for paging space if Flash is installed, which was answered in Martin’s blog post, but there is more thinking about this.

  • Look at the paging subsystem design in the round, with two flavours of Flash:

    1. Flash Express (in zEC12, z13) which is PCI-E cards
    2. Virtual Flash Memory (z14) carved from memory
      • LPAR memory, but not from that which a user defines for that LPAR
  • Design standpoint ideally as if no Flash

    • Think about the economics vs risk of losing Flash. The reality is loss of Flash might cause ABENDs that matter. Damage assessment is worth thinking through.
    • Flash is great – in the z/OS context – for handling dump capture, and spikes in memory demand in general.
  • Paging subsystem design: Two main considerations:

    1. Space: Ideally contain everything, particularly for dumping important address spaces
    2. Performance
  • Come together in “30% Contiguous Slot Allocation Algorithm breakdown” rule of thumb

    • Place local page data sets on separate volumes, even though virtualised.
    • Fast disk, ideally SSD (Flash)
    • 30% is not a hard and fast number, but we do see deterioration around the 30% mark.
  • Instrumentation

  • Wrap up: Paging subsystem design still worthy of care, and establish whether risk of Flash or Virtual Flash warrants conservative configuration of paging subsystem.
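As a sketch of how the 30% rule of thumb might be checked, the Python below flags local page data sets whose slot utilisation is above a limit. It assumes you already have slots used and slots available per data set. The data set names and numbers are made up, and 30% is a default to tune rather than a hard limit.

```python
def datasets_over_rule_of_thumb(slots_used, slots_total, limit_pct=30.0):
    """Return (name, utilisation %) for page data sets above limit_pct."""
    flagged = []
    for name, used in slots_used.items():
        pct = 100.0 * used / slots_total[name]
        if pct > limit_pct:
            flagged.append((name, round(pct, 1)))
    return flagged


# Made-up local page data sets and slot counts.
used = {"PAGE.LOCAL1": 200_000, "PAGE.LOCAL2": 450_000}
total = {"PAGE.LOCAL1": 1_000_000, "PAGE.LOCAL2": 1_000_000}
print(datasets_over_rule_of_thumb(used, total))
```

Running the check against peak intervals, and not just averages, is closer to the spirit of designing as if Flash weren't there.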

Topics: Anatomy Of A Great App

  • “App” here means “third party software” but we’ll say app for short, because of the title of the episode. We are talking to app developers here.

  • iOS perspective:

    • Highly biased on expectations in iOS, as Martin is a power user.

      • Automation is important
      • Fitting into Apple ecosystem –
      • Good quality apps – and is willing and able to pay for them.
    • Good

      • iCloud syncing – so data can be shared between devices.
      • URL support that is deep enough to reach specific bits of the application – so sophisticated automation can be built.
      • iPad Split Screen / Slideover support – to make it pleasant to use alongside other apps.
      • Siri Shortcuts support that is meaningful – again for automation, but also for voice control.
    • Better

      • Files access, for getting to app’s data from multiple apps.
      • Dropbox access – which speaks for itself.
      • x-callback-url support – for calls from one app to another. (Really sophisticated automation has been built this way.)
      • Programmatic automation support – whether JavaScript or Python.
      • Well-chosen Siri Shortcuts support – as opposed to basic.
      • Cross-platform syncing, for start on an iPhone and finish on a Mac.
    • Best

      • Box access – less prevalent a need than Dropbox.
      • TextExpander support – which can save a lot of typing and ensure consistency.
      • Workflow constructors via e.g. Drag and Drop
  • Android perspective

    • Marna is a low end user, and provides a different view. Doesn’t buy many apps, and doesn’t mind ads.
    • Bad defaults are user hostile. Example is “geography default” is not where I am right now.
    • Good provenance. Play Store is it.
    • Signed apps that must pass multiple security app scanning.
    • Sensible connections for cloud services (Google cloud and Google calendar)
  • z/OS perspective

    • SMP/E installable might have been in the past, but z/OSMF-installable is the future standard.
    • Uses documented interfaces.
    • Instrumentation. Has appropriate SMF records.
    • Security considerations. Critical.
    • Sysplex enabled, when appropriate.
  • Common stuff

    • “Day One” support for hardware and software. Be hooked into your foundation.

      • For z/OS TDMs / IBM Partnerworld
      • For iOS WWDC
    • Good support

      • Bug reporting being fit for purpose
        • On iOS you have to have a Developer account, and it’s hard to get one.
        • On z/OS we take bug reports from licensed customers
    • Decent documentation and samples.

    • Responsive developer social media presence at the usual sites. Social testing of apps is considerate.

    • Automatable

    • Not a resource pig, with not umpteen copies of frameworks. Martin and Marna’s Facebook app was rather large, but maybe that is ok if that is a critical app. z/OS has had a problem with proliferation of WebSphere Liberty profiles.

  • Conclusion: Think about more than just what your app is supposed to do. Nobody wants software whose function they like but they hate using. It is way too easy to uninstall an app (or have hundreds of them and not use them). Keep to the “Principle of least astonishment”.

Customer requirements

  • RFE 111923 Uncommitted Candidate

    • Currently Workflow default job card and the REST Jobs API requests can only be changed by individual users. We need the ability to change the current default jobcard for all users. For example, currently the default MSGCLASS is 0 we would like to change the default to X. Other installations might need to include accounting information.
    • Expecting individual userids to set their own installation default makes no sense, they should only have to change it if they want something different from the “normal” jobcard.

    • Seems reasonable, and might be an indication of the maturing of Workflows when you see requirements like these.

Places we expect to be speaking at

  • Marna will be at SHARE Phoenix March 11-15
  • Martin will be March 12 GSE UK zCMPA Working Group – in London, with a new alpha presentation!

On the blog

Final signoff to a legend, John Dayka

  • We have lost a dear friend, a trusted colleague, an incredible mind and an inspiring leader. John’s innovative contributions to Security over his prestigious career at IBM, his kindness to all peers, as well as his calm, level headed approach to all challenges will forever cement his legacy.

Contacting Us

You can reach Marna on Twitter as mwalle and by email.

You can reach Martin on Twitter as martinpacker and by email.

Or you can leave a comment below. So it goes…

Return Of Paging Subsystem Design

(Originally posted 2019-01-14.)

In Paging Subsystem Design, in 2005, I opened with the words

“Periodically on IBM-MAIN I’m caused to revisit something. With the advent of z990/z890 and “supposedly abundant” 🙂 real memory it seems to be time to revisit paging subsystem design.”

Note the smiley is in the original. Seems funnier now.

The z990/z890 era seems forever ago, now. And some things have changed. So an excuse to write about paging subsystem design would be welcome.

Well, I got asked a question about it by a customer. The more I thought about it the less sense it made just to reply privately. Instead, here’s a follow-on blog post.

So let’s assume z13/z14[1] era, there being slight differences between the two.

With z13, it is possible to acquire IBM Flash Express cards (actually available with zEC12 as well). You can page to Flash, and it’s highly performant.

With z14, there is Virtual Flash Memory (VFM), which is a use of real memory outside of an LPAR’s normal addressing range.

(The analogy with Expanded Storage is quite good. Indeed later implementations of Expanded Storage were in real memory. But that’s not something I want to dwell on here.)

In the rest of this, I’m going to use the term “Flash” to cover both the z13 and z14 implementations.

The Operative Question

If I have Flash, do I need as much of a paging subsystem as if I didn’t? That is the question. (What isn’t the question is whether I could have less real storage if I have Flash to page into.)

Of course “as much of a paging subsystem” refers to both space and I/O bandwidth considerations.

Assuming sufficient Flash, this becomes an availability question. I make this assumption because if it’s true we’re not going to see significant paging to the page data sets on disk. It’s actually a pretty fair assumption, for all but the very biggest workloads. For example, you can have up to 6 Terabytes of VFM (in 1.5TB increments).

But suppose something happened to Flash. You would, ideally, want the paging subsystem to handle the load. Whether the LPARs using it survived is another matter, of course; for instance, would major address spaces ABEND?

So this is a matter of attitude to risk. If you think the event is ludicrously unlikely, or that the event that caused the loss of Flash would also bring down the LPARs using it, maybe you don’t need much paging disk capability to back it up.

If, however, you absolutely must have a backup in the case of a Flash failure you would ideally configure the paging subsystem as if Flash weren’t there. Which would mean following the practices alluded to in Paging Subsystem Design.

But that’s the ideal. In practice, people might well configure somewhere between the two extremes. Now to talk to the customer about where they are on the spectrum. And to see whether I missed the point or not. 🙂


  1. I’m using “z13” and “z14” to denote generations. Of course, z13s and z14 ZR1 behave the same way as their larger cousins.