z17 Sustainability Metrics – Part 0

I call this “Part 0” because I haven’t yet seen any data. However I think it useful to start here, rather than waiting for customer data to appear – for two reasons:

  1. Sharing what data is available is useful.
  2. Showing how “folklore” is built is instructive and useful.

As you might expect, I’ve been down this way many times before.

What Sustainability Metrics Are

With z17 some new instrumentation became available. It appears in SMF 70 Subtype 1 records – in the CPU Control Section. Here are the fields:

| Offset (dec/hex) | Name | Len | Format | Description |
|---|---|---|---|---|
| 432 (1B0) | SMF70_CPUPower | 8 | binary | Accumulated microwatt readings taken for all CPUs of the LPAR during the interval. Divide by SMF70_PowerReadCount to retrieve the average power measurement for the interval. |
| 440 (1B8) | SMF70_StoragePower | 8 | binary | Accumulated microwatt readings taken for storage of the LPAR during the interval. Divide by SMF70_PowerReadCount to retrieve the average power measurement for the interval. |
| 448 (1C0) | SMF70_IOPower | 8 | binary | Accumulated microwatt readings for I/O of the LPAR during the interval. Divide by SMF70_PowerReadCount to retrieve the average power measurement for the interval. |
| 456 (1C8) | SMF70_CPCTotalPower | 8 | binary | Accumulated microwatt readings for all electrical and mechanical components in the CPC. Divide by SMF70_PowerReadCount to retrieve the average power measurement for the interval. |
| 464 (1D0) | SMF70_CPCUnassResPower | 8 | binary | Accumulated microwatt readings for all types of resources in the standby or reserved state. Divide by SMF70_PowerReadCount to retrieve the average power measurement for the interval. |
| 472 (1D8) | SMF70_CPCInfraPower | 8 | binary | Accumulated microwatt readings for all subsystems in the CPC which do not provide CPU, storage, or I/O resources to logical partitions. These include service elements, cooling systems, power distribution, and network switches, among others. Divide by SMF70_PowerReadCount to retrieve the average power measurement for the interval. |
| 480 (1E0) | SMF70_PowerReadCount | 2 | binary | Number of power readings for the LPAR during the interval. |
| 482 (1E2) | (reserved) | 6 | | Reserved. |
| 488 (1E8) | SMF70_PowerPartitionName | 8 | EBCDIC | The name of the LPAR to which the LPAR-specific power fields apply. |
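Most of these fields follow the same pattern: an accumulator of microwatt readings, divided by SMF70_PowerReadCount to get the interval average. A minimal sketch of that calculation – the numbers in the example are invented:

```python
def average_watts(accumulated_microwatts: int, read_count: int) -> float:
    """Average power over the RMF interval, in watts.

    The SMF 70-1 power fields (e.g. SMF70_CPUPower) accumulate
    microwatt readings; SMF70_PowerReadCount is how many readings
    were taken during the interval.
    """
    if read_count == 0:
        raise ValueError("SMF70_PowerReadCount is zero - no readings this interval")
    return accumulated_microwatts / read_count / 1_000_000  # microwatts -> watts

# Invented numbers: 180 readings accumulating 2.7e12 microwatts
print(average_watts(2_700_000_000_000, 180))  # 15000.0 (watts)
```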

I won’t go through them all; rather, I’ll synopsise them:

  • There are fields for this LPAR (the one cutting the 70-1 record). There are also fields for the whole machine.
  • There are fields for CPU, memory, and other things.

I hope this synopsis is easier to consume than the table.

Some Thoughts

The term “Sustainability Metrics” is not mine; it’s part of the whole Sustainability effort for z17, which is genuinely a big leap forward. In reality the metrics are about power consumption.

In the marketing material it is suggested you can use this support to break power consumption down to the workload element level. You might do this for purposes such as billing, or to support tuning efforts. There are no power metrics in SMF 72-3, 30, 101, 110, etc. The name of the game is prorating – ascribing power consumption in proportion to usage.

Prorating

Prorating for CPU is relatively straightforward: all those records have CPU numbers. There is, of course, the question of capture ratio. This applies from 70-1 to 72-3 and on down.
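As a sketch of what CPU prorating might look like – with invented workload names and numbers, and a deliberately simple treatment of capture ratio:

```python
def prorate_power(lpar_cpu_watts: float,
                  workload_cpu_secs: dict[str, float],
                  lpar_cpu_secs: float) -> dict[str, float]:
    """Ascribe the LPAR's average CPU power to workloads in proportion
    to their captured CPU time.  Whatever the capture ratio leaves
    unaccounted for shows up as 'uncaptured'."""
    shares = {name: lpar_cpu_watts * secs / lpar_cpu_secs
              for name, secs in workload_cpu_secs.items()}
    captured = sum(workload_cpu_secs.values())
    shares["uncaptured"] = lpar_cpu_watts * (lpar_cpu_secs - captured) / lpar_cpu_secs
    return shares

# Invented numbers: 900 W of CPU power, 3600 CPU-seconds at the LPAR level,
# of which 3240 are captured at the workload level (a 90% capture ratio).
print(prorate_power(900.0, {"CICS": 1800.0, "BATCH": 1440.0}, 3600.0))
```

The same proportional idea would apply per metric; only the denominator changes.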

Memory is more tricky:

  • Going from SMF 71 down to 72-3 mostly works, though swappable workloads tend to be under-represented in 72-3. This mainly relates to batch, but also TSO. Also, which workloads should things like CSA be ascribed to?
  • SMF 30 real memory numbers are generally unreliable.
  • Often the main user of memory is Db2. How do you apportion this memory usage?

The above is not new. The answer is not to be too pristine about it. However, billing and sustainability are areas where the pressure to be accurate – or at least fair – is keenly felt.

A Complete Picture?

As I mentioned just now, the data is for this LPAR and for the whole machine. There is no mention of other LPARs. This is clear from the fields being in the CPU Control Section, rather than in the Logical Partition Data Section. This choice relates to the source of the numbers: they come from a Diagnose instruction that reports to the LPAR in this way.

So how do you proceed in getting an LPAR-level view?

  • For z/OS LPARs collect SMF 70-1 from all of them, even the small ones.
  • Subtract the total of these from the machine-level number – for each metric.
  • What’s left has to be viewed as one item: Other LPARs.
  • Some other operating systems, notably z/VM, provide their own metrics. Some, notably Coupling Facility, don’t.
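The subtraction step might look like this – the LPAR names and wattages are, of course, invented:

```python
def machine_power_breakdown(cpc_total_watts: float,
                            zos_lpar_watts: dict[str, float]) -> dict[str, float]:
    """Machine-level view from SMF 70-1: the z/OS LPARs we have data
    for, plus a single remainder bucket for everything else - other
    LPARs and power that can't be attributed to any LPAR."""
    breakdown = dict(zos_lpar_watts)
    breakdown["Other LPARs / unattributed"] = (
        cpc_total_watts - sum(zos_lpar_watts.values()))
    return breakdown

# Invented: a 12 kW machine with three z/OS LPARs we collect 70-1 from
print(machine_power_breakdown(12000.0,
                              {"PRDA": 5000.0, "PRDB": 3500.0, "TEST": 500.0}))
```

In practice you’d do this per metric (CPU, storage, I/O) and per interval.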

Generally I expect that to get you a good breakdown for z/OS, probably an aggregation for the rest, and on top power that can’t be attributed to any LPAR. That should make for a nice pie chart, or perhaps a stacked bar graph.

The Big Unknown Is How It Behaves

I wrote “a nice pie chart or perhaps stacked bar graph”.

This is real, live instrumentation, rather than static estimates. One clue is the inclusion of a sample count field. (The descriptions of many of the other fields suggest dividing by it.)

As such I expect power consumption to be shown to vary with time, and not just on configuration changes. I would hazard that variation will be greater for CPU than for eg memory, but I could be wrong. Hence my hoping for meaningful stacked bar graphs. And a summary for a shift could well be done as a pie chart. I will have to experiment with those – first in a spreadsheet and then in our production code.

Conclusion

It’s very nice to have time-stamped power consumption metrics. But what do I know? I haven’t seen data yet. When I do I’ll be sure to share the continuation of this journey. In the meantime, what I’ve written above covers the things that are already obvious to me.

This is a classic example of “we never had this data before”. I would expect it to be carried through to future machines. If so we can all see the difference when you upgrade beyond z17. I can see that being fun.

And this data is a salutary reminder of the importance of collecting RMF SMF data from all activated z/OS LPARs, especially 70-1.

Making Of

I’m writing this on a flight to Istanbul to see a customer. Nothing remarkable in that.

What is new is the return to using an iPad Mini. Long ago I had the first one released. It’s a nice size for a tray table in Economy. I got a new one, along with a keyboard case.

What’s new about this one is that it supports a Pencil Pro. The keyboard case has a nice recess to keep the pencil safe. (Normally I would just stick the pencil on top of the iPad Mini – but this is better for travel.)

I tried writing with the pencil but:

  • My handwriting isn’t very tidy.
  • Turbulence makes that worse. (I have direct experience of this.)
  • Lots of the terms I’ve used are technical, such as “LPAR” and “SMF”.

So it’s been a mixture of typing and writing with the pencil.

So it’s been a challenge – for the kit and for me. The palliatives are twofold:

  • Me to write more tidily, probably more slowly. Good luck with that one. 😀
  • Me to write some automation to fix some of the glitches. That will be fun as I’m writing using Drafts which has great JavaScript-based automation capabilities.

Mainframe Performance Topics Podcast Episode 36 “Telum Like It Is”

We were especially keen to get a podcast episode out on the day z17 was announced, having both worked on the launch project in various capacities. So, for once there was a deadline – which we actually made.

It was a pleasure to record this, with almost every piece of it related to z17.

So here are the long show notes.

Episode 36 “Telum Like It Is”

Where we’ve been

  • Marna has been to SHARE in Washington, DC, February 24-27, and in March 24-28 to a select customer council in Germany to talk about future designs.
  • Martin has been to Berlin (former East Berlin) for new hardware education.

Manual Of The Moment

  • Actually, it’s a section of a Manual of the Moment: Software requirements for running z/OS 3.1.
  • This is because this manual was refreshed when we moved our Java dependency on z/OS 3.1 from Semeru 17 to Semeru 21. Semeru 11 is EOS in November 2025.

Ask MPT

  • Keith Costley of Mastercard has an excellent question about Coupling Facility utilization.
  • Summary: My contingency sysplex with shared ICFs continually shows a utilization of 70-85%, although there is not much work on those sysplexes. I don’t know if the reporting is correct. Is there a way to determine what is using up the ICF capacity if it is? Also, we have Thin Interrupts turned on.
  • Since this is a performance question, Martin answers: “Shared ICFs” is important here.
    • The CF Activity Report uses R744PBSY / (R744PBSY + R744PWAI). This works well for Dedicated engines but not Shared ones – particularly with Thin Interrupts.
    • Use SMF 70-1 LPAR numbers instead for headline CF utilization for Shared engines (but not Dedicated).
    • R744SETM records in-CF CPU for the structures. A 100% capture ratio does slightly weird things.
    • Check out a blog post on this topic
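The report’s calculation is just that busy/wait ratio – a sketch, with invented sample values:

```python
def cf_busy_pct(r744pbsy: float, r744pwai: float) -> float:
    """CF processor busy % as the CF Activity Report computes it:
    R744PBSY / (R744PBSY + R744PWAI).  As discussed, this works for
    dedicated CF engines but can mislead for shared ones, especially
    with Thin Interrupts."""
    return 100.0 * r744pbsy / (r744pbsy + r744pwai)

# Invented counter values: 300 busy units vs 700 wait units
print(cf_busy_pct(300.0, 700.0))  # 30.0
```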

Mainframe – How to prepare for IBM z17

  • z/OS support will be on V2.4 and higher. Reminder: 2.4 is supported with an extended service contract.
  • Full exploitation requires z/OS 3.1 + fixes.
  • As usual, there are three important SMP/E FIXCATs:
    • Required: IBM.Device.Server.z17-9175.RequiredService
    • Recommended: IBM.Device.Server.z17-9175.RecommendedService
    • Exploitation: IBM.Device.Server.z17-9175.Exploitation
  • Sysplex hardware co-existence requirements: z15, z16, z17
  • Some exploitation highlights:
    • BCPii & HMC Hardened Security: BCPii enhancements to support server-based authentication with JSON Web Tokens. Allows previously unavailable operations, including asynchronous notifications.
    • Workload-Level Sustainability & Power Consumption: Provided by fixes on z/OS 3.1 and higher
    • Workload Classification Pricing: Can collect data which allows you to classify workloads to allow for a price differentiation
    • CE-LR CL6 & ICA-SR 2.0: Note that CL6 can only connect to CL6, and CL6 is for IBM z17. z17 CL5 can also connect to CL5 on older machines. ICA-SR 2.0 can connect to older ICA-SR adapters.
    • CFLEVEL 26: Need to run CFSizer. Strongly recommend using the z/OSMF CFSizer. There will be a PTF to add CFLEVEL 26 support to the z/OSMF CFRM Policy Editor.
    • System Recovery Boost: No new recovery boost types at this point.
    • Data Processing Unit is the Integrated I/O Architecture.
    • Network Express feature: New converged I/O card. Enhanced QDIO as well as RoCE.
    • Crypto: Clear Key acceleration using a new CPACF instruction. No new cryptographic algorithms.
  • z/OS handy support matrix for IBM z17:

Mainframe Also – z/OS 3.2

  • z/OS 3.2 Preview timed to coincide with the April 8 IBM z17 announcement. Planned to GA in September 2025, as usual.
  • z/OS 3.2 can IPL on IBM z15 and higher.
    • By the way, z15 is the first machine to support System Recovery Boost, so all z/OS 3.2 systems can use System Recovery Boost.
  • Some of the previewed functions include:
    • Support for the z17 hardware-accelerated AI capabilities.
    • Python EzNoSQL APIs, extending EzNoSQL for this modern language.
    • Communications Server is planned to use AI to provide intelligent network packet batching.
    • PARMLIB syntax validator for selected members (for example ALLOCxx) through a REST API. This is for syntax, not semantics. You can validate multiple members in one run. It returns JSON of the parsing, as well as OK / Not OK, with syntax errors flagged along with valid values.
    • Automate software update installations via a new set of REST APIs added to the z/OSMF Software Update application. Also can be used by Ansible.
    • z/OSMF has user interface for DFSMS Storage Management, including REST APIs.
    • DFSMS support for direct encryption to tape.
    • RACF certificate support for multiple altnames
  • Of course, there will be more to come. Watch for a General Availability announcement.

Performance – IBM z17

  • Discussion about chips:
    • To understand Telum 2 you need to understand Telum.
    • Telum basics: 8 PU cores, DCM, Shared Caches.
    • Telum 2 as an evolution…
      • 7nm to 5nm benefits most notably with real estate for DPU and bigger cache.
      • Clock speed increase from 5.2 to 5.5GHz: energy consumption reduction.
      • Data Processing Unit (DPU). 4-port cards vs 2-port offers space reduction, and energy consumption reduction. DPU is accessible from other chips and DCMs.
      • IBM z Integrated Accelerator for AI (AIU): enhancements include INT8 and FP16. The 8-bit integer datatype, for example, gets more throughput. Sharing within the drawer. Both of these mean potentially more inferencing while meeting SLAs.
  • Non-chip discussion:
    • Spyre: More aimed at Generative AI than AIU is. We’ll cover this topic more in another episode.
    • The 4-drawer models are Max183 and Max208; there are also 1-, 2-, and 3-drawer models. 43 vs 39, and 47 vs 43, CPs in a drawer. Max183 is closer to Max208 than Max168 was to Max200.
    • Maximum memory increased. 40TB -> 64TB. Per drawer might be more interesting.
    • Instrumentation primer:
      • Sustainability metrics in SMF 70-1, and other places.
      • DPU in SMF 73 in general and for Channel Measurement Group 4 and 5.
      • More on these topics at another time.
  • Like every generation, there’s some nice new functions and handy capacity increases.

Topics – Preparing for z17

  • Marna prepared for z17 by working on the z/OS z17 Upgrade Workflow and preparing customer presentations from that Workflow.
  • Martin participated in Redbook writing.
    • The team was composed of folks from around the world, including a couple of customers.
    • This was Martin’s first processor Redbook, but of course not his first Redbook.
    • The Redbook was written with individual chapters owned by individuals or subteams.
    • The z17 Redbooks were based on the z16 Redbooks, painstakingly picking through what’s new. Back in the old days these were written in BookMaster; now it’s FrameMaker.
    • The goal was to describe what z17 is, rather than compare it to z16, in the Technical Guide. However, there is some need to compare and to explain things like Branch Prediction.
    • To some extent it was “Thriving On Chaos”, which is a callout to a famous book by Tom Peters, a management guru.
    • Hot Chips conference was an early solid data point, as well as discussions with z/OS Development in Poughkeepsie in January 2025.
    • Martin’s analysis code will need updating, and he’s looking forward to actual customer data and mappings, understanding how it behaves.
  • We both are looking forward to talking about z17 in the future.

Out and about

  • Martin is going to Istanbul twice in the next few weeks to visit customers, and is also doing the GSE UK Virtual Conference, 29 April – 1 May.
  • Marna isn’t going anywhere until SHARE, August 18th – 22nd, in Cleveland, Ohio.

On the blog

So It Goes

Mainframe Performance Topics Podcast Episode 35 “In Search Of EXCELence?”

As usual it’s taken us longer than we would like. The usual problem of finding planning and recording slots we can both make applies. But I think the episode turned out well. It was certainly fun to make.

So here are the show notes.

Episode 35 “In Search Of EXCELence?” long show notes

This episode title is about our Topics Topic.

Since our last episode, Marna was in Kansas City for SHARE, and in Germany for Zeit fur Z. Martin has been to South Africa for a customer.

Take note that the z/OS 3.1 functional dependency is moving from Semeru 11 to Semeru 17 before November 2025.

Manual of the Moment: MVS Extended Addressability Guide

  • All about 64 bit, data spaces, hiperspaces, cross memory.
  • Well written, good introductions and then some.

Mainframe – Looking at ages of fixes you haven’t installed yet

  • This is a tieback to our Episode 34, when we talked about UUID.
    • The UUID for z/OSMF Software Management is the ability to know for certain, when used according to the rules, what SMP/E CSI represents your active system.
  • This episode’s topic is still in the vein of knowing insights on the service level of your system: How long has an IBM PTF been available?
  • Kind of related to Recommended Service Update (RSU), as RSU marking is a set of rules for how a PTF ages before it gets recommended.
    • But this discussion will specifically be about knowing the date an IBM PTF was available for you to install.
    • There are other vendors which make their PTF availability date easily discernable, but now IBM has done that too.
  • How to know when the IBM PTF was available:
    • IBM has started adding a REWORK date to PTFs. The format is yyyyddd, Julian date.
    • Take note, though, that the REWORK date put on the PTF may be a day or two before the date it was actually made available, but usually that difference isn’t important.
    • Marna looked at a considerable sample of PTFs, comparing their actual Closed Dates with their REWORK dates, and most are one day different.
    • A use case where PTF Close Date can help:
      • Some enterprises have a policy that Security/Integrity (SECINT) PTFs that meet some criteria must be installed within 90 days of availability.
        • So that’s where the “availability” value comes in.
      • It certainly isn’t hard to know, if you’ve RECEIVEd PTFs, when they closed. Add the SECINT SOURCEID to know which PTFs are SECINT.
      • Useful reminder that the SECINT SOURCEID marking is only available to those that have access to the IBM Z and LinuxOne Security Portal.
    • Combine the two pieces of information to know how long a security integrity fix has been available.
      • That way you can see how well you’re doing against your 90 day policy.
      • Also it can give you your action list with a deadline.
    • Also a great time to remind folks that using an automated RECEIVE ORDER for PTFs gets you all the PTFs that are applicable to your system. And that means the REWORK date is available at your fingertips right away.
      • If you do not automate RECEIVE ORDER, then you are left with a rather long way to manually retrieve them, likely from Shopz.
    • How about viewing rework dates on PTFs that are already installed?
      • We now know the date a PTF was available, and you’ve always known the date and time the PTF was APPLYed.
      • So, you could gather a nice bit of data about how long fixes are available before they are installed, in a lovely graph.
    • For another view of the data, customers roll fixes across their systems. And, you could even do comparisons between systems to see ages of fixes as they roll across your enterprise.
      • Don’t forget to do those comparisons between systems as they are deployed; that’s where the UUID comes in very handy.
    • Another interesting side effect of knowing the date an IBM PTF was available:
      • The RSU designation. Now you can see how long it took that PTF to become recommended, if such a thing floats your boat.
      • Another example is looking at New Function PTFs, which are likely to have a HOLD for ENHancement.
        • You could do spiffy things like notice how long a New Function PTF has aged before becoming Recommended.
  • Where to get the REWORK date from:
    • As you would expect, you can see the REWORK date within queries today (for instance when you do an SMP/E CSI query with the LIST command).
    • Although you might not see it in all the z/OSMF Software Management and Software Update locations just yet, we are aware that is another place where it should be surfaced.
  • The possibilities of knowing more insights just got a lot bigger, now we have this piece of data. Using it in conjunction with the UUID makes it even more powerful.
    • Customers can make better decisions and get more info on how they’re carrying them out.
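The REWORK date arithmetic discussed above can be sketched quickly – Python's `%j` format handles the yyyyddd Julian form directly (the PTF date and APPLY date below are invented):

```python
from datetime import date, datetime

def rework_date(yyyyddd: str) -> date:
    """Parse a PTF REWORK date in yyyyddd (Julian) form."""
    return datetime.strptime(yyyyddd, "%Y%j").date()

def days_until_applied(rework_yyyyddd: str, applied: date) -> int:
    """How long the PTF was available before it was APPLYed -
    useful against e.g. a 90-day SECINT policy."""
    return (applied - rework_date(rework_yyyyddd)).days

print(rework_date("2025100"))                            # 2025-04-10
print(days_until_applied("2025100", date(2025, 7, 1)))   # 82
```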

Performance – Drawers, Of Course

  • This topic is a synopsis of Martin’s 2024 new presentation. This discussion is about z16 primarily.
  • Definition of a drawer:
    • A bunch of processing unit (PU) chips
    • Memory
    • Connectors
      • ICA-SR
      • To other drawers
      • To I/O drawers
    • Buy 1 to 4 drawers in z16 A01.
      • 4 drawer models Factory Build Only
    • Drawers and frames definition:
      • Frames are 19” racks
      • 0-3 PU drawers in a frame
      • In the A and B frames
  • z16 cache Hierarchy
    • PU Chip definition:
      • 8 cores
      • Each has its own L1 cache
      • Each has its own L2 cache
        • Shared as virtual L3 cache
          • Fair proportion remains as the core’s L2
      • L2 and L3 same distance from the owner
      • Part of a Dual Chip Module (DCM)
  • DCM definition:
    • 2 PU chips
    • Coupled by M bus
    • Connected to other 3 DCMs in drawer by X Bus
    • Virtual Level 4 cache across the drawer
    • Drawers inter-connected by A Bus
      • Much further away
      • Remote L4 cache as well as memory
  • LPARs should fit into drawers
    • All of an LPAR’s logical processors and memory should be in the same drawer
    • Important because cross-drawer memory and cache accesses are expensive
      • Often shows up as bad Cycles per Instruction (CPI)
      • This is the reason why, though z/OS V2.5 and higher can support 16TB of memory, you really shouldn’t go above 10TB in a single LPAR.
  • Processor drawer growth…
    • … with each processor generation
      • Higher max core count per drawer
      • Each core faster
      • Max memory increased per drawer most generations
        • Most customers have rather less than the maximum memory per drawer
  • However, z/OS workloads are growing fast
    • Linux also
    • Also Coupling Facilities, especially with z16
  • So it’s a race against time
  • Drawers and LPAR placement
    • z/OS and ICF and IFL LPARs are separated by drawer
      • Until they aren’t
      • z/OS LPARs start in the first drawer and upwards
      • ICFs and IFLs in the last drawer downwards
    • Collisions are possible!
    • More drawers gives more choices for LPAR placement
      • And reduces the chances of LPAR types colliding
    • PR/SM makes the decision on where LPARs are.
      • For both CPU and memory
      • However a sensible LPAR design can help influence the PR/SM decision.
  • Drawers and resilience
    • Drawer failure is a rare event, but you have to design for it
    • More drawers gives more chance of surviving drawers coping with the workload
    • If planned for, LPARs can move to surviving drawers
  • Drawers and sustainability
    • Each drawer has an energy footprint
      • Larger than a core
    • Improved in each generation
    • Depends also on eg memory configuration
  • Drawers and frames
    • Frames might limit the number of processor drawers
      • Frames sometimes limited by floor space considerations
    • Depends also on I/O configuration
  • Instrumentation in SMF records
    • SMF 70
      • Reporting In Partition Data Report
      • CPU Data Section
        • HiperDispatch Parking
      • Logical Partition Data Section
        • Describes LPAR-level characteristics
      • Logical Processor Data Section
        • Core-level utilisation statistics
        • Polar weights and polarities
        • z16 adds core home addresses
        • Online time
      • Logical Core Data Section
        • Relates threads to cores
        • Only for the record-cutting z/OS system
        • Allows Parking analysis
    • SMF 113
      • Effects of LPAR design
        • Sourcing of data in the cache hierarchy
        • Points out remote accesses
          • Including cross-drawer
          • Cycles Per Instruction (CPI)
          • One of the acid tests of LPAR design
      • Record cut by z/OS
      • Granularity is logical processor in an interval
      • Can’t see inside other LPARs
    • SMF 99-14
      • Largely obsoleted by SMF 70-1
      • Uniquely has Affinity Nodes
      • Also has home addresses
        • Only for the record-cutting system
        • Obviously only for z/OS systems
        • So no ICF, IFL, Physical
      • Supports machines prior to z16
    • You can graph cross-drawer memory accesses using John Burg’s formulae
      • L4LP and L4RP for Level 4 Cache
      • Martin splits MEMP into MEMLP and MEMRP
        • Though really it represents “Percentage of Level 1 Cache Misses”
      • Part of Martin’s standard SMF 113 analysis now
  • In short, there is lots to think about when it comes to drawer design and what you put in them.
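The “LPARs should fit into drawers” rule above can be illustrated with a toy check – the per-drawer capacities used here are placeholders, not any particular machine’s configuration:

```python
def fits_in_one_drawer(lpar_logical_cps: int, lpar_memory_tb: float,
                       drawer_cps: int = 43, drawer_memory_tb: float = 10.0) -> bool:
    """Toy check that an LPAR's logical processors and memory could
    sit in a single drawer.  The default capacities are illustrative
    placeholders - substitute your machine's actual per-drawer
    configuration.  (And PR/SM, not you, places the LPAR.)"""
    return lpar_logical_cps <= drawer_cps and lpar_memory_tb <= drawer_memory_tb

print(fits_in_one_drawer(30, 8.0))   # True
print(fits_in_one_drawer(30, 12.0))  # False - memory would span drawers
```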

Topics – Excel Love It Or Hate It

  • This section is inspired by Martin’s blog posts
    • Sandpapering off the rough corners when using Excel
  • What we have used Excel for:
    • Marna’s use:
      • SMP/E report for HIPERs and PEs – moved to Excel to take notes, tracking data base
      • Doing university expenses. Budget, incomes, expenses, and gaps.
    • Martin’s use:
      • Preparing for and graphing. He does the heavy lifting outside of Excel, creating CSV files for import.
      • Graphic automation. Export graph as a picture for the presentation. Use it as your graph creator. CSV input can be hard – dialog is cumbersome.
      • GSE submission as a tracking data base. Need in a “portable” format for sharing with others.
  • How we use it
    • Macros and formulae
      • Martin tries to avoid them by doing the calculations externally
        • Not really doing “what ifs”
        • Basic formulae
    • Default graphing scheme
      • Martin has to fiddle with any graph automatically generated:
        • Font sizes
        • Graphic sizes
        • Occasionally need a useful colour scheme
          • Eg zIIP series are green and GCPs are blue and offline have no fill colour
      • Marna hasn’t needed it to be consistent
        • Occasional graphing
        • Martin’s average customer engagement involves at least 20 handmade graphs
          • Some graphs need a definite shading and colour scheme
  • What we love about Excel
    • Turning data into graphs
    • Easy for my most basic uses
  • What we hate about it
    • Incredibly fiddly to do most things
    • Wizards don’t work well for me
    • Automation does sand the rough corners off
      • But it’s rather tough to do
        • Obscure syntax
        • Few examples
        • Martin tends to use AppleScript (not unexpected)
          • Hard to automatically inject eg VBA into Excel
    • Over aggressive treating cells as dates and times
  • Martin has several times had a bonding experience with customers where both are swearing at Excel.

Customer requirements

  • z/OSMF REST API for spawning UNIX shells, executing commands
    • To quote from the idea “Motivation: Given that we already have APIs for working with TSO Address Spaces, it seems reasonable that there be a set of APIs that offer much of the same functionality for UNIX address spaces via a shell interface. This would help bring z/OS UNIX on par with TSO, and make it more accessible, especially for modernization efforts.”
  • We think this is a nice idea.
  • You could automate from lots of places
  • Status: Future Consideration

Out and about

  • Marna and Martin will both be at the GSE Annual Conference in UK, November 4-7, 2024.
  • Martin will be in Stockholm for IBM Z Day.
  • Martin will have another customer workshop in South Africa.

On the blog

So It Goes

Utility Window Detection

It seems to me understanding when an installation’s utilities run is important.

Why is this?

It’s because utilities can make systems behave differently from how they usually do. And some of these behaviours can represent a challenge.

It’s often also valuable to reduce the utility window.

What Is A Utility?

There is no formal definition of a utility – and there are plenty of uses of utility programs that you might not think of as part of a utility window. For example, a database unload for further processing. But, then again, you might consider that part of maintenance. (Usually this is faster than an application program.)

Another example might be a VSAM REPRO out. Usually this is to back the dataset up – but not always.

(To my mind REORGs are unambiguously maintenance – but someone probably has a use case where that isn’t so, particularly if you can unload as part of the REORG.)

Behavioural Differences

In a utility window a number of things commonly happen:

  • Coupling facilities get busier – driven by structures such as LOCK1 and Group Buffer Pools.
  • zIIPs get busier – as many utilities use zIIPs.
  • Memory usage goes up – generally driven by sorting.

The above, of course, is not an exhaustive list.

You might mitigate some of these by slowing down executions, but it’s worth considering that many installations limit or even exclude application processing during the window. So there’s a trade-off between application availability and resource consumption.

Obviously utility windows need careful planning – in terms of scheduling and phasing.

And discussions around all this are ones I’d like to have with customers. Which is why detection is important to me.

What I wouldn’t claim the right to do is to try to influence a customer’s utility strategy.

You’ll notice this post doesn’t talk about when utility executions happen in actual customers. That’s deliberate – as I want to get more experience of timing and utility window contents.

Detecting Utility Windows

In this post I am limiting myself to Db2 utilities and IDCAMS, though the “SMF 30” method is more general.

Using SMF 30

SMF 30 records address space activity. Subtypes 2 and 3 are interval records. Subtype 4 is written when steps end. All have program name.

I’m using interval records, as that makes it easier to summarise by the hour. (I could go down to whatever the SMF 30 interval is, but hourly granularity generally suffices.)

I can – and do – summarise by hour by day by system. I also roll up the days and sometimes groups of systems.

As SMF 30 has program name it’s easy to pick out records with eg “DSNUTILB” or “IDCAMS”.

If I used the Usage Data Section I could relate “DSNUTILB” to a specific Db2 subsystem. I don’t do that today, but perhaps I will one day.

In any case, I consider this method lightweight – which is a good thing.
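A sketch of the hourly summarisation, assuming the SMF 30 interval records have already been parsed into simple dicts – the record parsing itself is out of scope here, and the system name and any program name other than DSNUTILB and IDCAMS are invented:

```python
from collections import Counter
from datetime import datetime

UTILITY_PROGRAMS = {"DSNUTILB", "IDCAMS"}  # extend as experience grows

def utility_activity_by_hour(records):
    """Count SMF 30 interval records for known utility programs,
    keyed by (system, hour, program).  Each record is assumed already
    parsed into a dict with 'system', 'timestamp' (a datetime) and
    'program' fields."""
    counts = Counter()
    for r in records:
        if r["program"] in UTILITY_PROGRAMS:
            counts[(r["system"], r["timestamp"].hour, r["program"])] += 1
    return counts

# Invented sample records
recs = [
    {"system": "SYSA", "timestamp": datetime(2025, 5, 1, 2, 15), "program": "DSNUTILB"},
    {"system": "SYSA", "timestamp": datetime(2025, 5, 1, 2, 45), "program": "DSNUTILB"},
    {"system": "SYSA", "timestamp": datetime(2025, 5, 1, 2, 50), "program": "IDCAMS"},
    {"system": "SYSA", "timestamp": datetime(2025, 5, 1, 9, 0), "program": "IEBGENER"},
]
print(utility_activity_by_hour(recs))
```

Rolling up days, or groups of systems, is then just a matter of coarsening the key.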

Using Db2 Accounting Trace

This method only applies to online utilities – namely those that interact with a Db2 subsystem.

It’s also heavier weight than the SMF 30 method, as it requires the installation to have turned on Db2 Accounting Trace. While the records that relate to Db2 utilities aren’t all that expensive or numerous, all the other records that would be cut in the same timeframe make this expensive.

In particular, records with a field QWHCATYP of value QWHCUTIL denote a utility execution. (The field and this value are defined in member DSNDQWHC of the Db2 macro library SDSNMACS.)

I’m not sure this adds value over the SMF 30 method but it is certainly valid.

One extension of this is to query the Db2 Catalog. Until recently I would’ve recommended querying SYSIBM.SYSCOPY, but additional tables have since appeared. (I used to query these – once JOBNAME was added for me – to help with utility execution tuning. I haven’t had the opportunity to do that for a long time.)

Making Of

I started writing this on a plane to Stockholm, to present “Drawers, Of Course” and “Parallel Sysplex Performance Topics” to a group of Swedish customers. I continued writing it on the way back.

I’m also nearing the end of writing code to detect utility windows from customer data. Writing this helps clarify my thoughts about what the code should do. Of course, the next few times I deploy the code in customer situations will help refine it. I’m looking forward to that.

This code, by the way, extends existing code that detects eg Catalog and DFSMShsm spikes in EXCPs and CPU. It seems a natural fit.

Right now it recognises a very limited number of utility programs. I intend – with more experience – to add additional programs; the code is written in a way that facilitates that. (I’m thinking of IMS and DSN1COPY, as well as third-party utilities.)

But anyway, here’s what the output looks like right now:

It does show particular times of day have high activity, especially for DSNUTILB.

Engineering – Part 9 – Perhaps Affinity Nodes Are Interesting After All

I’ve tended to think Affinity Nodes aren’t worthy of study – because there’s little you can do to influence them.

However, that doesn’t mean one shouldn’t understand them.

(In this post I’m going to say “processor” when I mean “logical processor” and “affnode” when I mean “affinity node”. Also “VH” for “Vertical High”, “VM” for “Vertical Medium”, and “VL” for “Vertical Low”. It’s going to get wordy enough as it is.)

What Is An Affinity Node?

While a z/OS LPAR has logical processors in two pools – GCP and zIIP – there is further division into affinity nodes.

Each affnode generally contains a subset of the processors from a given pool. When HiperDispatch was announced that subset was said to be 4 processors.

Work, with the exception of SYSSTC, runs essentially in a single affnode. (SYSSTC work can run anywhere, being generally short running work. And, barring one of my war stories, not at all disruptive.)

So, you could imagine an affnode has queueing characteristics like a smaller LPAR. The payoff – perhaps set against this – is that work tends to be redispatched on the same small set of processors, leading to better cache effectiveness.

There are two mechanisms related to affnodes that are important to understand, at least conceptually:

  • In the short term an overloaded affnode can ask for help from other affnodes. This is the “needs help” mechanism you might’ve heard of.
  • In the longer term work can be redistributed among affnodes – for balancing purposes.

There is in fact a third mechanism:

  • The processors in an affnode can be moved to other affnodes. Indeed affnodes can be formed and dissolved.

How Do I Observe Affinity Nodes?

Prior to z16, SMF 99-14 records were the only SMF way to get processor home addresses. They were also the only SMF way to see affnodes.

In Engineering – Part 7 – RMF Processor Home Address Fields I described how SMF 70-1 support for z16 added home addresses for LPARs’ processors. What I didn’t say was that it added affnode support. That’s because it doesn’t.

So, the source of affnode information is SMF 99-14. For a long time my code that extracted processor home addresses for me also extracted affnodes.

Recently I started enhancing this code (unimaginatively called “XTR9914”).

And taking another look is what got me to this blog post.

It’s important to note that 99-14 doesn’t give me information on what runs in each affnode, nor how busy each processor is. For the latter you need SMF 70-1 or 113. The former isn’t catered for at all.

It’s also important to note that evidence for affnodes helping each other comes from SMF 72-3 at the service class period level. Here you’ll see zIIP-eligible work running on a GCP. What you won’t explicitly see is work not eligible for zIIP running on a helper node. Also, evidence for delay in either getting help or eventually running on the home affnode is in 72-3.

There probably is some other SMF 99 subtype that tells you more – and I’d be delighted if someone pointed me to any such information.

What I’ve Observed

I’ve observed a fair number of things – from SMF 99-14:

  • An affnode contains GCPs or zIIPs – but not both. One obvious corollary of this is that zIIP-eligible work could run in 2 affnodes – even without “needs help”. (I’ve not seen evidence of this.)
  • Affnodes can be of mixed polarities.
  • Affnodes tend to have fewer than 4 processors. If the number is 4 the affnode tends to contain VM or even VL processors.
  • Affnodes in z16 are in a single chip – which makes sense from a Virtual Level 3 Cache perspective.
  • Affnodes do indeed come and go. But it is very infrequent.
  • Similarly, processors can move between affnodes – on a very infrequent basis.

The last two are interesting – and tend to occur when processors change polarity. I know this because I can see the polarities of individual processors and which affinity node they are in.

(Processors changing polarity tends to happen when another LPAR is activated or deactivated, or when weights are altered.)
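The kind of check my code does for those last two observations can be sketched like this – assuming the 99-14 data has already been reduced to a per-interval map of processor to affnode (the snapshot values here are invented):

```python
# Sketch: flag topology changes between consecutive SMF 99-14 snapshots.
# Each snapshot is assumed reduced elsewhere to {processor: affnode};
# the snapshot data below is made up for illustration.
def topology_changes(prev, curr):
    """Return processors whose affnode (or existence) changed."""
    changed = {}
    for cpu in set(prev) | set(curr):
        if prev.get(cpu) != curr.get(cpu):
            changed[cpu] = (prev.get(cpu), curr.get(cpu))
    return changed

t0 = {0: "AFF1", 1: "AFF1", 2: "AFF2"}
t1 = {0: "AFF1", 1: "AFF2", 2: "AFF2"}  # processor 1 moved
print(topology_changes(t0, t1))  # {1: ('AFF1', 'AFF2')}
```

Running this across all intervals in a day typically shows very few non-empty results – which is the “infrequent” behaviour noted above.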

Helper nodes are interesting:

  • I can see that no zIIP affnode helps a GCP affnode. This is to be expected.
  • I can see GCP affnodes help GCP affnodes.
  • I can see each node tends to have multiple helper nodes. And – for a small enough LPAR – some have none.
  • I can see a definite preference list.
    • For an LPAR with enough zIIPs a zIIP affnode has another zIIP affnode first on its list, followed by the GCP nodes.
    • At least for GCP affnodes I can see helper affnode preferences rotating. (Well, sort of but close enough.)
    • I haven’t looked at an LPAR with 3 or more zIIP affnodes so I can only assume their preference lists rotate.

Observing these things brings the whole idea of affnodes to life for me.

Here is an example – before I’ve done anything fancy with the reporting:

In the above Affinity Node 4 is a zIIP node. You can see it has the other three as helper nodes. They are GCP nodes – and help each other.

99-14s are cut on a 5-minute interval – or when the topology changes. So here’s another one:

  • While it’s possible affnodes change within the interval and then change back again the long term constancy of what 99-14 shows suggests to me this isn’t happening. In other words, affnodes don’t change much – assuming configuration changes or eg IRD Weight Management aren’t a disruptive factor.

Conclusion

While it’s true you can’t do much to influence the formation of affinity nodes, understanding them might influence your LPAR design a little.

Their behaviour is not what I expected – and not quite what the literature would have you believe.

One final point: If you understand which logical processors are in which affinity nodes perhaps it will help you understand any imbalance in utilisation between an LPAR’s logical processors.

In any case, I’ve found this bit of observing and learning most interesting. It’s enough to make me want SMF 99 Subtype 14 records in each engagement, whether the outcome is some “tourist information” or an actually useful insight.

Making Of

Yet again I find myself on a daytime flight that was supposed to be a nighttime flight. Again to Johannesburg. What does one do with 11 hours when one is unlikely to sleep? That’s how this one got written.

And it got finished off at 5AM on the return flight.

Engineering – Part 8 – Remote Access Detection

The fact we’re up to “Part 8” illustrates there’s a lot to talk about once you take performance analysis down to the individual logical / physical processor level.

Many people have talked about the need to avoid LPARs crossing processor drawer boundaries. There are good reasons for this – because access to cache or memory in another drawer incurs many penalty cycles – where the core can’t do anything else.1

This post is about verification: Checking whether cross-drawer access is happening.

How Might Cross-Drawer Access Happen?

There are two main reasons why cross-drawer access might happen:

  1. If an LPAR has too many logical cores – say, more than a drawer’s worth – they will be split across two or more drawers.2 When I say this, I’m talking about home addresses, rather than guaranteed dispatch points.

  2. Vertical Low (VL) logical cores – those with zero vertical weight – can be dispatched on any physical core of the corresponding type, including in a different drawer. The trick is to avoid dispatching on VL’s as much as possible.

One mitigating factor is that most work in a z/OS system is restricted to run on a small subset of cores – its Home Affinity Node. These are generally in the same chip and so share certain levels of cache.

How Might You Detect Cross-Drawer Access?

John Burg has for many years proposed a set of metrics. I will introduce the relevant ones in a minute.

Cache Hierarchy

To understand the metrics you need at least a working knowledge of the cache structure of the processor.

On recent machines it logically goes like this:

  • Every core has its own Level 1 cache
  • Every core has its own Level 2 cache
  • Cores share a Level 3 cache
  • Cores share a Level 4 cache

I’ve left the descriptions a little vague – deliberately: It can vary from generation to generation. Indeed z16’s implementation is very different from that of z14 and z15.

The deeper into the cache structure you go to obtain data the costlier it is in terms of cycles. And memory accesses are costlier even than Level 4 cache.

It is only at the Level 4 cache and memory levels that it’s possible to access remotely – that is data in another drawer.

SMF 113

Long ago – on z10 – SMF 113 was introduced. It documents, among many other things, data accesses by the z/OS system that cuts the record. It does not know about other LPARs.

SMF 113 contains a number of counter sets. Two of these counter sets are relevant here:

  • Basic Counter Set
  • Extended Counter Set

Basic counters are present on all generations of machines since z10. Such counters follow the naming convention “Bx”. For example, B2.

Extended counters vary from generation to generation. Indeed they follow the cache hierarchy model of the machine generation. As previously mentioned, z14 and z15’s design differs from z16’s. So the extended counters for z14 and z15 differ from those of z16. Indeed counter numbers (“Ex”) are sometimes reused. An example of an extended counter is E146.

Using The Counters

Let’s look at the relevant metrics John defines.

  • Level 1 Miss Percentage (“L1MP”) is the percentage of accesses that aren’t resolved in the L1 cache.
  • Level 2 Percentage (“L2P”) is the percentage of L1 cache misses resolved in the L2 cache.
  • Level 3 Percentage (“L3P”) is the percentage of L1 cache misses resolved in the L3 cache.
  • Level 4 Local Percentage (“L4LP”) is the percentage of L1 cache misses resolved in the L4 cache in the same drawer.
  • Level 4 Remote Percentage (“L4RP”) is the percentage of L1 cache misses resolved in the L4 cache in a different drawer.
  • Memory Percentage (“MEMP”) is the percentage of L1 cache misses resolved in memory in the same or a different drawer.

These metrics are all composed of multiple counters. So I can split MEMP into two, with a slight renaming:

  • Memory Local Percentage (“MEMLP”) is the percentage of L1 cache misses resolved in memory in the same drawer.
  • Memory Remote Percentage (“MEMRP”) is the percentage of L1 cache misses resolved in memory in a different drawer.

Obviously MEMP = MEMLP + MEMRP.

L4LP and MEMLP combine to form local accesses where there was the hierarchical possibility of remote accesses. Similarly L4RP and MEMRP combine to form remote accesses.

I don’t think I would necessarily compute a ratio of remote versus local but the following graphs are nice – particularly as they show when remote accesses are prevalent.

It should also be said that L2P, L3P, L4LP, L4RP, MEMP, MEMLP, and MEMRP are all percentages of something quite small – L1MP. Generally L1MP is only a few percent – but we care about it because L1 cache misses are expensive.
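To make the arithmetic concrete, here’s a small Python sketch with invented counts – real values would come from combinations of the SMF 113 counters John’s metrics are built on:

```python
# Sketch: how the percentages fit together. The raw numbers are invented;
# real values come from combinations of SMF 113 counters.
l1_misses = 1000  # every sourcing event below resolves one of these
sourced = {"L2": 850, "L3": 80, "L4L": 30, "L4R": 10, "MEML": 25, "MEMR": 5}

# Each metric is a percentage of L1 cache misses
pct = {k: 100.0 * v / l1_misses for k, v in sourced.items()}

memp = pct["MEML"] + pct["MEMR"]   # MEMP = MEMLP + MEMRP
local = pct["L4L"] + pct["MEML"]   # local, where remote was possible
remote = pct["L4R"] + pct["MEMR"]  # remote accesses

print(round(memp, 1), round(local, 1), round(remote, 1))  # 3.0 5.5 1.5
```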

Here is a view of all the Level 1 Cache Misses. The total wobbles around 100% a bit. I’ve found this to be typical.

And here is zooming in on the only components where there is potential for cross drawer.

You’ll see from both of these there’s a “time of day” thing going on.

Conclusions

Some LPAR design common sense is worthwhile:

  • Minimise the risk of LPARs having their home addresses split across drawers.
  • Minimise running on vertical lows. (Yes, some might be inevitable.)

The point of this post is to show you can measure cross-drawer access. It’s worth keeping an eye on it, especially if your LPAR design leaves you prone to it.

While costs rise the further you go into the cache hierarchy, it would be extreme to have a design criterion that LPARs fit into a single DCM, let alone the same chip; Many customers would find such a constraint difficult. Indeed cross-LPAR communication might well negate the benefits.

Finally, a processor in “the main drawer” could be accessing cache or memory in a remote drawer. Equally, it could be a processor in “the remote drawer”. Not that there is a real concept of “main drawer” or “remote drawer”; It’s a matter of statistics. So “local” and “remote” are relative terms. But they are useful to describe the dynamics.

Making Of

I started writing this on my first attempt to fly to Copenhagen today. The plane was broken but there was a lot of time stuck on the plane before we decamped. A couple of hours later I more or less finished it on the second plane – which did take off.

Then, of course, I fiddled with it… 😃

Who says travel is glamorous? 😃 But, as I like to say, today’s travel bother is tomorrow’s war story…

… And I am finishing this off in my hotel room in Copenhagen after my second trip – to GSE Nordic Region conference. That was a very nice conference.


  1. SMT-2 means that one thread can be usefully employed while the other is waiting for data. But it doesn’t really negate the issue. 

  2. With a sufficiently complex set of LPARs PR/SM might, I guess, have to split more modest LPARs across drawers. If this happens at all I would expect it to be rare. 

Engineering – Part 7 – RMF Processor Home Address Fields

I was recently asked where I was getting the home addresses for logical processors from in RMF SMF 70. I will agree the fields I use are not that easy to interpret – so here’s my go at making them more accessible. Along the way I’ll also talk about SMF 74 Subtype 4 Coupling Facility fields that cover the same ground. But therein lies a problem.

It’s important to note these new fields only became available with z16 and RMF’s support for it; z15 and prior won’t populate the fields. The RMF APARs are OA62064 and OA61041. If you’ve ever tried to read the SMF manual you’ll’ve had one of two reactions:

  1. I’ll take this field description at face value.
  2. What exactly does this field mean?

Both are valid reactions. Most people have Reaction 1. I, like most seasoned Performance people, have Reaction 2. I’d say it’s my job to have Reaction 2.

The names of fields and their descriptions are, of course, obscure. Which is why I advocate playing with data to see how it behaves – on top of attempting to make sense of the SMF manual. But that’s someone with Reaction 2 speaking. 😃

The z/OS 3.1 manual seems to have caught up with this instrumentation. 1

I’ve added links to the two relevant sections below.

SMF 70-1 Logical Processor Data Section

There are newer fields in the Logical Processor Data Section. Some decoding of offset 89 onwards is needed.

The Logical Processor Data Section is documented here. It’s a very nice section but most of its contents aren’t new.

So here is a subset of the section of relevance:

Offset (Decimal) Offsets (Hex) Name Length Format Description
89 59 SMF70MaxNL 1 binary Maximum number of topology nesting levels. The value is model dependent with a maximum of 6.

0 The model does not provide information about the topological nesting levels.

1 There is no actual topological nesting structure.

2 – 6 Topological nesting levels are available, beginning with field SMF70CordL1 up to field SMF70CordLx, where x is the value that defines the maximum number of topology nesting levels.
90 5A SMF70CordL1 1 binary Coordinate of the preferred dispatch location of the logical core at topological nesting level 1. Valid if SMF70MaxNL > 0
91 5B SMF70CordL2 1 binary Coordinate of the preferred dispatch location of the logical core at topological nesting level 2. Valid if SMF70MaxNL > 1
92 5C SMF70CordL3 1 binary Coordinate of the preferred dispatch location of the logical core at topological nesting level 3. Valid if SMF70MaxNL > 2
93 5D SMF70CordL4 1 binary Coordinate of the preferred dispatch location of the logical core at topological nesting level 4. Valid if SMF70MaxNL > 3
94 5E SMF70CordL5 1 binary Coordinate of the preferred dispatch location of the logical core at topological nesting level 5. Valid if SMF70MaxNL > 4
95 5F SMF70CordL6 1 binary Coordinate of the preferred dispatch location of the logical core at topological nesting level 6. Valid if SMF70MaxNL > 5

The way to handle this is to take the first SMF70MaxNL nesting levels and decode them. As the description indicates, the meaning of each level is model dependent. It’s worth noting that I don’t expect to see values of SMF70MaxNL below 2 as this support only came in with z16 – but it might be this RMF support gives it the value 0 or 1 for eg z15.

With z16 SMF70MaxNL is 4 – but I don’t think you should hardcode this as one day it might change. That’s not a heavy hint, by the way.

For z16 the coordinates work the following way:

Field Level Meaning
SMF70CordL1 1 Always 0 (unused)
SMF70CordL2 2 Chip
SMF70CordL3 3 Dual Chip Module (DCM)
SMF70CordL4 4 Drawer
SMF70CordL5 5 Always 0 (unused)
SMF70CordL6 6 Always 0 (unused)

You’ll note we don’t have physical core number.

So you get down to the individual chip in a Dual Chip Module – so which side of the DCM.
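A decoding sketch, in Python, driven by SMF70MaxNL rather than a hardcoded 4 – the extraction of the raw coordinate bytes from the record is assumed done elsewhere, and the level names are the z16 ones:

```python
# Sketch: turning the raw SMF70CordLx bytes into a z16 home address.
# Field extraction from the record is assumed done elsewhere; the level
# meanings are z16-specific and shouldn't be hardcoded for other machines.
Z16_LEVEL_NAMES = {2: "chip", 3: "dcm", 4: "drawer"}

def home_address(max_nl, coords):
    """coords is (SMF70CordL1, ..., SMF70CordL6); use only the first max_nl."""
    if max_nl < 2:
        return None  # no usable topological nesting structure
    return {Z16_LEVEL_NAMES.get(level, f"level{level}"): coords[level - 1]
            for level in range(1, max_nl + 1)
            if level in Z16_LEVEL_NAMES}

print(home_address(4, (0, 1, 0, 2, 0, 0)))  # {'chip': 1, 'dcm': 0, 'drawer': 2}
```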

What’s really nice is this field is available for all logical processors for all LPARs on the machine. And I conjecture for LPAR “PHYSICAL” this is the address of each characterised physical processor. 2 So far the latter has been credible.

SMF 74-4 Processor Utilization Data Section

The Processor Utilisation Data Section is documented here.

The new field is 16 bytes long at offset 28:

Here is a subset of the section in SMF 74 Subtype 4 that deals with Coupling Facility CPU – at the logical processor level:

Offset (Decimal) Offsets (Hex) Name Length Format Description
0 0 R744PNUM 4 binary CPU number.
4 4 R744PBSY 4 binary Busy time (in microseconds).
8 8 R744PWAI 4 binary Wait time (in microseconds).
14 E R744PWGT 2 binary Shared processor weight. Valid if R744FLVL > 14.
28 1C R744PTLE 16 EBCDIC CPU-type topology list entry, as returned by the STSI instruction SYSIB 15.1.2 (Configuration Topology). See IBM z/Architecture Principles of Operation for the format. Valid if R744FLVL > 24.

RMF gets this new field from XES using mapping macro IXLYAMDA submapping IXLYAMDCFMINFO. I was aware of IXLYAMDA for many years, but not this submapping.

The documentation for IXLYAMDCFMINFO describes the field as “CPU-type topology list entry, as returned by the STSI instruction SYSIB 15.1.2. Refer to Principles of Operation for format. (LEVEL25)”

“LEVEL25” refers to Coupling Facility Control Code (CFCC) – which is indeed new with z16.

I’ve tried to examine this mysterious “SYSIB 15.1.2” in a little more detail. But this is where I’m stuck: With numerous sets of customer data I’ve not been able to decipher it. There is a lot written about it in Principles Of Operation but it’s not something I’ve been able to make sense of. I’m sure among my readership there are people who can decipher it. If so please comment below. I’m not the only one who wants to know.

What’s nice about this instrumentation is it describes the addresses of coupling facility images on machines that might not have a z/OS LPAR to cut SMF 70-1 for. So it would be really nice to be able to decipher it. Having said that, SMF 74-4 has for many years told you Plant Number and Serial Number for each coupling facility.

Making Of

I started this blog post a few weeks ago, convinced I’d be able to decipher all the new topology information. Well, I got the most important piece done – as it was pretty trivial. However, it’s annoying not to have got the second piece done.

I’m actually finishing off this blog post sat on a plane at Gate A9 of Heathrow Terminal 5. We’re due to fly to Copenhagen – in my case for a customer workshop tomorrow. And we have a 90 minute delay due to something wrong with the wing of this British Airways Airbus A319. I think the captain made the right call. 😃


  1. You’d be amazed how many web searches still resolve to older z/OS releases. However, it’s easy to get to the 3.1 equivalent from there. 

  2. It certainly behaves that way – in that the home addresses for LPARs consistently fit inside the set of drawers / DCM’s / chips that it suggests. Further, IFLs and ICFs do tend to be in the top drawer. So it’s not garbage. 

Setting Axis Font Sizes In Excel With AppleScript

Just a short post to add another AppleScript snippet some might find useful. (There will be performance-related posts along shortly; I have some in the works.)

As you know from previous posts under the tag topics, I’m automating as much of my Excel use as possible.

So here’s a short snippet of AppleScript to set the x (category) axis font size. It then sets the primary (left) y axis font size to the same. Finally, if there is a secondary (right) y axis it sets its font size.

Generally I find 14 point to be the right size. Obviously you could generalise this with a variable or use another hardcoded size.

The way I use this is to run the AppleScript fragment having selected the chart I want to apply it to.

I think the code is pretty self explanatory:

tell application "Microsoft Excel"
    set c to active chart
    tell c
        set a to (get axis axis type category axis)
        set font size of (font object of tick labels of a) to 14

        set b to (get axis axis type value axis which axis primary axis)
        set font size of (font object of tick labels of b) to 14

        if get has axis axis type value axis axis group secondary axis then
            set c2 to (get axis axis type value axis which axis secondary axis)
            set font size of (font object of tick labels of c2) to 14
        end if

    end tell
end tell

As I’ve said several times, there aren’t many examples of AppleScript driving Excel. My hope is some of my posts get indexed on the wider web and so add to what searches throw up.

Excel – Two Types Of Transpose

In Excel – Sorted With AppleScript I mentioned I might have a go at transposing rows and columns in an Excel spreadsheet with AppleScript.

I said there were two things I could try – each with a different use case:

  • Transposing Rows And Columns In A Spreadsheet
  • Transposing Rows And Columns In A Graph

I – after quite a bit of effort – achieved both. Again, in the spirit of saving others time, I’m posting them here.

Transposing Rows And Columns In A Spreadsheet

The following code transposes the selected cells into cell A1 of a new work sheet. I’m using a new worksheet as it saves worrying about whether the transposed range will overwrite cells you don’t want it to.1 You can always cut and paste from the new worksheet to where you actually want it to go.2

tell application "Microsoft Excel"
    activate
    set r to selection
    copy range r
    tell active workbook
        set tempSheet to (make new worksheet at end)
    end tell
    paste special range "A1" with transpose
end tell

Having ensured Excel is activated the code:

  1. Creates a range r from the selected cells.
  2. Copies the cells to the clipboard.
  3. Creates a new worksheet at the end of the active workbook.
  4. Pastes the copied cells into cell A1 of this new worksheet.

One thing I found didn’t work was attempting to paste special with transpose into the original selection; It simply didn’t transpose rows and columns.

Transposing Rows And Columns In A Graph

About half the time Excel decides that each row in the data is a series when I want each column to be a series (and vice versa).

There is a dialog (“Select Data…”) you can pop up – but it requires taking your hands off the keyboard and clicking a button “Switch Row/Column”. This takes time and gets old really fast when you’re working with graphs all day long.

So I discovered a faster way – which could be automated with a hot key (perhaps with Keyboard Maestro or BetterTouchTool). Here’s the code:

tell application "Microsoft Excel"
    activate
    -- Get the last chart created
    set s to active sheet
    set chartCount to count of chart objects of s
    set c to chart of (last chart object of s)

    -- Get the "plot by" direction for that chart
    set p to (plot by of c)

    -- Set "plot by" to the opposite orientation
    if p is by columns then
        set plot by of c to by rows
    else
        set plot by of c to by columns
    end if

end tell

Here’s how it works.

  1. Under the Get the last chart created comment are three lines of code which yield the last chart created – in variable c.
  2. Then variable p is set to the value of the plot by attribute of the chart.
  3. Based on whether its value is “by columns” or “by rows” the same attribute is set to the converse value.

The detection of the last chart created is generally what I want: My use case is to create a chart, decide I don’t like it, and then immediately transpose it.3

Conclusion

Once again problems can usefully be solved – but with much experimentation. And through extensive use of the excellent Script Debugger. In particular, if you assign an object to a variable you can see its properties.

I hope this has proven useful for somebody; It certainly has for me.

And, having mentioned BetterTouchTool and Keyboard Maestro above, I’ve automated both of these using a Stream Deck. In detail I use a keystroke action that kicks off automation.


  1. Consider the case of a 3 column, 2 row range of cells. Transposing that gives a 2 column, 3 row range of cells. If there were something just below the original then the two cells of that would get overwritten. Conversely, there’d be clean up of 2 cells at the right of the original range needed. 

  2. It’s handy that the just pasted in range remains selected after the code runs. 

  3. I’ve integrated this into one of my other scripts that sets the type of chart. As it runs I can immediately see if the orientation is right – and correct it before the script ends. 

Excel – Sorted With AppleScript

They say necessity is the mother of invention. But so, I’d say, are impatience and frustration.

I find the sorting dialog in Excel on Mac a source of both of these. So I wrote some AppleScript to remove some of the friction.

I’m sharing this because, yet again, I found nothing on the Web to help me. And some of the pieces might be what other people need for other automations.

The Trouble With Excel Sort

It’s easy to invoke the sort dialog from the keyboard – which I’d’ve fixed if I had to. (It’s Cmd+Shift+R.) But after that it gets a bit tedious:

  1. I have to explicitly click to get a drop down of columns to sort on.
  2. The default is to sort ascending and I almost always want to sort descending. So I have to click in a drop down to change that.
  3. It’s very hit and miss whether the “I have headers” is selected – and I always have headers.
  4. There is no tabbing or character navigation; It’s a badly designed dialog.

To be fair the dialog allows sorting in multiple columns – but it’s rare I’d want to do that.

You’ll note the things I’m objecting to require taking my hands off the keyboard, which is another joy / productivity killer.

Tiny violins, perhaps. 😀

The Code

All the above problems are tolerable. But it turns out they’re fixable – with a little AppleScript. So why not?

As I said above, searching the web proved futile. So the following was hard fought – which is why I think it worth sharing with those of you who’ve made it this far.

When you run this code it works on the range (rows and columns) you’ve selected, treating the first row as a headings row.

From the user point of view:

  1. You are prompted for which column to sort on.
  2. You are prompted for whether the sort is ascending or descending.
  3. The sort is performed.

First-character navigation works.

Here is the complete code.

tell application "Microsoft Excel"
    activate

    set r to selection

    -- Populate list of header cell values
    set headerList to {}

    set headerRow to row 0 of r
    set columnCount to count of columns of r
    set c to cells of headerRow
    set ix to 1
    repeat with theHeaderCell in c
        set headerText to (ix as string) & " " & string value of theHeaderCell
        set ix to ix + 1
        set end of headerList to headerText
    end repeat


    -- Choose which column to sort on
    set selectedHeader to {choose from list headerList}
    set chosenHeaderValue to item 1 of item 1 of selectedHeader

    -- Calculate the column number of the sorting column
    repeat with i from 1 to columnCount
        set columnHeader to item i of headerList
        if columnHeader = chosenHeaderValue then exit repeat
    end repeat
    set sortingColumn to i + (first column index of r) - 1

    set direction to item 1 of item 1 of {choose from list {"Ascending", "Descending"}}

    set sortingCell to cell (first row index of r) of column sortingColumn

    if direction is "Ascending" then
        sort r key1 sortingCell order1 sort ascending header header yes
    else

        sort r key1 sortingCell order1 sort descending header header yes
    end if

end tell

Now let’s look at some of the key elements of the code.

Getting The Currently Selected Cells

You get them as a range object with

set r to selection

A lot of things you wouldn’t think of as ranges in fact are. This is a more obvious case, though.

Populating A List With Header Cell Values

set headerList to {}

set headerRow to row 0 of r
set columnCount to count of columns of r
set c to cells of headerRow
set ix to 1
repeat with theHeaderCell in c
    set headerText to (ix as string) & " " & string value of theHeaderCell
    set ix to ix + 1
    set end of headerList to headerText
end repeat

In the above the code

  1. Extracts the first row of the range into headerRow.
  2. Creates a count of the number of cells in that first row – for later use – columnCount.
  3. Extracts the cells of that first row as a list – c.
  4. Iterates over the list of cells. For each cell it extracts the value and prepends an index to it – with a space in between. The number is needed to create a uniquely identifying string. This string is added to the list headerList. ix is a counter, of course.

The net result is a list of strings with values such as “1 System/Subsystem”.

Choosing Which Column To Sort On

AppleScript allows you to select (potentially multiple) values from a displayed list.

The code to get the text value of such selected values is

set selectedHeader to {choose from list headerList}
set chosenHeaderValue to item 1 of item 1 of selectedHeader

The first line pops up a dialog with the list in (in this case headerList, previously created).

The second line selects the first selected item from the list and sets chosenHeaderValue to it.

You could, of course, pop up any list and handle it this way, including if you allowed multiple values to be selected on.

Calculating The Column Number Of The Sorting Column

Unfortunately choose from list doesn’t return the index(es) of the chosen item(s). So you have to search through the list to find a match. Because there might be duplicate items in the list I had to prepend a unique identifier. Hence the use of the counter ix above.

Here’s the code to search for the selected item

repeat with i from 1 to columnCount
    set columnHeader to item i of headerList
    if columnHeader = chosenHeaderValue then exit repeat
end repeat

It compares the selected item’s text to each item in the list – until it finds a match.

The value of i has to be added to the first column number of the range (and then one taken off) to get the number of the column you want to sort on:

set sortingColumn to i + (first column index of r) - 1

Choosing The Direction To Sort In

Another dialog is used to choose the sorting direction

    set direction to item 1 of item 1 of {choose from list {"Ascending", "Descending"}}

Performing The Sort

First we need to nominate a cell to sort based on. In fact it suffices to choose the first cell of the sorting column:

set sortingCell to cell (first row index of r) of column sortingColumn

The sort itself is quite straightforward. Depending on the value of direction you can sort ascending or descending:

if direction is "Ascending" then
    sort r key1 sortingCell order1 sort ascending header header yes
else
    sort r key1 sortingCell order1 sort descending header header yes
end if

If you had more than one column to sort on you might use key2, key3 etc in the above, nominating a different cell for each.

Conclusion

Again, the struggle is real 😀 when it comes to using AppleScript to automate Excel externally. To some extent VBA can be used internally but it can’t readily be injected and controlled from outside.

And I have a couple more challenges in mind:

  • Transposing rows and columns in a range. Or, equally valuably to me, telling graphing which direction defines series. Perhaps both.
  • Folding a range – so that it is wider. So, for example, 2 columns wide and 8 rows tall becomes 4 columns wide and 4 rows tall. This kind of folding is useful to me – when combined with exporting to CSV. (CSV can be input to mdpre / md2pptx to make tables in PowerPoint slides.)

I know transposition can be done – but maybe not without explicitly iterating over cells. We’ll see.