Recent Conference Presentations

(Originally posted 2013-06-24.)

I’ve been fortunate enough to speak at two conferences in the past month:

  • UKCMG Annual Conference – in London in May
  • System z Technical University – just outside Munich in June

I presented three different sets of material, with very little overlap:

The first two were given at both conferences and the third one only in Munich.

If you click on the links you’ll see I’ve uploaded them all to Slideshare. I considered it only fair to wait until after the Munich conference to upload them.


I thoroughly enjoyed both conferences. But then I usually do: It’s my prime method of learning (other than by doing, and by data analysis) and I’m always pleased to run into old friends and new. (As a firm believer in “strangers are just friends we haven’t met yet” there were a fair number of those, too.)

It was particularly good to see so many people from Hursley – both CICS and WebSphere MQ – as well as the usual other Development labs.

If I had to pick just one new feature I spotted it would be the support for Variable Blocked Spanned (VBS) records with REXX’s EXECIO in z/OS 2.1 TSO. Think “SMF processing with REXX”. In my residency this autumn I expect to have a chance to play with this.
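
Here’s a minimal, hedged sketch of the sort of thing that should become possible – assuming an SMF dump data set (RECFM=VBS) allocated to a DD I’ve called SMFIN; the DD name and the trivial length report are purely illustrative:

/* Read the whole (VBS) SMF dump data set into a stem */
"EXECIO * DISKR SMFIN (STEM smf. FINIS"
say "Read" smf.0 "SMF records"
maxlen=0
do i=1 to smf.0
  maxlen=max(maxlen,length(smf.i))
end
say "Longest record is" maxlen "bytes"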


I’m writing this using Byword on an iPad Mini, with a Logitech external keyboard – on a flight to South Africa. It’s proving to be a good combination, especially as Byword has MultiMarkdown support (a set of extensions to Markdown) and Evernote support. It’ll probably be my new writing rig for a while.

Of course it helps if you put the iPad Mini in the right way round. 🙂


And a week has passed since I wrote this. 😦 It’s been a busy week that’s really been an interesting test of the idea of “treating an address space as a black box”. (And so much more besides.) Perhaps I’ll write about it some time.

Dragging REXX Into The 21st Century?

(Originally posted 2013-06-07.)

I like REXX but sometimes it leaves a little to be desired. This post is about a technique for dealing with some of the issues. I present it in the hope some of you will find it worth building on, or using directly.

Note: I’m talking about Classic REXX and not Open Object REXX.

List Comprehensions are widespread in modern programming languages – because they concisely express otherwise verbose concepts, such as looping.

Here’s an example from JavaScript:

var numbers = [1, 4, 9];
var roots = numbers.map(Math.sqrt);
/* roots is now [1, 2, 3], numbers is still [1, 4, 9] */

It’s taken from here, which is a good description of JavaScript’s support for arrays.

Essentially it applies the square root function (Math.sqrt) to each element of the array numbers, using the map method. Even though it processes every element there’s no loop in sight. This, to me, is quite elegant and very maintainable. It gets rid of a lot of looping cruft that adds no value.

My Challenge

I have a lot of REXX code – essential to fetch data from the performance databases I build and turn it into graphs and tabular reports. Much of this code iterates over stem variables (similar to arrays – for the non-REXX reader) or character strings that are tokens separated by spaces (blanks).

An example of a blank-delimited token string is:

address_spaces="CICSIP01 CICSIP02 CICSPA CICXYZ DB1ADBM1 MQ1AMSTR MQ1ACHIN"

It would be really nice when processing such a string – perhaps to pick up all the tokens beginning “CICS” – to be able to do it simply. Perhaps an incantation like:

cics_regions=filter("find","CICS",address_spaces)

In this example the filter routine applies the find routine to each token in the string, with a parameter "CICS" (the search argument).

And not a loop in sight.
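
For illustration, here’s one possible shape for the find routine – it’s hypothetical, and consistent with the parameter order the filter implementation below assumes (search argument first, item last):

find: procedure
parse arg needle,item
/* Return 1 if the token starts with the search argument, 0 otherwise */
return pos(needle,item)=1

Applied to the address_spaces string above, cics_regions would come back as “CICSIP01 CICSIP02 CICSPA”.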

My Experiment

I implemented versions of map, filter and reduce. I’ll talk about how but first here’s what they do:

  • map – applies a routine to each element.
  • filter – creates a subset of the list, keeping or discarding each element based on the routine’s return value (1 to keep, 0 to throw the item away).
  • reduce – produces a result from an initial value by applying a routine to each element in turn.

Here’s a simple version of filter:

filter: procedure
parse arg f,p1,p2,p3,list
/* Work out how many optional parameters (p1 to p3) were passed and */
/* build the start of the call to the filtering function f          */
if list="" then do
  parse arg f,p1,p2,list
  if list="" then do
    parse arg f,p1,list
    if list="" then do
      funstem="keepit="f"("
    end
    else do
      funstem="keepit="f"("p1","
    end
  end
  else do
    funstem="keepit="f"("p1","p2","
  end
end
else do
  funstem="keepit="f"("p1","p2","p3","
end
outlist=""
do forever
  /* Peel the next item off the front of the list */
  parse value list with item list
  /* Invoke the filtering function against this item */
  interpret funstem""item")"
  /* Keep the item only if the filtering function returned 1 */
  if keepit=1 then do
    if outlist="" then do
      outlist=item
    end
    else do
      outlist=outlist item
    end
  end
  if list="" then leave
end
return outlist

Variable “list” is the input space-separated list. “outlist” is the output list that filter builds – in the same space-separated list format.

Much of this is in fact parameter handling: The p1, p2, p3 optional parameters need checking for. But the “heavy lifting” comes in three parts:

  • Breaking the string into tokens (or items, if you prefer).

  • Using interpret to invoke the filter function (named in variable f) against each token.

  • Checking the value of the keepit variable on return from the filter function:

    If it’s 1 then keep the item. If not then remove it from the list.

I also wrote a filter called “grepFilter” (amongst others). Recall the example above where I wanted to find the string “CICS” at the beginning of a token. That could’ve been done with a filter that checked for pos("CICS",item)=1. That’s obviously a very simple case. grepFilter, as the name suggests, uses grep against each token. It worked nicely (though I suggest it fails my long-standing “minimise the transitions between REXX and Unix through BPXWUNIX” test).

And then I got playing with examples, including “pipelining” – from, say, map to filter to reduce – such as:

say reduce("sum",0,filter("gt",8,map("timesit",2,"1 2 3 4 5 6")))
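
To make that concrete, here’s a hedged sketch of map and reduce – much simplified compared with my real versions, each taking a single fixed extra parameter – together with the hypothetical timesit, gt and sum routines the example assumes. With these in place the pipeline above says 22.

map: procedure
parse arg f,p1,list
/* Apply routine f, with fixed parameter p1, to every item in the list */
outlist=""
do while list<>""
  parse var list item list
  interpret "newitem="f"("p1","item")"
  outlist=strip(outlist newitem)
end
return outlist

reduce: procedure
parse arg f,acc,list
/* Fold routine f across the items, starting from the initial value acc */
do while list<>""
  parse var list item list
  interpret "acc="f"("acc","item")"
end
return acc

timesit: procedure
parse arg factor,item
return item*factor

gt: procedure
parse arg threshold,item
return item>threshold

sum: procedure
parse arg a,b
return a+b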

Issues

There are a number of issues with this approach:

  • You’ll notice the function name (first parameter in the filter example above) is in fact a character string.

    It’s not a function reference as other languages would see it. REXX doesn’t have a first class function data type. Suppose you didn’t have a procedure of that name in your code: You’d get some weird error messages at run time. And while you can pass around character strings all you want, the semantics are different from passing around function references.

  • The vital piece of REXX that makes this technique possible is the interpret instruction.

    It’s very powerful but comes at a bit of a cost: When the REXX interpreter starts it tokenises the REXX exec – for performance reasons. It can’t tokenise the string passed to interpret. So performance could suffer. For my use cases most of the time (and CPU time) is spent in commands (scripted by REXX) rather than in the REXX code itself. (I also think the process of mapping a function to a list suffers less than the average REXX instruction if run through interpret.)

  • The requirement to write, for example

    say reduce("sum",0,filter("gt",8,map("timesit",2,"1 2 3 4 5 6")))

    rather than

    say "1 2 3 4 5 6".map("timesit",2).filter("gt",8).reduce("sum",0)

    is inelegant. Fixing this would require subverting a major portion of what REXX is. And that’s not what I’m trying to do.

  • The need to apply a function to each item – particularly in the filter case – can be overkill.

    In my Production code I can write

    filter("item>8","1 2 4 8 16 32")

    as I check the first parameter for characters such as “>” and “=”. So no filtering function is required. (There’s a sketch of this after this list.)

  • REXX doesn’t have anonymous functions and I can’t think of a way to simulate them. Can you? If you look at the linked Wikipedia entry it shows how expressive they can be.
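
Here, for what it’s worth, is a hedged sketch of that expression-based twist – the name filter2 and the cut-down single-parameter form are purely illustrative, not my Production code. filter2("item>8","1 2 4 8 16 32") returns “16 32”.

filter2: procedure
parse arg f,list
outlist=""
do while list<>""
  parse var list item list
  /* If f looks like an expression over "item" interpret it directly, */
  /* otherwise treat it as the name of a filtering function as before */
  if verify(f,"<>=","M")>0 then interpret "keepit="f
  else interpret "keepit="f"("item")"
  if keepit then outlist=strip(outlist item)
end
return outlist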

These issues are worth thinking about but not – I would submit – show stoppers. They just require care in using these techniques and sensible expectations.

Conclusions

It’s perfectly possible to do some modern things in REXX – if you work at it. And this post has been the result of experimentation. Experimentation which I’m going to use directly in some of my programs. (In fact I’ve taken the prototype code and extended it for Production. I’ve kept it “simple” here.)

I’d note that “CMS Pipelines” would do some of this – but not all. And in any case most people don’t have CMS Pipelines – whether on VM or ported to TSO. (TSO is my case, but mostly in batch.)

I don’t believe “Classic” REXX to be under active development so asking for new features is probably a waste of time. Hence my tack of simulating them, and living with the limitations of the simulation: It still makes for clearer, more maintainable code.

Care to try to simulate other modern language features? Lambda or Currying would be pretty similar.

Of course if I had kept my blinkers on then I wouldn’t know about all these programming concepts and wouldn’t be trying to apply them to REXX. But where’s the fun in that?

Data Collection Requirements

(Originally posted 2013-06-01.)

Over the years I’ve written emails with data collection requirements dozens of times, with varying degrees of clarity. It would be better, wouldn’t it, to write it once. I don’t think I can get out of the business of writing such emails entirely but here’s a goodly chunk of it.

Another thing that struck me is that the value of some types of data has increased enormously over time. So some data that might’ve been in the “I don’t mind if you don’t send it” category has been elevated to “things will be much better if you do”.

I’d like to articulate that more clearly.

Bare Minimum

I always want SMF Types 70 through to 79, whatever you have.

I put it this loosely because I have almost 100% success with getting the data I really need, because the customer’s already collecting it. That has the distinct advantage of allowing me to ask for historical data, whether for longitudinal studies or just to capture a “problem” period.

There’s rarely been a study where I didn’t want RMF data. (Occasionally I’ve only wanted DB2 Accounting Trace.)

I like to have this data for a few days, but sometimes that isn’t possible. What really freaks my code out is having just a few intervals to work with.

Another question is “for which systems?” That’s a more difficult question to answer. Certainly I want all the major systems in the relevant sysplex. Ideally all the systems in the sysplex and even all the systems on the machines involved. But that’s usually not realistic. You’ve probably guessed already that the nearer to that ideal the better the insights (probably).

Strongly Enhancing

As you’ll’ve seen from this blog Type 30 Subtypes 2 and 3 Interval records are of increasing value to me: I recently ran some 15 month old data through the latest level of my code and was gratified at how much more insight I gained into how the customer’s systems worked.

A flavour of this is described in posts like Another Usage Of Usage Information.

So this data has definitely moved from the category of “I can just break down CPU usage a little further” to “it will make a large difference if you send it”.

Here the set of systems and time ranges I’d prefer to see data from can be much smaller: I probably don’t need to see this data from the Sysprogs’ sandpit.

Nice To Have

With most studies I can get by without the WLM Service Definition but it helps in certain circumstances (as I mentioned in Playing Spot The Difference With WLM Service Definitions.)

I’m OK with either the WLM ISPF TLIB or the XML version (as mentioned in that post).

If I want to take disk performance down to below the volume level SMF 42–6 Data Set Performance records are a must. It’s also the case you can learn an awful lot about DB2 table space and index space fragments from 42–6, there being a well-known naming convention for DB2 data sets.

Specialist Subjects

The above is common to most studies. The following deals with more common specialist needs.

DB2

Most of the time I’m seeking to explain one of two things about DB2:

  • Where the CPU is going.
  • Where the time is going.

In both cases I need DB2 Accounting Trace (SMF 101). The quality of this is variable. For example, to get CPU down to the Package (Program) level I need Trace Classes 7 and 8, in addition to the usual 1, 2 and 3. (Sometimes even 2 and 3 aren’t on.)

It’s quite likely this isn’t data that in its full glory is being collected all the time. So it’s a 30% chance of getting this retrospectively.

Sometimes I’m keen to understand the DB2 subsystem, which is where Statistics Trace comes in. The default statistics interval (STATIME) used to be a horrendous 30 minutes. Now it’s much lower so I’m pleased that issue has gone away. I ask for Trace Classes 1,3,4,5,6,8,10 which result in SMF 100 and 102 records. (I don’t ask for Performance Trace which also results in 102 records, albeit different ones.)

Again the questions of “for which subsystems?” and “when for?” come into play. That’s where negotiation is important:

  • It’s a lot of data to send.
  • Some installations deem it too expensive to collect on a continual basis.

I don’t disagree with either of those.

CICS

Here Statistics Trace (SMF 110) is really useful – especially if you have a sensible statistics interval.

For application response time and CPU breakdown to the transaction level Monitor Trace is the thing. Again this is the sort of thing customers don’t keep on a regular basis – or for that many regions. It’s also quite prone to breakage: Some customers remove fields from the record with a customised Monitor Control Table (MCT).

I try to glean what I can from SMF 30 about CICS – as numerous blog posts have pointed out – because I can get it for many more CICS regions than the CICS-specific data would furnish.

Batch

Batch is the area that takes in the widest range of data sources, the most fundamental of which are SMF 30 Subtypes 4 (Step-End) and 5 (Job-End).

I’ve already mentioned DB2 Accounting Trace and it’s most of what you need for understanding the timings of DB2 jobs.

For VSAM data sets SMF 62 is OPEN and 64 is CLOSE. For non-VSAM SMF 14 is Reads and 15 is Writes.

For DFSORT SMF 16 is really handy and even better if SMF=FULL is in effect. (Often it isn’t but I generally wouldn’t stop data collection to fix that.)

MQ

I only occasionally look at WebSphere MQ, though I’d like to do much more with it. I don’t think many people are familiar with the data. Statistics Trace (analogous to DB2’s but different) is SMF 115. Accounting Trace – which deals with applications – is SMF 116.

If I want to see what connects to MQ the Usage Information in SMF 30 is generally enough – though it doesn’t tell me much about work coming in through the CHIN address space. For that I really do need Accounting Trace. (An analogous point can be made about remote access to DB2.)

Getting Data To Me

This is usually OK but it’s worth reminding people of a few simple rules:

  1. Always use IFASMFDP (and / or IFASMFDL) to move SMF records around. (Using IEBGENER will probably break them.)
  2. TERSE the data using AMATERSE (or perhaps TRSMAIN). This works fine for both SMF data and ISPF TLIBs (and I suppose partitioned data sets in general).
  3. FTP the data BINARY to ECUREP.
  4. Make sure you send the data to the right directory in ECUREP’s file store. The standard encoding of the PMR number helps a number of IBM systems (such as RETAIN) to work swiftly and effectively. I’ll give you a PMR number on my queue (UZPMOS) or we can use another one.

That seems like a lot of rules but most of it should be familiar to anyone who’s ever sent in documentation in support of a PMR. (Only Rule 1 is new.) If you have access to the PMR text – and quite a few customers do – it should also enable you to track the data inbound.

In Conclusion

Realistically people might not have all the data I want and so there’s a process of negotiation, mainly trading timeliness and retrospective coverage against quality. Clarifying that trade-off would be helpful, which is why I like to run a Data Collection Kick-Off call. Ideally that call would be face to face, but I’m less insistent on that if distance makes it difficult.

At the end of the day I’m quite flexible and do what I can with whatever data I get. Of course you can’t magic missing data out of thin air, and can only occasionally repair it.

What I hope is that data collection is not overly burdensome and doesn’t cause stress to the customer. I also like to think that when they’ve sent the data in they can relax and more or less forget about it until “showtime”. 🙂

I also hope customers understand why they’ve been asked for the data they have. And that’s part of the point of this post, the rest being articulating what I need.

REXX That’s Sensitive To Where It’s Called From

(Originally posted 2013-05-26.)

I have REXX code that can be called directly by TSO (in DD SYSTSIN data) or else by another REXX function. I want it to behave differently in each case:

  • If called directly from TSO I want it to print something.
  • If called from another function I want it to return some values to the calling function.

So I thought about how to do this. The answer’s quite simple: Use the parse source command and examine the second word returned.

Here’s a simple example, the function myfunc.

/* REXX myfunc */
interpret "x=1;y=2"
parse source . envt .
if envt="COMMAND" then do
  /* Called from top level command */
  say x y
end
else do
  /* Called from procedure or function */
  return x y
end

The interpret command is a fancy way of assigning two variables (and really a leftover from another test). It works but you would normally code

x=1
y=2

The parse source command returns a number of words but it’s the second one that is of interest – and is saved in the variable envt.

The following lines test whether envt has the value “COMMAND” or not. If so the function’s been called directly from TSO. If not it hasn’t. In the one case the variable values are printed. In the other they’re returned to the calling routine.

The calling routine might have a line similar to

parse value myfunc() with x y

which would unpack the two variables.
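
Putting it together, a hypothetical caller exec might be as simple as this sketch:

/* REXX caller - because myfunc is invoked as a function here,     */
/* parse source inside it sees "FUNCTION" and the return path runs */
parse value myfunc() with x y
say "x is" x "and y is" y

Run myfunc directly from TSO instead – in SYSTSIN, say – and the second word from parse source is “COMMAND”, so it does the say itself.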

(The original interpret "x=1;y=2" might look silly but interpret "x='1 first';y='2 second'" doesn’t – as a way of passing strings with arbitrary spaces in them back from a routine.)

This is a simplified version of something I want to do in my own code. There might be a few people who will find this useful so why keep it to myself? 🙂 (It’s probably a standard technique nobody taught me.) 🙂

Discovering Report Class / Service Class Correspondences

(Originally posted 2013-05-22.)

It’s possible I’ve written something about this before: My blog is so extensive now it’s hard to find out exactly what I’ve written about (and I’m going to have to do something about that).

I say “written something” because I know for sure I haven’t written about the SMF record field I want to introduce you to now.

Previously

If when you send me data you include Type 30 interval records I’ll use them to relate WLM Service Classes to Report Classes: Workload, Service Class and Report Class are all in there.

But these records are only for address spaces. Address spaces that actually got created. And therein lies a problem: Only some of the Service Class / Report Class relationships can be gleaned this way.

In practice I’ve found this (incomplete but not inaccurate) information handy. So I’d like to fill in some gaps.

New News

I expect you didn’t know this either – so I call it “new news”: There’s a handy field in SMF 72 Subtype 3 (Workload Activity Report) called R723PLSC. It has nothing to do with PSLC.

This is defined as the “Service Class that last contributed to this Report Class period during this interval.” I’ve highlighted the word “last” as it’s quite important but we’ll come back to that in a minute.

This allows you to see some relationships for work that isn’t represented by address spaces, for instance DDF. (In my test data it’s DDF I’m seeing.)

I’ve spent some time adding this in to my code. Usually I’d summarise over several hours. In this case if I do I miss stuff.

The emphasised “last” above means that only one of the (potentially several) Service Classes that correspond to this Report Class shows up in the record. So I use a set of rows, each representing a short interval, to get the correspondence. In my test data this approach yields more correspondences – as the last one isn’t always the same one from interval to interval.
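
As a hedged illustration of the dedup, suppose each Report Class / Service Class pair from each short interval has been extracted – however you do that – into a stem called row., one “reportclass serviceclass” pair per entry (the stem name is just for illustration):

seen.=0
pairs=""
do r=1 to row.0
  parse var row.r rc sc .
  /* Record each distinct Report Class / Service Class pairing once */
  if seen.rc.sc=0 then do
    seen.rc.sc=1
    pairs=pairs rc"/"sc
  end
end
say "Correspondences:" strip(pairs)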

If you use Report Classes to break out a subset of a Service Class the “last Service Class” issue doesn’t arise. If you use Report Classes for aggregation (or in a hybrid way) it certainly does.

(I’m not all that keen on using Report Classes for aggregation anyway: Decent reporting tools can do that for you. But I could be persuaded. I’m keener on using them for breakouts, such as DDF applications that share a common Service Class, or to break out an address space or several.)

I’m not claiming to have got all the Service Class / Report Class correspondences but I’ve got more of them – and for an important set of cases: Service Classes and Report Classes that don’t correspond to address spaces.

As you’ll see in Playing Spot The Difference With WLM Service Definitions I prefer to have the WLM Service Definition to work with – and I’ll be asking for it more fervently in the future. But you have to work with the data you can readily obtain. And R723PLSC is a handy field to have learnt about. You might find it useful, too.

Playing Spot The Difference With WLM Service Definitions

(Originally posted 2013-05-20.)

A customer asked me to examine two WLM service definition snapshots taken on adjacent days – and discern any differences. This is not a challenge I’ve been set before – and so I expect it’s pretty rare. But, thinking about it, I reckon it could be quite useful.

So they sent me two service definitions, one day apart, and one from months later. When I say “a service definition” I mean the ISPF table library (TLIB) in which it’s stored. TERSEd and sent via FTP BINARY they make the trip just fine.

On my z/OS system I can fire up the WLM Service Definition Editor and read (and even edit) these just fine. (I could print the policy in a number of different formats – report, CSV file or GML. But I don’t choose to.)

I could even compare the two TLIBs but I don’t think that’d be a useful or consumable comparison. Likewise the three kinds of policy prints I just mentioned.

So, I decided to revisit an approach I’ve posted about before: Processing the XML version of the service definition.

Done right this could be a way to make a meaningful comparison – because it’s cognisant of the structure of the Service Definitions.

When I last looked at the XML version I think I was on Windows. Now I’m on Linux. Unperturbed I downloaded the WLM Service Definition Editor – which is a .exe file. But it unpacks – with Archive Manager – anyway. Inside is, amongst other things, a jar file and two REXX execs – ISPF2XML and XML2ISPF.

(You can run the jar file under Linux and the SD Editor works. But that’s not what I was after.)

The ISPF2XML exec runs under REXX/ISPF and produces a very nice XML file.

In conversation with the author it transpires this isn’t actively maintained anymore and one should probably use the one from z/OSMF now. (I’d be interested in seeing some XML from the new editor – if anyone is using it and is willing to send me a file.)

Again, I could compare two XML files but I don’t think a raw textual comparison would be very helpful.

A Byproduct – An XML Formatter For HTML

Because the XML file is well-formed it’s quite easy to parse it. And it is quite comprehensible, in its own terms.

So, I wrote some PHP code that uses XPATH to process the file. Why PHP? Because I’m creating HTML (and serving it from Apache on my laptop to itself) and its XPATH support is very good.

So now I have a nice Formatter for the whole Service Definition: If you get to send me your XML I get to send you the HTML. And at the same time I get to understand more about how the classification side of your WLM policy works.

(RMF has nothing to say about WLM classification rules’ firing: I use SMF 30 to try to guess this stuff – but that has limitations. And for DDF work I have to rely on the DB2 Accounting Trace QWACWLME field.)

Comparing Two Service Definitions

I only got part way through writing the comparison code: I compared the first-level nodes (children of the root node, such as the ClassificationRules element) by using the saveHTML method and lexically comparing the strings produced.

Correctly this told me the overnight changes were twofold:

  • A new report class had been added.
  • A new classification rule had been added, assigning this report class to some DDF work.

I say “correctly” because eyeballing my HTML reports told me the same thing: Opening the two of them in tabs, scrolling both to the top and then paging each down in turn took about 5 minutes. Doing it that slowly convinced me I’d spotted all the differences. (It helps I have built in some navigation aids along the way – which I won’t bore you with.)

Timestamps and Userids

Remember the two service definitions I’m comparing are from adjacent days. It’s handy they have timestamps for when, for example, a resource group was created. And there’s always a matching “update” timestamp.

Furthermore each timestamp is accompanied by a userid.

Sometimes these are goofy – “1900–01–01” 🙂 or “N.N” – but it’s nice to see “CLW” 🙂 appear, indicating the provenance of the service definition. (Even the goofy ones are, of course, meaningful.)

More seriously a timestamp between one day and the next is helpful.

Conclusion

RMF only gives you so much (but it’s a lot): Sometimes you need to go much further. And the XML version, whatever you do with it, fits the bill nicely.

Comparing two Service Definitions helps identify when changes were made. Picking up the timestamps narrows the doubt even more. And knowing the userids that authored changes helps drive discussions about who changed what and why.

You could readily save copies of the ISPF TLIB (or the XML) on a daily or weekly basis, and compare generations.

What I’d like to know is if a WLM Service Definition comparison tool would be generally useful for customers. Well would it?

New Batch Residency

(Originally posted 2013-05-02.)

In October Frank Kyne and I expect to run a residency in Poughkeepsie. You can find the announcement here.

The residency builds on the ideas presented here and three subsequent posts.

I revisited a specific part of it in Cloning Fan-In.

So what are we going to do?

For a start we’re going to assemble a team of 4 skilled mainframe folks from wherever we can. 🙂 One of them will be me, which leaves 3. You could be one of those – but only if you throw your hat in the ring.

We’re looking for three distinct roles:

  • Someone with good scheduling (TWS) and JCL skills.
  • Someone with experience in writing and tuning COBOL / DB2 programs.
  • The same but for the pairing of PL/I and VSAM.

Actually there’s some flexibility in these last two roles: COBOL / VSAM and PL/I / DB2 would work just fine.

But I still haven’t told you what we’ll actually do…

Residency Goal

We aim to teach people how to successfully clone individual batch jobs – through examples and guidance.

How We’ll Do It

The referenced blog posts describe some theory. This residency will write a Redbook that’ll describe the practice.

We’ll create two test cases that we’ll assert want cloning. They’ll process a large number of records / rows – in a loop. This is a very common application pattern: If you can think of another one we’ll entertain it.

One program will be written in COBOL and the other in PL/I. Hence the programming skill requirements.

One will access DB2 data primarily, the other mainly VSAM. Which explains those two skill requirements.

So the first few days will create these baselines – and measure them.

Then we’ll investigate cloning – 2-up, then 4-up, then 8-up, etc..

Why This Is Non-Trivial

If these cases were read-only this might be trivial. If these cases didn’t write a summary report at the end the same might be true.

But we won’t make it so easy on ourselves:

  • We’ll update something in each case – a file or a DB2 table.
  • We’ll read a second file / table (a lookup table, if you will).
  • We’ll write a report at the end.

All these reflect real life problems people will have.

And if the residents can think of some more pain to inflict on ourselves we will. 🙂

The Report

As I mentioned in Cloning Fan-In many programs produce a report. This is easy before cloning. With cloning it’s much harder. So we need to exercise that.

But I posited a modified architecture: Create data files and merge them in a separate reporting job. JSON could be involved, and so could XML – as those files should be modern. I say that because one benefit of cloning a job could be making the reporting data available to other consumers.

If we have time someone could explore this.

Why We Need A Scheduling Person

First, real life would require you to integrate a cloned job into Production: Scheduling one job, complete with recovery, is one thing. Scheduling cloned jobs is another.

Second, it’s not enough to succeed once in cloning a job: Installations will want to automate splitting again and maybe even dynamically decide how many clones. (And maybe where they’ll run.)

The TWS person will not only do scheduling but figure out how best to structure the JCL. Though not the main thrust of this residency, z/OS 2.1 will have JCL enhancements that I expect to be useful here. We’ll have 2.1 on our LPAR so you can play with this.

If you’ve never played with BatchPipes/MVS I expect you’ll get to try it out, too.

Measurements

While we overtly state this is not a formal benchmark, we’ll take lots of measurements and tune accordingly.

This I’m expecting to play the main role in.

Write Up

The idea of this is to deliver practical guidance through real life case studies. So there’ll be a book and maybe a presentation.

We’ll document what we did, what issues arose, how we resolved them, and what we learnt. And this will draw on all our perspectives.

As the application programs aren’t the main deliverable they’ll probably go in appendices. Tweaks we have to make to the code, JCL and schedule will be highlighted. Reporting requirements will also be described.

Finally

I think this will be a lot of fun. I also think the contact with Development will be fruitful.

So I invite you all to consider applying. Nominations close 5 July.

Analysing A WLM Policy – Part 2

(Originally posted 2013-05-01.)

This is the second part, following on from Part 1.

Importance Versus Velocity

After drawing out the hierarchy you have to set actual goals – whether velocity or some form of response time. And you have to set importances.

The importances should now be easy as they flow from the hierarchy. IRLM should be in SYSSTC – which serves as an anchor at the top. It’s not quite as simple, though, as assigning from 1 downwards – perhaps with gaps. You might find there are too many hierarchical steps and you have to decide how to conflate some.

It’s important to understand that Importance trumps Velocity: Importance 1 goals are satisfied first, then 2 and so on.

But a low Velocity service class period with Importance 1 might well have its goal satisfied too easily and WLM will then go on to satisfy less important service class periods rather than trying to overachieve the Imp 1 goal.

Further, a velocity goal that is always overachieved provides no protection on those occasions when resources become constrained: The attainment could well be dragged all the way down to the goal.

At the other extreme an overly aggressive velocity goal can lead to WLM giving up on the goal.

It sounds against the spirit of WLM but I’d set a goal at roughly normal attainment – assuming this level provides acceptable performance.

Actually the same things apply to response time goals: Importance overrides and setting the tightness of the goal right is important.

What’s In A Name?

I’ve seen enough WLM policies now to know they fall into three categories:

I know these because of the names therein. (By the way it’s the third category I see the most problems in.)

The first two contain names which are rhetorically useful such as “STCHI”. It’s better neither to name them something too specific – in case you have to repurpose them – nor to encode the goal values in the name – in case you have to adjust them (as you probably will).

By the way the same applies to the descriptions – which appear in SMF 72–3. If I ever learn Serbo-Croat it’ll probably be from SMF. 🙂

The Importance Of Instrumentation

Having just mentioned SMF let me talk about instrumentation.

Recall I was asked to look at a WLM policy.

Initially I was sent a WLM policy print. I then asked for (and swiftly got) appropriate SMF.

The point is it’s both you need:

  • The policy (in whatever form) gives you the rhetoric.
  • The SMF gives you the reality of how it performs.

Note the SMF doesn’t give you classification rules but the policy obviously does.

As an aside I’ve posited to Development it would be useful to instrument which classification rules fire with what frequency. Do you agree?

The most obvious use case is figuring out which rules are actually worthwhile, not that that’s a major theme in WLM tuning. I suspect there are others.

I’d like to thank Dougie Lawson and Colin Paice for their help in thinking about certain subsystems they are more conversant with than I am. This whole discussion would’ve been a lot worse without their input.

Analysing A WLM Policy – Part 1

(Originally posted 2013-05-01.)

This post started out with the title “Insufficient Nosiness?” I think most of mine do. 🙂 And if they do they should be subtitled “What You Don’t Know Can Still Harm You”. 🙂

Since then its scope’s expanded somewhat and now it’s in two parts, the second part being here.

A lot of things have come together recently…

I’ve just been involved in a discussion with a customer – which stretched me but I believe that was a good thing. Almost all I can tell you about the situation is that it involved some design work around their WLM policy, and that their installation has lots of lovely complexity.

I’ve had “warm up gigs” 🙂 recently in that I’ve been involved in several discussions about how to classify subsystems – for example CICS, DB2 and DDF. But none of these has been as comprehensive as this one. Hence the “stretching”.

If I’m looking for “lessons learned” (and I think I always am) they’d be a heady mix of things I didn’t know, things I did know that got brought into sharp relief, and new ways of structuring my thinking.

I’ll admit to walking in with a little uncertainty that I could think my way through a WLM policy review but I gave it some thought and I emerged from the discussions much happier about it.

It struck me the first thing to do is to discover what address space serves what. (Generally speaking it is an address space that serves, but it often isn’t an address space that gets served – DDF transactions being a good example.)

The motivation for this – and I think it’s well known – is that work should not be allowed to starve the address spaces that serve it of CPU. The reason for labouring the point about the serving hierarchy is that this structure gets quite complex to follow. Previous customer discussions hadn’t thrown this “real world” complexity into sharp relief: They’d only exposed parts of the hierarchy (such as the previously mentioned DB2 portion). Typically people talk about simplish things like within-product relationships.

Here’s a typical CICS one: I’m advocating a more comprehensive approach. (I was going to write a presentation about just that: within product considerations. I now think it’ll be a different presentation if it emerges at all.)

We didn’t actually draw the hierarchy on a piece of paper: I think next time I actually will create a physical drawing.

Discerning The Hierarchy

This directly draws on previously-mentioned information sources, such as SMF 30 Usage data, or DB2 Accounting Trace. The discussions this week laid out the hierarchy by people talking. Perish the thought. 🙂 Actually the Usage information did get a look in, in a supporting role.

I think you can start wherever you like. Perhaps because it’s quite complex you could start with DB2 (if relevant):

By the way the horizontal lines are boundaries between categories. You might find value in using “must be above” arrows between components instead.

The following is the previous two combined.

On reflection this is getting to the limit before arrows are required.

Also notice the TOR is viewed as the anchor point for the CICS application. You could argue the TOR need not be below DBM1. But I’d try and separate them if at all possible.


The second part of this two-part post is here.

How Many Eggs In Which Baskets?

(Originally posted 2013-04-08.)

You wouldn’t put all your eggs in one basket, CICSwise, would you? A naive reading of the CICS TS 5.1 announcement materials might lead you to suppose you could. This post is about thinking about your CICS region portfolio in the light of this announcement.

While every CICS release introduces capabilities that make it worthwhile to review your region portfolio, 5.1 majors on scalability. So, in the months (hopefully only months) before you install 5.1 and eventually go live, it would be a good idea to review your CICS region portfolio.

(I should properly say “application” rather than “region” – but we Performance Folks are more likely to get involved in discussions about regions than applications. We should still take a more-than-polite interest in applications. However, this post is indeed rather more about regions than applications.)

So let’s review why installations split applications up into multiple regions. There are essentially three reasons:

  • Architectural
  • Availability
  • Performance and scalability

When reviewing your portfolio it’s worth looking at all these categories.

And to me one of the major benefits of 5.1 is that it gives you more choices.

Architecture

You’re probably thinking I protest too much about not being an architect. I’ve talked about it enough times. 🙂

What I would say is it’s worth understanding the role of each CICS region.

  • You can begin by using the SMF 30 Usage information – as I discuss in Another Usage Of Usage Information. In that post I point out you can get topology information – such as which MQ or DB2 subsystem a region uses – just from SMF 30.
  • The above trick won’t detect File-Owning Regions (FOR’s). You could probably spot one of those from the Disk EXCP counts in SMF 30 or, failing that, in SMF 42–6.
  • You could have some fun with region names – as I discuss in He Picks On CICS.
  • You could use CICS’ own Performance Trace – and I think CICS Performance Analyzer helps with this – to figure out how transactions flow.
  • Or you could actually talk to CICS people. 🙂 Actually that’s not an exclusive or.

From the above you can get to knowing which regions are part of which application, can tell FOR’s from AOR’s from DOR’s from QOR’s from TOR’s, and generally have a crack at figuring out how set up for availability it all is. All before breakfast. 🙂

Hmm. I think I’m going to have to write me some more code… 🙂

And, of course, in 5.1 the architectural choices increase again.

Availability

Personally I recommend having at least four servers for resilience, though that is sometimes unaffordable.

The reason I recommend four rather than two is quite straightforward: If running out of a resource causes a server to fail, only having two means the other one is likely to fail as well. Having three others makes it much more likely the survivors could handle the load. Virtual Storage is a good example of this.

Of course there’s a cost to provisioning four rather than two – day in, day out. Consider four-way Data Sharing: Thankfully the cost difference between non- and two-way Data Sharing is usually greater than that between two-way and four-way.

Each installation must make its own decisions on availability versus cost.

Performance and Scalability

There have traditionally been two reasons for limiting the size of a CICS region, performancewise:

  • QR TCB Constraint
  • Virtual Storage

QR TCB Constraint

I wrote about this in New CPU Information In SMF Type 30 Records, where I posited the new CPU metrics introduced into SMF Type 30 in APAR OA39629 could help establish if the QR TCB is large.

In early client data I consistently see the biggest TCB in CICS regions as being “DFHKETCB” so I think this is the QR TCB. I decode this string as “DFH for CICS”, followed by “KE for Kernel” and “TCB is TCB”, so this all makes sense to me.

In any case you could work with the SMF 30 TCB time: If it’s a significant portion of an engine you might look at the biggest TCB. Whether or not that is the QR TCB, a large percentage of an engine for the biggest TCB would warrant examination. If it is the QR TCB then you have work to do before such a region could be combined with others.

For example, a CICS region with 90% of an engine at peak would warrant further investigation: If the biggest TCB were DFHKETCB and only 20% of an engine you could combine maybe 3 such regions without concern for QR TCB constraint.

If, however, the QR TCB were larger you’d want to consider the appropriateness of Threadsafe before concluding regions couldn’t be merged.

In 5.1 more commands have been made Threadsafe, as has the Transient Data (TD) Facility. This follows all the extensions to Threadsafe applicability over prior releases. (See Threadsafe Considerations for CICS.)

Virtual Storage

Historically CICS has used 24-, 31- and 64-bit virtual storage: Both 24- and 31-bit virtual storage should be viewed as scarce resources, especially 24-bit.

As a coarse upper bound you can use the SMF 30 Allocated virtual storage numbers.

For example, a region with less than 2MB of 24-bit allocated is probably not threatening when combined with a few others. Similarly a region with less than 500MB of 31-bit allocated is probably not an issue if combined with one or two more.

I emphasise coarse because CICS suballocates memory and has its own sophisticated memory management regime. You should use the CICS Statistics Trace virtual storage numbers to treat this subject properly.

In 5.1 a substantial number of areas have been moved to 31-bit virtual storage from 24-bit. Similarly, a substantial number of areas have moved from 31-bit to 64-bit.

Benefits Of Merging Regions

It’s worth pointing out that there are advantages in reducing the number of CICS regions. Two in particular come to mind:

  • Reduced operational complexity
  • Potentially improved resource usage and performance.

Others can much better explain the operational benefits. As a primarily performance guy I consider questions of resource consumption and effectiveness. Two simple examples are:

  • CICS doesn’t load a program each time a transaction that uses it runs: It keeps it in virtual storage. Two regions potentially means two copies – which would require twice the real memory. One region obviously doesn’t.

  • In the case of VSAM (LSR) buffer pools two regions require two pools for every one that a single region would have. Again, to get the same buffer pool effectiveness is highly likely to require twice the amount of real memory to back the pools as in the single region case.

Conclusion

In the examples in this post I gave some numbers. Please don’t use them as rules of thumb – without applying further thought. They are just reasonable examples: Derive your own.

Further, this whole discussion has been necessarily simplistic. But I think asking some basic questions is a very good start. Hopefully I’ve given you a way to look at whether CICS TS 5.1 (and indeed 4.2 or any other release, but less so) provides an opportunity to rework your portfolio of CICS regions and applications.

To recap, if anything, 5.1 gives you choices. (Actually it gives you lots of other things but the focus of this post has been narrow: How many eggs in how few baskets?)

Talking of those other things 5.1 brings, CICS Transaction Server for z/OS Version 5 Release 1 What’s New is well worth a read.

I’m wondering whether it would be useful to work this post up into a presentation on the topic – probably with considerable help from people who major on CICS. What do you think?

Also, I considered inserting some graphics but thought the ones I came up with to be gratuitous and unhelpful. So I didn’t. So there. 🙂