Back From Vacation And Raring To Go – To Poughkeepsie

(Originally posted 2011-08-11.)

Usually when I go away on holiday I bring something back with me. Often in the form of fresh ideas. This year it’s been such a hectic one that all I did was to flake out. So no new ideas this time. Perhaps that’s a good thing, perhaps not. 🙂

But I think I did achieve something: Mental decluttering. So I can, for example, look at stuff I was working on with a fresh take.

And the timing is actually pretty good for that: On Sunday I fly to Poughkeepsie to begin a four-week residency. We’ll be writing a Redbook on "Batch Modernization", following on from Batch Modernization on z/OS, which people seem to have rather liked. 🙂

I actually don’t know who will be on the team: I expect a mixture of the previous team (any of whom I’d be glad to be working with again) and new people (pleased to meet you). 🙂

I also don’t know what we’re going to write about. So I really do start with a "clean mind". 🙂 Or at least, I hope, an open one.

I think some of what I’ve talked about in recent posts could be useful – if not immediately reused – in the Redbook. I’ve also a few bits and pieces in my mind. But it’s a team effort to define shape and content. (And one of my ideas is that we look at each other’s stuff more this time around – to provide a different perspective.)

Now, I think there are two things that concern you, dear reader :-) :

  • A question: Are there topics in the Batch Modernisation realm you’d particularly like to see covered? No promises, but I am interested…
  • A hope: I’d like to think I could take some extracts from what we’re writing and post them here. Again, no promises as the rest of the team might not be happy with these "teasers".

I should point out that it’s not MY residency: My friend Alex Louwe-Kooijmans is running it. And, further, it’s a team effort. But I’m raring to go, recharged from holiday, and hoping to share what I can with you.

(And actually I did experiment with one thing while away: Programming Mac OS X – both Applescript and with Objective C. It’s the first time I’ve really had the chance to get to know my Macbook Pro.)

Another Neat Piece Of Algebra – Series Summation

(Originally posted 2011-07-19.)

Here’s another neat piece of algebra: A technique for summing series.

You know what b – a + c – b + d – c is. Right?

Suppose I were to write the same sum as:

(b – a) +

(c – b) +

(d – c)

The answer is still d – a. Right?

Now, suppose I re-label with s0 = a, s1 = b, s2 = c and s3 = d. We end up with:

(s1 – s0) +

(s2 – s1) +

(s3 – s2)

This notation is actually pretty scalable, as you can write sr for any arbitrary value of r. And that’s one of the strengths of algebra: generalisation.

So let’s do that up to n:

(s1 – s0) +

(s2 – s1) +

… +

(sn – sn-1)

which is, of course, sn – s0.

But what has that got to do with summing series?

If we can replace each (sr – sr-1) by a single term ur you may see the relevance…

u1 + u2 + … + un = sn – s0.

The series summation boils down to a "simple" subtraction. The trick is to find these s terms, given the u terms. Let’s try it with an example.

Summing The Integers

This is the series 1, 2, 3, … , n.

The r’th u term is just r. ur = r. So we now have to find the sr term. Remember sr – sr-1 has to equal r.

Try sr = r (r + 1).

Then sr-1 = (r – 1) r   or   r (r – 1).

So sr – sr-1 = [(r + 1) – (r – 1)] r   or   2r. Not quite what we wanted: it’s twice too big. Dividing out that factor of 2 tells us we should’ve guessed sr = ½ r (r + 1).

So sn – s0 = ½ n (n + 1) – ½ 0 (0 + 1) = ½ n (n + 1) – 0 = ½ n (n + 1).

The sum of the first n integers being ½ n (n + 1) is a well-known result. Admittedly it could’ve been done another way. But it’s simple enough to show the method.
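Since we’re being algebraic, a quick brute-force check doesn’t hurt. Here’s a little Python sketch (mine, purely illustrative) confirming the telescoping gives the same answer as adding the terms up:

```python
# Sanity-check the telescoping trick for the sum 1 + 2 + ... + n.
def s(r):
    # Our guessed "s" term: s_r = r (r + 1) / 2, so s_r - s_{r-1} = r.
    return r * (r + 1) // 2

n = 10
telescoped = s(n) - s(0)           # the whole sum collapses to s_n - s_0
direct = sum(range(1, n + 1))      # brute-force addition as a check
assert telescoped == direct == 55
```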

Another Example – Summing The Squares Of The Integers


This is the series 1, 4, 9, … , n² .

In this case we need to do something that will appear slightly perverse:

Rewrite r² as r (r + 1) – r.

If you can split each term of a series into two sub-terms you can sum the sub-terms separately. I just did the split. We already know how to sum the "r" portion: it’s ½ n (n + 1). So we need to sum the r (r + 1) portion and subtract ½ n (n + 1) from the result.

Try sr = r (r + 1) (r + 2).

Again we need to find sr-1.

It’s:

(r – 1) r (r + 1)

or, rearranging,

r (r+1) (r-1)

So

sr – sr-1 = [(r + 2) – (r – 1)] r (r + 1) or 3r (r + 1).

This is 3 times what we want so we should’ve guessed sr = 1/3 r (r + 1) (r + 2).

So this portion of the sum is 1/3 n (n + 1) (n + 2) – 1/3 0 (0 + 1)(0 + 2) or 1/3 n (n + 1) (n + 2).

But we need to subtract ½ n (n + 1) from this:

1/3 n (n + 1) (n + 2) – ½ n (n + 1) = 2/6 n (n + 1) (n + 2) – 3/6 n (n + 1)

or

1/6 n (n + 1) [2 (n + 2) – 3] = 1/6 n (n + 1) (2n + 1).

If you try it for a few values you’ll see it’s right. This isn’t such a well-known result as the sum of the integers.
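Here’s that "try it for a few values" done as a throwaway Python sketch (my own check, not part of the derivation):

```python
# Check 1 + 4 + 9 + ... + n^2 == n (n + 1) (2n + 1) / 6 for a range of n.
for n in range(1, 20):
    formula = n * (n + 1) * (2 * n + 1) // 6
    direct = sum(r * r for r in range(1, n + 1))
    assert formula == direct
```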


I’m conscious there’s been some fiddliness here – which is where I normally fall down. 😦

But I think the "sum a series by converting it to a single subtraction" trick is a neat one – which is why I share it with you.

An Experiment With Job Naming Conventions

(Originally posted 2011-07-19.)

It may surprise you to know I hate asking questions to which I already know the answers. :-) And I hate even more "leaving understanding on the table". Let me put it more positively: I love it when I can glean new insights into existing data. This post is about precisely that: An experiment in gleaning extra understanding…
 
In Batch Architecture, Part Zero and follow-on posts I talked about gleaning how an installation’s batch applications fit together. I’ll admit that part of it was a little sketchy and I’ve had the opportunity since then to look at a number of customer batch environments. I really don’t much like the part where I ask the customer "what’s your batch naming convention"? So I wrote some experimental code and tested it with one of these recent sets of data…
 
My raw data in this is SMF 30 Job-End records, processed into a database in my usual way. (And you, too, could do the same – and everything else that’s in this post.)

Remember I’m looking for patterns in 8-character tokens, and about 100,000 of them. The latter may be an under- or an over-estimate for you. The former is fixed. (And this technique might work with other bounded-size tokens such as DB2 Accounting Trace Correlation IDs or CICS region names.)

Here’s the process my code follows:

  1. Discern some masks from a pass over the data. (More about this towards the end of the post – but it is the first step.)
  2. Apply these masks to all the jobs and see which masks fit. (I’ll tackle this first as it explains why we need to do Step 1.)

Do These Jobs Match This Mask?

In this post a mask is a string of characters (for example "AAA999AA") against which each job name is tested. The "A" denotes "any alphabetic character in this position" and the "9" denotes "any numeric character in this position". So, in this example, a match would be a job name with the first three characters alphabetic, the next three numeric and the final two alphabetic.

(It’s perfectly reasonable to complicate things by allowing more than just "A" and "9". Perhaps "$" for non-alphanumeric and "*" or "?" as wildcards. I really don’t think that level of sophistication is necessary for this prototype – and Regular Expressions are probably overkill*.)
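To make the matching concrete, here’s a minimal Python sketch of the mask test (my prototype is in REXX – see the footnote – so this is an illustration only, and the function name is mine):

```python
def matches_mask(jobname, mask):
    """True if a job name fits a mask of 'A's (alphabetic) and '9's (numeric)."""
    if len(jobname) != len(mask):
        return False
    for ch, m in zip(jobname, mask):
        if m == "A" and not ch.isalpha():
            return False
        if m == "9" and not ch.isdigit():
            return False
    return True

# For example, against the espoused convention "AAA999AA":
assert matches_mask("PAY123AB", "AAA999AA")
assert not matches_mask("PAY12XAB", "AAA999AA")
```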
 
Because I knew the test data I used the espoused naming convention for the customer: "AAA999AA" is indeed the mask for this. My code shows that 86% of all batch jobs match this naming convention. So what about the other 14%? 🙂 Maybe that’s a metric: percent_jobs_matching_espoused_naming_convention. :-)
 
I could’ve stopped there but I thought it useful to analyse the three-character "AAA" piece of the mask: There were 35 different values. Sorting these by occurrence descending I see 11 with over 100 occurrences (the top one having 732). These could be suites (or applications, if you prefer). This I’d be happy to share with a customer. It would enable the conversation to start somewhere more useful than "what is your naming convention?"
 
But, you’ll note, that’s one mask ("AAA999AA") that was already handed to me. Nice but not enough. I still think this "leaves understanding on the table".

How Do I Generate The Masks?

As I said, I think I can teach my code to do better than that. In fact I think I did…
 
With 8-character masks where each mask position can be in one of two states ("A" or "9") there are 256 potential masks (and that’s probably only 128 as I think the first position will have to be "A" – not that I’ve coded with that assumption). The point is there isn’t much potential for an explosion.
 
I glean the masks the following way. I run through all the job names, one character at a time:

  • If the character present in, say, more than 90% of the job names is a letter I add "A" to any (partial) masks already generated.
  • If the character is more than 90% of the time a number I add "9" to any partial masks.
  • If not I create two sets of masks – one with an "A" on the end and one with the "9" on the end.
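The steps above can be sketched in Python like this (a simplified illustration of the approach – the function name and fixed 8-character token length are mine):

```python
def discern_masks(jobnames, threshold=0.9):
    """Grow candidate masks position by position, splitting where neither
    letters nor digits dominate at the given threshold."""
    masks = [""]
    for pos in range(8):
        chars = [name[pos] for name in jobnames if len(name) > pos]
        alpha_frac = sum(c.isalpha() for c in chars) / len(chars)
        digit_frac = sum(c.isdigit() for c in chars) / len(chars)
        if alpha_frac > threshold:
            masks = [m + "A" for m in masks]       # letters dominate
        elif digit_frac > threshold:
            masks = [m + "9" for m in masks]       # digits dominate
        else:
            # Doubt: fork every partial mask both ways
            masks = [m + "A" for m in masks] + [m + "9" for m in masks]
    return masks
```

With made-up data where position 4 is numeric only 80% of the time, this forks into two masks, much as happened in my test.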

In this test I generated four masks: "AAA999AA", "AAAA99AA", "AAA9A9AA" and "AAAAA9AA". All the masks start with "AAA" and end with "9AA". The doubt is in the middle where "99", "A9", "9A" and "AA" got generated.

If I drop the threshold from 90% to 80% I only get "AAA999AA" so maybe that is a good naming convention after all. (In fact the middle characters are 87% and 88% numeric, respectively. And the sixth character is numeric 91% of the time – so it scraped through.)
 
As I said, my initial testing of the mask-matching used "AAA999AA" because the customer had indicated that was their convention. So my code allows you to specify masks and then adds the automatically-generated ones to it.

Conclusion

I think the experiment worked well. I can see cases where the code needs enhancing. I can see cases where it mightn’t be perfect. But I do think this code worth running (and tweaking) at the beginning of every relevant engagement.

* I’m doing my programming in REXX – which doesn’t even have regular expressions. It might be nice to write a function package that did it. A challenge for someone? Anyone? 🙂

Multiline Message Sifting With DFSORT

(Originally posted 2011-07-17.)

Frank Yaeger of DFSORT Development suggested I pass this tip along to y’all. It’s his solution to a problem set by Brian Peterson of UnitedHealth Group…

In z/OS Release 12 two new messages were introduced: IEF032I and IEF033I replace IEF374I and IEF376I. The older messages were single-line step- and job-end messages. The new ones are their multiple-line analogues: IEF032I is 3 lines and IEF033I is 2 lines.

The problem is how to sift these messages out in a program.

Here’s Frank’s solution:

//S1 EXEC PGM=SORT
//SYSOUT DD SYSOUT=*
//SORTIN DD DSN=...  input file (FBA/133)
//SORTOUT DD DSN=...  output file (FBA/133)
//SYSIN DD *
  OPTION COPY                                                 
  INREC IFTHEN=(WHEN=GROUP,BEGIN=(2,7,CH,EQ,C'IEF032I'), <1>      
    RECORDS=3,PUSH=(134:ID=1)),                               
   IFTHEN=(WHEN=GROUP,BEGIN=(2,7,CH,EQ,C'IEF033I'), <2>           
    RECORDS=2,PUSH=(134:ID=1))                                
  OUTFIL INCLUDE=(134,1,CH,NE,C' '),BUILD=(1,133)             
/*

It uses DFSORT’s IFTHEN WHEN=GROUP to form groups. Overall the trick is to assign a group number (in position 134) to those records that are part of a message, and to leave position 134 blank for the records that aren’t.

The OUTFIL statement keeps only those records without a blank in position 134, truncating the output to 133 bytes.

So, how does it distinguish the records that are part of an IEF032I or IEF033I message from those that aren’t? The INREC statement uses a pair of IFTHEN clauses in a short "pipeline":

  • IFTHEN clause <1> begins a group when it encounters "IEF032I" in position 2 (after the ASA control character). For three records beginning with this one a group number is assigned (in position 134). It’s 1 byte long – specified with "PUSH=(134:ID=1)". It doesn’t matter what the group number is, so long as it’s there for the OUTFIL statement.
  • IFTHEN clause <2> does the same for the "IEF033I" message. This time it’s two records beginning with the "IEF033I" line.

Looking back I see I used the term "pipeline" for IFTHEN in Unknown Unknowns in 2009. This example incorporates a two-stage pipeline: Any records not satisfying the BEGIN= clause in <1> (the first stage) are passed to the second stage (marked <2>). But no records that satisfy the clause are passed on. (If we wanted them to we could code "HIT=NEXT" – but that wouldn’t be useful here.)
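For those who think better in a procedural language, here’s the same grouping idea as a Python sketch (my own illustration – Frank’s DFSORT job above is the real thing, and the names here are mine):

```python
# Keep only the lines belonging to the multi-line messages:
# IEF032I spans 3 lines, IEF033I spans 2.
GROUP_SIZES = {"IEF032I": 3, "IEF033I": 2}

def sift(lines):
    kept, remaining = [], 0
    for line in lines:
        msg_id = line[1:8]                   # 7 chars from position 2 (after the ASA character)
        if msg_id in GROUP_SIZES:
            remaining = GROUP_SIZES[msg_id]  # start of a group: keep this many lines
        if remaining > 0:
            kept.append(line)                # analogue of PUSHing the group id
            remaining -= 1
    return kept
```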


The code I’ve shown above is just what Brian needed – and I haven’t altered it from what Frank sent me.

I think there are three fairly obvious twists to this:

  • If you wanted to you could decode the multi-line message – perhaps with PARSE.
  • You could support the old messages (IEF374I and IEF376I) with more IFTHEN clauses. You might do this if you had multiple LPARs – not all at or above z/OS R.12 – to support. (Or maybe you’re a software vendor.)
  • You could support multiple messages – not just the two versions mentioned here – and you might use OUTFIL to route them to multiple destinations. (I think you’d need another flag for that.)

But Frank’s example is a very good one and I’m glad he sent it to me. This is a sort of "guest post" though I did all the writing. I wonder if there are topics you would like Frank to actually "guest post" on. I can put them to him.

Hello, I’m Martin And I’m An Algebraic :-)

(Originally posted 2011-07-09.)

If you’re sat next to me on a plane you’ll probably notice at take off and landing I do algebra puzzles. You may not have heard of the term "algebra puzzles" before and perhaps think the juxtaposition of the two words to be odd, but I think it apt…

(You may also think this whole post to be showing off, but that’s a risk I take in sharing a passion I have.) 🙂

A classic problem with take offs and landings is what to do given you’re not allowed to use electronic equipment. I’ll readily agree that staring out the window is a good one – which is why I prefer a window seat. I love staring out the window. I love maps – and to me looking out of an airplane window brings maps to life. And figuring out what I’m seeing is another great puzzle. But sometimes there’s nothing to see. So what do you do?

I started by taking puzzle books with me. I’ve done Sudoku (but not recently), Kakuro, Futoshiki, Hashi, Kenken and any number of others. I enjoy them but each one lacks variety. (And I’m disappointed that by far and away the most common puzzle books are Sudoku.)

But I find the best puzzles of all are algebra problems. I still have a copy of my "high school" Further Mathematics textbook. I don’t know why, I just do. 🙂

I actually think it’s the elegance of expression and the neatness of the right shortcut that appeal to me. As I’ve said many times I’m a sucker for ingenuity. Below is an example of a neat shortcut that I’d like to share with you. I hope you’ll see what I mean.

One of the nice things about mathematics in general is that you’re perpetually "standing on the shoulders of giants". Some of them are well known (Newton, Leibniz, Euclid, Gauss, etc) but many are anonymous. In the example below I’ve no idea who thought of the shortcut first. (I’m just pleased I understand it and can see its applicability.)


A Simple Example Of Elegance

Problem: Solve (x – 3)² – (x + 2)² = 0

It looks like a difficult puzzle to solve. Of course if it were I wouldn’t be offering it here. 🙂 You could multiply everything out and gather terms but that’s horrid. Thankfully, there is a more elegant way:

Observe a² – b² = (a – b) (a + b) . (Check it if you don’t believe me!)

If you substitute a for (x – 3) and b for (x + 2) you get:

a² – b² which, of course, can be rewritten as (a – b) (a + b) .

I think you’ll agree working out what (a – b) and (a + b) are is easy:

a – b = (x – 3) – (x + 2) or -5

a + b = (x – 3) + (x + 2) or 2x – 1

Multiply them together and you get:

-5 × (2x – 1) = 5 – 10x

So (x – 3)² – (x + 2)² = 5 – 10x which = 0, as the original problem stated.

If 5 – 10x = 0 then 10x = 5 and so x = ½.
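If you don’t quite believe the algebra, it’s easy to check numerically. A throwaway Python sketch (mine, purely illustrative):

```python
# Confirm (x - 3)^2 - (x + 2)^2 == 5 - 10x for several x,
# and that x = 1/2 is the root of the original equation.
for x in [-2.0, 0.0, 0.5, 3.7]:
    assert abs((x - 3) ** 2 - (x + 2) ** 2 - (5 - 10 * x)) < 1e-9

x = 0.5
assert (x - 3) ** 2 - (x + 2) ** 2 == 0
```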


See, that wasn’t so hard, was it? I think people think mathematics is hard. I don’t think algebra is hard. I do think topology is hard – because of the abstractness of the concepts. I do think proving things is hard – because of the need not to miss any loose ends and to know whether you’ve actually proved anything. But algebra is, to me, pure puzzle solving. And elegance is important: In the above example I could quite easily have made a mistake if I’d not known the trick. With the trick I’m much less likely to.

Now someone will probably come along and point out a few things about the example, including a further trick. If they do I’ll be delighted. This "old dog" loves learning new tricks. 🙂 And if I am sat next to you on the plane at least I won’t be muttering to myself as I manipulate those symbols. 🙂

10,000 Hours Doing WHAT?

(Originally posted 2011-07-07.)

It’s a popular suggestion that what separates the truly exceptional person from the rest of us is 10,000 hours of "practice". In book form I’ve seen it twice – in Matthew Syed’s "Bounce" and in Malcolm Gladwell’s "Outliers". Actually, to be fair, Matthew acknowledges his original source so that’s actually only one distinct source. (Life Lesson aside: Trace ideas and "facts" back to see if they came from one place or are truly corroborated.)

The suggestion is that there’s no such thing as innate talent and that all that matters is practice – 10,000 hours of it. And this is often repeated now in public folklore.

I find this assertion problematic for a number of reasons – though I’m prepared to admit I could be wrong.

For a start I find it very hard to believe we all respond the same way to our experiences, that we have the same physiology and brain "wiring".

Secondly – and this is where the title of the post comes in – 10,000 hours of what? Actually the suggestion is that it is useful practice. The issue for me is that the usefulness of the practice is highly variable – depending on who we are, how motivated we are and how tired we are (and maybe many other variables besides).

Thirdly, there is what I call the categorisation problem: Take the example of someone who is a generalist in, say, Information Technology. They could easily gain 10,000 hours of experience, perhaps good experience, spread across their whole domain. Does this make them an expert? If they spent the whole 10,000 hours in a narrower area what kind of an expert does that make them? And if they spent two chunks of 5,000 hours in two areas are they not an expert in either? What if those two areas were abutting? What if they weren’t?

That previous paragraph had a lot of questions in it. Some are easy to answer, some less so. Maybe – and here I hope is the relevance of this whole "10,000 hours" idea – these sorts of questions allow one to "evaluate" one’s career. I put "evaluate" in quotes because I don’t mean this as a scorecard: There are very many valid paths through life. But, for example, discovering your 10,000 hours have been spent scattered across a wide range of topics might tell you you’re a generalist. But then you probably knew that. 🙂

A more difficult case is what to do when you discover you’ve spent 10,000 hours in the same area. With a low boredom threshold like mine there’s a premium on convincing yourself there’s been some diversity. 🙂 But has there really? Or the converse: 10,000 hours concentrated in one area, but is it really one area?

In my case – and this post isn’t really about me – I’ve convinced myself 🙂 of three things: That there’s plenty of variety, that the technology keeps evolving at a dizzying pace, and that my role has in any case morphed over time. Actually, I think a lot of us feel that way.  

But does all this change keep resetting the counter to zero hours? I’d maintain it didn’t: The way I learn and (I think) the way I incorporate situations into my experience base is accretive (adding on around the edges of what I already know).  So I don’t really think the counter ever resets: We just start new counters occasionally. But I wouldn’t count some of the new technology I dabble in as "start at zero" – despite how much it sounds like babbling. 🙂

I enjoy being in the babble phase. But, like most people, I worry about the quality of the work I produce in that phase. I particularly enjoy it when I spot the experience beginning to build. That "10,000 hour" idea just might be motivational. And now for a gratuitous Metallica lyric. 🙂

"Trust I seek and I find in you
Every day for us something new
Open mind for a different view
and nothing else matters"

In the spirit of "10,000 hours" don’t you think that’s a great verse? Anyhow, keep logging those hours. Who’s to say "it’s all been a waste of time"? 🙂

| Minor edit 27 October to fix "whose" that should’ve been "who’s". Dunno how that crept in there. 😦

Recent Experiences With Hardware Data Compression

(Originally posted 2011-07-05.)

As you probably know Hardware Data Compression has been supported by MVS and IBM mainframes for around 20 years. In several recent batch studies I’ve conducted it’s been evident in a widespread way.

(In this post I’m not talking about DB2 compression of either flavour or VSAM compression – though some of the information here applies to these functions as well.)

It’s not as simple a deployment strategy as "turn it on for everything" though I think "turn it on for all sequential data sets large enough to stripe" may well be the flavour. There’s a certain logic to this: If striping can make sequential data access go faster then compressing the data will make it go even fasterer. :-)

There’s a lot of truth in that. But consider one thing: Compression (and decompression) takes CPU cycles. In most software licensing schemes that translates into an increased software bill. For small-scale use that might not matter much. But used wholesale it could make a significant difference.

There’s another thing: If you cause a job that’s running in a CPU-constrained environment to burn more CPU through Compression the speed up may be disappointing. I’m not saying it’s worse than useless – I’ve no evidence of that – but it is a consideration.

My suspicion that the CPU cost can be quite high comes from seeing "low CPU" cases like DFSORT COPY operations and other simple reformatters burning far more CPU than I would have expected. (Quite often this includes a significant chunk of SRB time but not always. This is consistent with the way DFSMS/MVS does (de)compression.) There is no direct metric for the CPU cost of Compression: You have to use the overall CPU consumption and draw your own conclusions.

It sounds like I’m against using Hardware Data Compression. Actually I’m not: I’ve seen evidence of very good compression ratios (for example in SMF 14 and 15 records for non-VSAM data sets). And I know that in many of the cases I’ve seen the alternative might have been to write the data to tape, with all the handling that entails. Which brings us on to something else:

Quite a few of the scenarios I’ve seen have been one-writer-one-reader cases. You tend to think "Pipes" in those cases. You were expecting that, weren’t you? 🙂 But, seriously, the advantages are fairly obvious. One of the objections to Pipes has always been that it takes more CPU than writing data to disk or tape. But if you compress it* it’s not nearly so obvious that’s the case. It would be interesting to conduct an experiment.

This is pretty much an "it depends" situation. Aren’t they all? :-) Actually, there’s quite possibly more speed advantage in getting the buffering and number of stripes right – as described in SG24-2557 Parallel Sysplex Batch Performance. (This book has lots of good advice on access method tuning – written by my residency teammates in 1995.)

In short, if you’re implementing Hardware Data Compression measure the impact and effectiveness and be open to the idea there may be other ways of achieving the speed up.

Finally, I said this wasn’t VSAM but note that SMF 64 also has a Compressed Data Statistics Section, just like SMF 14 and 15. My expectation is, however, that the access pattern to compressed VSAM is not sequential. (For QSAM and BSAM I expect it to be largely sequential.)


* Tape compression is a different matter – with the hard work being done in the control unit.

Another (Perhaps Obvious) Reason For Avoiding Unnecessary Sorts

(Originally posted 2011-07-02.)

Following on from The Best Sort Is The One You Don’t Do here’s another reason for eliminating sorts. I think it’s worth a post in its own right.

(In this post, again, I’m talking about resequencing passes over data – not copying or merging.)

With a sort it’s possible the last record read in might be the first record written out. So you can never overlap input and output phases. (There might even be a phase between the end of the input phase and the beginning of the output phase – particularly if there are intermediate merges.)

A common pattern in a job is a processing step, then a sort step, then another processing step, and so on. Often the first processing step writes a data set the sort reads and the second processing step reads the sorted version of that. (It’s a separate question whether either processing step could have been performed by the sort, of course.) Let’s call the first processing step "W", the sort "S" and the second processing step "R".

Now consider what happens with BatchPipes/MVS. With Pipes you overlap the reader and the writer. That’s part of the benefit (along with I/O time reductions and, perhaps, the elimination of tape mounts). In the above scenario you can overlap W with the input phase of S – with a pipe. Likewise you can overlap the output phase of S with R. What you can’t do is overlap W with R – because you can’t overlap the two phases of S.

A pity but perhaps not a huge one. Let’s illustrate this with some numbers. Suppose:

  • W runs for 10 minutes, writing all the while.
  • S has an input phase of 5 minutes and an output phase of 5 minutes.
  • R runs for 10 minutes, reading all the while.

The three steps between them take 10 + 5 + 5 + 10 = 30 minutes.

Overlapping W with S’ input phase saves 5 minutes (as S’ input phase gets stretched to 10 minutes). Similarly overlapping S’ output phase with R saves 5 minutes. So we save 10 minutes overall.* We like to say "the sort gets done for free".

If we can eliminate the sort completely we can overlap W and R completely, with a total saving of 20 minutes – 10 for the overlap and 10 for the sort removal. (Without Pipes we’d still see a 10 minute reduction just by removing the sort.)
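The arithmetic above can be laid out as a tiny Python sketch (purely illustrative, with the example’s numbers baked in):

```python
# Elapsed-time arithmetic for the W -> S -> R example (all in minutes).
w, s_in, s_out, r = 10, 5, 5, 10

no_pipes = w + s_in + s_out + r            # everything serialised
with_pipes = max(w, s_in) + max(s_out, r)  # W overlaps S's input phase;
                                           # S's output phase overlaps R
no_sort_with_pipe = max(w, r)              # sort eliminated: W piped straight into R

assert (no_pipes, with_pipes, no_sort_with_pipe) == (30, 20, 10)
```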

So, this is another case where removing a sort is a valuable thing to achieve. But as I said in the referenced post that’s easier said than done.


* In this calculation you’ll notice I’ve assumed the I/O time reduction is zero and that there are no tape mounts eliminated. Generally there is some I/O reduction, of course. But it’s still a fair comparison. It’s just the benefits of Pipes are understated.

The Best Sort Is The One You Don’t Do

(Originally posted 2011-07-02.)

Have you ever had the suspicion a sort was unnecessary in your batch? I bet you have.

In recent Batch Performance studies I’ve had the suspicion that many of the sorts are unnecessary: Either they should be merges or not done at all. But how do you prove it?

But first, what do I mean by a sort not needing to be done at all?

Clearly if the data is reformatted then something has to be done to it – just maybe not a sort. It could be a copy or maybe a merge. So I’m really talking about the need to reorder the records. Reordering them is more expensive in terms of disk space (for sort work data sets), CPU and run time. So a sort is best avoided.

So how do you figure out if a sort needs doing? One way is to remove the sort and see what happens. Probably best not done in Production. 🙂 That’s a form of "destructive testing". But there is another way: Essentially running the test I’m about to outline in the time frame of the sort.

In a moment "choreography" but first the mechanics:

Testing If Data Is Already Sorted

For a DFSORT MERGE operation to be successful all the data sets being merged must already be sorted on the merge key(s). Otherwise you get a return code of 16 and an ICE068A message such as:

ICE068A 0 OUT OF SEQUENCE SORTIN01

We can use this to our advantage by attempting to merge the supposedly-already-sorted data set with a dummy file. (DD DUMMY will do just fine.) If the data set is already sorted the step will complete with a 0 return code. Otherwise, as I say, it will be RC=16.

It’s worth noting that the merge fails as soon as it detects a record out of sequence. This means a badly unsorted data set will fail fast. However a largely well-sorted data set may not fail the test until it has been almost completely read: Perhaps the last record is the only one out of sequence.
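In a procedural language the same fail-fast check might look like this (a Python sketch of the idea, names mine – the real test here is the DFSORT merge above):

```python
def first_out_of_sequence(records, key=lambda rec: rec):
    """Return the index of the first record that breaks ascending order,
    or None if the data is already sorted (the merge would give RC=0)."""
    prev = None
    for i, rec in enumerate(records):
        k = key(rec)
        if prev is not None and k < prev:
            return i          # fail fast, like the merge's ICE068A
        prev = k
    return None

assert first_out_of_sequence([1, 2, 3, 5, 4]) == 4   # badly placed record found late
assert first_out_of_sequence([1, 2, 3]) is None      # already sorted
```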

How To Incorporate This Test

If the input data set is persistent you could either add the test step in just before the sort or at the end of the job. If it’s transient (perhaps temporary) it has to be tested before it goes away. In short you run the test when it’s still there but not otherwise updated.

One important aspect is the intrusiveness of the test: Breaking a BatchPipes/MVS pipe to do it is a particularly bad idea. Holding up processing for tape mounts is probably bad. And maybe running the test alongside the actual sort slows it down.

If you were tuning your batch window and suspected a sort was unnecessary you might allow the test to run a few dozen times to gain confidence that it’s not needed. But you might never be sufficiently confident.

One other thing: You might decide to always run the test and run the sort only if the test step ends with RC=16. But then a downstream step would have to access the right data set – presumably the SORTOUT from the test step or the SORTOUT from the sort, depending.

Concluding Thoughts


At the end of the day it’s better to know your batch well enough to avoid having to rely on tests like this. But in reality people don’t understand their batch that well, in my experience. Not a criticism, just an observation. And a situation caused by the complexity and longevity of the typical batchscape.

Maybe this technique is best used during development or in a test environment. As I said on Twitter yesterday I was thinking about ways of "injecting dye into the water" in a data flow sort of way. Maybe I’ll think of some additional "dye test" or "smoke test" (if you prefer) techniques. One other wrinkle might be to inject record sequence numbers into the data to see whether they’re preserved.

It Depends: SMF 30 Job Timings And CPU Time

(Originally posted 2011-06-29.)

This may be stating the obvious – but I wonder to whom it actually is obvious…

I’ve been doing quite a lot of work with batch job timings and CPU recently. (Everything I’m about to say is equally true of steps.) It’s interesting to think about the effects of faster engines versus more engines (a question I haven’t been asked recently) and whether a customer needs more capacity or just faster engines (a question that has come into play).
 
There’s nothing very new about this but it is worth thinking about. And in particular what we can glean from SMF 30 Step- and Job-end records…
 
We know lots of things about a job’s timing and related stuff, most notably:

  • When it started and ended, and hence the elapsed time.
  • Where it ran.
  • When it was read in, and any initiator delays.
  • How much CPU it used – whether TCB or SRB.
  • Whether it used a zIIP or zAAP or was eligible to but didn’t.
  • How many disk or tape EXCPs it did (and how many tape mounts).
  • Step condition codes.

But one thing we can’t know is whether a job met significant CPU queuing or not. At least not from Type 30. We might be able to infer something from DB2 Accounting Trace (SMF 101) but (to rehearse previous arguments):

  • We can’t know if the Unaccounted For time (the most likely bucket) really is queuing time. (Generally it is but there are lots of edge cases where it’s something else.)
  • We can encounter difficulties in tying up the 101 with the 30 – particularly for IMS.
  • We have to take into account CP Query Parallelism.
  • The job might not be DB2 at all.

Given we can’t readily establish how long a job queued for CPU, it becomes difficult to establish whether more capacity would help the job. But there is some hope: "I have no idea" is probably not the best we can do:

  • The job will be part of a WLM Service Class and so will – if the Service Class period has a velocity goal – have Delay For CPU samples. This is extremely broad brush but can tell you if a high CPU job is likely to suffer from queuing.
  • If the EXCP count is high we can infer that a big chunk of the job’s time is for I/O.
  • Variability of run time for similar CPU times and EXCP counts suggests the job sometimes gets held up for something.

So, suppose we kept capacity the same and, say, doubled the engine speed. Consider the case of a job that was 30% CPU. And ignore the n-way queuing effects of going from 2n engines to n. Then we might hazard that the CPU time would halve and the elapsed time would therefore go down by 15%.

Suppose we eliminated queuing (by and large). We can’t really say what the effect would be – except in our example job’s case it’s some fraction of the 70% that isn’t CPU. With the EXCP count we can "hand wave" a number, but it’s precisely that: a hand wave.
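To make the hand wave explicit, here’s the crude arithmetic as a Python sketch (the function name and the model are mine – it assumes only the CPU portion speeds up and everything else stays constant, which is exactly the simplification discussed above):

```python
def estimated_elapsed_reduction(cpu_fraction, cpu_speedup):
    """Crude fractional elapsed-time reduction if only the CPU
    portion of a job benefits from faster engines."""
    return cpu_fraction * (1 - 1 / cpu_speedup)

# A job that is 30% CPU, on engines twice as fast: a 15% elapsed reduction.
assert abs(estimated_elapsed_reduction(0.30, 2.0) - 0.15) < 1e-9
```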

Some of the above is part of why I don’t like to give elapsed time speed up estimates. And why I’m not overly keen on answering the "faster versus more engines" question for batch. It’s actually worse than I’ve stated because individual job speed is hard to factor into the overall window’s outcome when migrating to a newer processor. (This same problem applies no matter which speed up you apply to individual jobs and steps, of course.)

But the "unknowable CPU queuing" fact plays into how to interpret other facts like (as in our example) the job is "only" 30% CPU. We don’t know whether it would’ve been 100% CPU without queuing or 30%. (Probably somewhere in between.) But we can use EXCP count, as I said, to help us guess.

For what it’s worth I rarely see jobs much above 30% or much below 10% CPU. I’d say it’s a bell curve around 20%. If true, this means most of the leverage is either in I/O time or CPU queuing. Though as a contrary data point I hear quite frequently of customers upgrading to faster engines and seeing their overall batch get faster.

Welcome to my world of "it depends". :-)