(Originally posted 2012-03-25.)
As previously discussed I’m often in a situation of trying to make sense of a set of job-related SMF data. Even though it may be your own installation’s data, you’re probably confronted with what I like to call “a journey of discovery” occasionally, too.
I’m always looking for what I can discern from the data.1 And, when confronted with a set of data about batch jobs, I go into overdrive. 🙂
This post is about how to tell if a set of batch jobs really are clones of each other. It’s an exercise in pattern definition, albeit loosely.
But first, why would you want to know which jobs form a clone set? Remember these are near-identical jobs that run in parallel against subsets of the data. Firstly, if something’s cloned you might be able to clone it further.2 Secondly, if it isn’t cloned you need to recognise that and think about the effort involved in even starting to clone it.3
The process of detecting clones is easy to describe but not so easy to do. Here are the steps:
1. Look for similarities in SMF 30 Step- and Job-End records.
2. Likewise in SMF 101 DB2 Accounting Trace.
3. And similarly for data access.
Steps 2 and 3 could be done in either order. And indeed Step 2 would only be relevant for DB2 jobs.
Let’s think about these in a bit more detail…
Step-End And Job-End Evidence
I would expect cloned jobs to run more-or-less alongside each other – though they might be set off in groups. Of course imbalance between the clones would mean they wouldn’t end at the same time.
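The "running alongside each other" test can be sketched in a few lines of Python. This is only an illustration: it assumes the start and end times have already been extracted from the SMF 30 records, and the function name, helper and times below are all made up.

```python
from datetime import datetime

def overlap_fraction(a_start, a_end, b_start, b_end):
    """Share of the shorter job's elapsed time during which both jobs were running."""
    overlap = (min(a_end, b_end) - max(a_start, b_start)).total_seconds()
    shorter = min(a_end - a_start, b_end - b_start).total_seconds()
    if shorter <= 0:
        return 0.0
    return max(overlap, 0.0) / shorter

def at(hhmm):
    """Parse an HH:MM time on an arbitrary fixed day (illustration only)."""
    return datetime.strptime(hhmm, "%H:%M")

# Made-up start and end times for two suspected clones.
fraction = overlap_fraction(at("01:00"), at("01:40"), at("01:02"), at("01:55"))
print(f"{fraction:.2f}")  # 0.95
```

A value near 1 says the two jobs ran largely side by side. Clones set off in groups would score high within a group and low across groups.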
Additionally the jobs would have the same “step profile”. By this I mean the number of steps is consistent and the same steps in each job are the big ones. The program names are the same. And the performance profile of each step is similar across the clones, so the CPU intensiveness and the EXCP counts are similar.
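Here is a minimal sketch of a "step profile" comparison. It assumes each job has been reduced to a list of (program name, CPU seconds, EXCP count) tuples, one per step. The program names, the numbers and the 25% tolerance are hypothetical choices for illustration, not recommendations.

```python
def close(x, y, tolerance):
    """True if x and y differ by no more than `tolerance`, relative to the larger."""
    biggest = max(abs(x), abs(y))
    return biggest == 0 or abs(x - y) / biggest <= tolerance

def same_step_profile(steps_a, steps_b, tolerance=0.25):
    """Compare two jobs step by step: same count, same programs, similar CPU and EXCPs."""
    if len(steps_a) != len(steps_b):
        return False
    for (pgm_a, cpu_a, excp_a), (pgm_b, cpu_b, excp_b) in zip(steps_a, steps_b):
        if pgm_a != pgm_b:
            return False
        if not close(cpu_a, cpu_b, tolerance) or not close(excp_a, excp_b, tolerance):
            return False
    return True

# Hypothetical per-step data: (program name, CPU seconds, EXCP count).
job_a = [("PGMEXTR", 2.0, 1000), ("PGMUPDT", 310.0, 450000), ("PGMRPT", 5.0, 3000)]
job_b = [("PGMEXTR", 2.2, 1100), ("PGMUPDT", 290.0, 430000), ("PGMRPT", 4.6, 2900)]
job_c = [("SORT", 1.0, 20000), ("PGMRPT", 5.0, 3000)]

print(same_step_profile(job_a, job_b))  # True
print(same_step_profile(job_a, job_c))  # False
```

The tolerance needs to be generous: imbalance between clones means the numbers will be similar rather than identical.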
I would also expect to see a sensible job-naming convention. For example “all the jobs beginning PLCD50 are clones and the suffix is 00, 01, 02 and so on”. From this you get job names like PLCD5000, PLCD5001 etc.
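A naming convention like this is easy to look for mechanically. The sketch below assumes a two-digit numeric suffix, as in the PLCD50 example; real installations may use other conventions, and the function name and job list are made up.

```python
import re
from collections import defaultdict

def candidate_clone_sets(job_names, min_size=2):
    """Group job names that share a prefix and end in a two-digit suffix."""
    groups = defaultdict(list)
    for name in job_names:
        match = re.fullmatch(r"(.+?)(\d{2})", name)
        if match:
            groups[match.group(1)].append(name)
    return {prefix: sorted(names)
            for prefix, names in groups.items()
            if len(names) >= min_size}

jobs = ["PLCD5001", "PLCD5000", "PLCD5002", "PLCD4900", "BKUPDLY"]
print(candidate_clone_sets(jobs))
# {'PLCD50': ['PLCD5000', 'PLCD5001', 'PLCD5002']}
```

This only yields candidates. A shared prefix on its own proves nothing, which is why the timing and step-profile evidence matters.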
Generally I spot groups of jobs meeting these criteria pretty easily – using SMF Type 30 subtypes 4 (Step) and 5 (Job).
DB2 Invocation Evidence
For DB2 jobs I’d expect corroboration from DB2 Accounting Trace (SMF Type 101):
Plan names and package names4 should be the same.
In many cases I’ve seen a single DB2 plan name for an entire application, and sometimes crossing application boundaries. Similarly packages are sometimes widely used – for example in the “I/O module” or Stored Procedure cases. So matching plan and package names are a necessary but not sufficient condition.
DB2 Accounting Trace, as you probably know, can give a very detailed breakdown of where a step’s time goes5 – down to the package level. Again, you’d expect to see a similar profile across all the clones.
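As a rough sketch of the DB2 corroboration, the following checks that the plan and package names match and that each package takes a similar share of the time in every clone. It assumes the Accounting Trace data has already been summarised to one plan name and a package-to-seconds mapping per job. The names, the numbers and the 10-point threshold are hypothetical.

```python
def package_shares(package_seconds):
    """Convert seconds per package into each package's share of the total."""
    total = sum(package_seconds.values())
    if not total:
        return {}
    return {pkg: secs / total for pkg, secs in package_seconds.items()}

def db2_profiles_similar(clones, max_share_gap=0.10):
    """clones maps job name -> (plan name, {package name: seconds in package})."""
    plans = {plan for plan, _ in clones.values()}
    package_sets = {frozenset(pkgs) for _, pkgs in clones.values()}
    if len(plans) != 1 or len(package_sets) != 1:
        return False  # Names differ, so the necessary condition fails.
    shares = [package_shares(pkgs) for _, pkgs in clones.values()]
    for pkg in next(iter(package_sets)):
        values = [s.get(pkg, 0.0) for s in shares]
        if max(values) - min(values) > max_share_gap:
            return False
    return True

# Made-up summary for two suspected clones.
clones = {
    "PLCD5000": ("PLCDPLAN", {"PKGREAD": 120.0, "PKGUPDT": 480.0}),
    "PLCD5001": ("PLCDPLAN", {"PKGREAD": 130.0, "PKGUPDT": 470.0}),
}
print(db2_profiles_similar(clones))  # True
```

Comparing shares rather than absolute seconds keeps the check tolerant of clones that process different volumes of data.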
For any serious DB2 Batch analysis I’d be looking at this data anyway. I’ve written extensively about DB2 Batch, most recently here.
Data Access Evidence
This is where consistency is slightly less to be expected. Most probably DD names will be the same across the cloned jobs, but very often the data set names are slightly different. For example the clone stream number might be encoded in the data set name – probably in one of the lower-level qualifiers.
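To illustrate, here is a small sketch that finds which qualifiers vary across the data set names the clones use for the same DD name. The data set names are invented and the function name is my own.

```python
def varying_qualifiers(dsnames):
    """Return 0-based positions of qualifiers that differ across the names.

    Returns None if the names do not all have the same number of qualifiers.
    """
    split = [name.split(".") for name in dsnames]
    if len({len(parts) for parts in split}) != 1:
        return None
    return [i for i, column in enumerate(zip(*split)) if len(set(column)) > 1]

# Hypothetical data set names behind one DD name across three clones.
names = ["PROD.PLCD.EXTRACT.S00", "PROD.PLCD.EXTRACT.S01", "PROD.PLCD.EXTRACT.S02"]
print(varying_qualifiers(names))  # [3]
```

A single varying low-level qualifier, as here, is consistent with a stream number encoded in the name. Several varying qualifiers would suggest the jobs are less alike than they look.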
For DB2 it’s more difficult to assess which tables a job step accesses – and you probably need to look at the DB2 Catalog for insight. When you do, you may well find the cloned jobs accessing partitions of the same table.
There is other evidence of interest here:
In many cases clone jobs (or streams) are preceded by a job whose role is to split the data to feed the clones. Similarly there’s often a follow-on job to merge the results. Detecting these in the non-DB2 case is usually pretty straightforward. (Even in the DB2 case the scheduler should tell you.) My point here is there’s value in seeing how cloning is working, not least for understanding why there might be imbalance between the clones.
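Imbalance itself is simple to put a number on. One possible measure, and it is only one of several you could choose, is the ratio of the longest clone's elapsed time to the average. The elapsed times below are made up.

```python
def imbalance(elapsed_minutes):
    """Ratio of the longest clone's elapsed time to the average (1.0 = balanced)."""
    values = list(elapsed_minutes.values())
    return max(values) / (sum(values) / len(values))

# Hypothetical elapsed times, in minutes, for a four-way clone set.
elapsed = {"PLCD5000": 40, "PLCD5001": 38, "PLCD5002": 55, "PLCD5003": 41}
slowest = max(elapsed, key=elapsed.get)
print(slowest, round(imbalance(elapsed), 2))  # PLCD5002 1.26
```

The merge job cannot start until the slowest clone finishes, so the further this ratio is above 1 the more there is to gain from rebalancing the split.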
As I said at the outset it’s useful to figure out which jobs in a suite or a window are part of a cloning implementation. And as I hinted in a couple of places there’s also value in understanding balance (or imbalance). In this post I’ve given some tips on the kinds of patterns to look for. Some of this could be codified, I’m sure. In any case the human mind is a wonderful instrument for pattern recognition6.
1 I’ve talked about this sort of thing before. Most recently in Published on Slideshare: I Know What You Did Last Summer.
2 Recall my recommendation to clone 2, 4, 8, 16 … or else 3, 6, 12, 24… – unless you know differently.
3 See this part and this part especially of the ‘I Said "Parallelise" Not "Paralyse"’ series of blog posts for more on this.
4 You only get package-level statistics if you specify Accounting Trace classes 7 and 8.
5 You only get the detailed breakdown if you specify Accounting Trace classes 1, 2 and 3. (And see 4.)
6 This footnote is a wholly gratuitous reference to the excellent Pattern Recognition, a novel by the excellent William Gibson. 🙂