(Originally posted 2011-05-04.)
Up until now I haven’t talked much about DB2, except perhaps to note it’s a little different. But what is a DB2 Batch job anyway? It’s important to note a DB2 job ISN’T necessarily exclusively DB2 – although some are. It’s just a job that has some DB2 in it.
The reason for writing a separate post, apart from breaking things up a little, is because batch jobs with DB2 in them present particular challenges. But also additional opportunities. In general these jobs can be treated like others but with extra considerations.
The main challenge is determining which data the job accesses – and how it accesses it. Let’s break this up into two stages:
- Identifying which DB2 plans and packages are accessed by which job / step.
- Identifying which DB2 tables and other objects are used by these plans and packages. And perhaps how.
Identifying DB2 Plans and Packages
This piece is relatively straightforward: DB2 Accounting Trace -with trace classes 7 and 8 enabled – will give you the packages used. You need to associate the Accounting Trace (SMF 101) record with its job / step.
For most DB2 attachment types the Correlation ID is the same as the job name. (Identifying the step name and number is a matter of timestamp comparison with the SMF30 records – which my code learned to do long ago.)
For IMS it’s more complicated, with the Correlation ID being the PSB name.
(A byproduct of this step might be discovering which jobs use a particular DB2 Collection or Plan name. Sometimes these are closely related to the application itself.)
Identifying Used Objects
This piece is much harder, particularly for Dynamic SQL. Fortunately most DB2 batch uses Static SQL. Even so it’s still tough: If you have the package names you can use the DB2 Package Dependency table in the Catalog to figure out which tables and views the package uses. At least in principle: There’s no guarantee these dependencies will get exercised – as there’s no guarantee the statements using them will ever get executed.
Another problem with this is figuring out whether the access is read-only or for-update.
To totally figure out which statements are executed (and which objects they update and read) would require much deeper analysis – probably involving Performance Trace and extracting SQL statement text from the Catalog.
Conclusion
So this is very different from the non-DB2 case. But at least we can glean what data a DB2 batch job OUGHT to be interested in. And, by aggregation, it’s not hard to work out what data an entire batch application uses.
In this post I wanted to show how DB2 complicates things but that it’s not hopeless. In fact there’s a substantial silver lining to the cloud: Without examining the (possibly missing) source code you can look inside the job at the embedded SQL, if you’re prepared to extract them from the DB2 Catalog.
You’ll notice I’ve said very little in this set of posts about Performance. This is deliberate: Although much of the instrumentation I’ve described is primarily used for Performance these posts have been about Architecture. Which is, I think, a different perspective.
I expect I’ll return to this theme at some point. For now I’ll just note it’s been fun thinking about familiar stuff in a slightly different way.
By the way this post was written using the remarkably accurate WritePad app on the iPad. It’s grown better at recognising my scrawl in the few hours I’ve used it – or perhaps it’s me that’s getting trained. 🙂