(Originally posted 2012-02-19.)
I have enormous trouble pronouncing "parallelise" right – and not saying "paralyse". It’s true, and I bet many of you have the same trouble (sober or not). It’s on a par with "red lorry yellow lorry" or "the Leith Police dismisseth us". 🙂
But it’s a word I think we’re going to have to get used to pronouncing right. And this post will explain why.
This looks like becoming a four-part series of blog posts on increasing Batch Parallelism. (It started off as one post; how it turned into four is perhaps material for another post.)
So, why will parallelising batch become increasingly important? There are really three main reasons:
- Increased "Window Challenge"
- Resilience
- Taking Advantage Of Capacity
There is some overlap between these but I think they’re sufficiently distinct to draw out separately – which is what the rest of this blog post does.
Increased "Window Challenge"
From a business perspective this is the big one. I’m seeing a number of business trends that lead to one inexorable conclusion: the delivered growth in the speed of "single actors" (individual batch jobs) will, over time, be outstripped by the need. In other words, in the long term you can’t just buy yourself out of trouble, whether we’re talking about processor speed, disk subsystem or tape speed, transmission line speed, or anything else for that matter.
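To make that concrete, here’s a toy calculation. It’s a sketch, not data: both growth rates below are invented purely for illustration, and real installations will differ. The mechanism is the point: if work volume grows faster than single-job speed, elapsed time compounds upward.

```python
# Toy model: a job's elapsed time is proportional to volume / single-job speed.
# Both growth rates are invented for illustration only.
volume_growth = 1.20  # assume 20% more work each year
speed_growth = 1.10   # assume the single job gets 10% faster each year

elapsed = 1.0  # relative elapsed time in year 0
for year in range(1, 9):
    elapsed *= volume_growth / speed_growth
    print(f"Year {year}: relative elapsed time {elapsed:.2f}")

# After 8 years elapsed time has roughly doubled (1.0909 ** 8 is about 2.0),
# even though the single job got faster every single year.
```

Parallelism attacks this by splitting the volume across multiple concurrent streams, which is what the rest of this series is about.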
It’s true this varies by installation, and even between applications (or suites) or business lines in an individual organisation. But this is the general pattern. It’s also a fact that the pressure comes in waves – because of the nature of the underlying business requirements.
Amongst the business drivers I’ve seen:
- Business volume increases.
Hopefully these are driven by success.
- Mergers and acquisitions.
Typically I’m seeing the same application having to cope with more data as one or other party’s application is adopted.
A similar trend is "standardisation of procedures", where existing lines of business come together to use a single application.
- More processing.
In the merger scenario above I’ve seen cases where passing two companies’ data through the "ongoing" applications means those applications have to be modified, generally with greater pathlength. Decommissioning the "offgoing" applications is a further complicating factor.
External pressures such as regulation often lead to more work per unit of business volume.
Modern techniques such as Analytics get injected.
And of course there’s our old friend "just because": processing grows for all sorts of reasons.
- Shortened Window.
Much has been said about running batch and online concurrently. But shortening the batch window itself remains important for a number of reasons, amongst which are:
- Even if you overlap everything there are still only 24 hours in the day.
In other words the work still has to get done in the cycle, whatever that cycle may be.
- Running online and batch together increases the aggregate resource requirement.
- Batch jobs taking locks (or causing database I/O) can still interfere with transactions.
- Batch and online concurrency is still a difficult feat to achieve.
- There are often deadlines within the batch and sometimes these get tightened up.
Resilience
With a single-threaded job stream, just one broken application data record can hold up the whole thing. So can the loss of an LPAR, a DB2 subsystem, or a VSAM file.
Partitioned data can mean an increase in resilience. For example:
- If the data were processed by geographic region (and you had, say, 5 regions) the damage from a broken record would be limited to that region. (There’s a small illustrative sketch after this list.)
This, of course, depends on region-level separation. And, naturally, any failure is unwanted, but the business impact could be much reduced.
- If the LPAR were to fail in a correctly-set-up multi-image environment, again the impact could be limited.
There’s a lot to this one. For example, retained locks held by a DB2 data sharing member could limit the benefit.
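Here’s the sketch I promised, in Python purely for illustration. The region names, the `process_region` function, and the one-stream-per-region split are all invented for this example; the point is that a broken record aborts only its own partition’s stream.

```python
from concurrent.futures import ProcessPoolExecutor, as_completed

# Hypothetical partitioning: one batch stream per geographic region.
REGIONS = ["North", "South", "East", "West", "Central"]

def process_region(region: str) -> str:
    """Process one region's slice of the data; a bad record aborts only this stream."""
    if region == "East":  # simulate one broken application data record
        raise ValueError(f"broken record in {region}")
    return f"{region}: OK"

if __name__ == "__main__":
    with ProcessPoolExecutor(max_workers=len(REGIONS)) as pool:
        futures = {pool.submit(process_region, r): r for r in REGIONS}
        for fut in as_completed(futures):
            region = futures[fut]
            try:
                print(fut.result())
            except Exception as exc:
                # Only this region is held up; the other four streams finish.
                print(f"{region}: FAILED ({exc})")
```

With a single stream, that same broken record would have stopped all five regions’ work.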
Taking Advantage Of Capacity
Businesses have tended to size machines by online-day requirements, and the view persists that online is generally the peak user of resources. My experience is that in about half of installations batch is the real CPU driver (though probably not the memory driver), and more than half have a bigger I/O bandwidth challenge overnight than during the day.
Where the online day is still the main resource driver, an increase in batch parallelism can usefully absorb the spare capacity overnight.
Where Next?
As I said, I expect this to be a four-part series of blog posts. This part has concentrated on business drivers, almost to the exclusion of technology. I expect the other three posts to be, in order:
- Classification.
- Issues.
- Implementation.
The titles and scope might change a little bit as I flesh them out. I’ll leave you in suspense 🙂 as to what "Classification" might be.