A Few Thoughts On Parallel Sysplex Test Environments

(Originally posted 2009-11-09.)

There’s a pattern I’ve seen in a number of test Parallel Sysplex environments over the past few years, including a couple this year:

It’s not much use drawing performance inferences from test environments if they’re not set up properly for performance tests.

Sounds obvious, doesn’t it?

There are two problem areas I want to draw your attention to:

  1. Shared Coupling Facility Images

    If you run a performance test in an environment with shared coupling facility images you stand to get horrendous request response times and the vast majority of requests going async (given a chance). I’ve even seen environments where XCF refuses to use coupling facility structures and routes ALL the traffic over CTCs. (And I’ve seen a couple of environments where there are no CTCs to route it over and XCF traffic is then reduced to a crawl.)
  2. "Short Engine" z/OS Coupled Images

    In a recent customer situation I saw the effect of this: the customer was testing DB2 loads, which were actually a bunch of SQL inserts. They were also duplexing the LOCK1 structure for the data sharing group. The Coupling Facility setup was perfect, but response times still became really bad once duplexing was established for the LOCK1 structure. Two salient facts: because of duplexing, all the LOCK1 requests were async; and XCF list structure request response times were always awful.

    The answer to why this problem occurred lies in understanding how async requests are handled: The coupled z/OS CPU doesn’t spin in the async case. In the "low LPAR weight relative to logical engines online" case the z/OS LPAR’s logical engines were but rarely dispatched on physical engines. This meant there was a substantial delay in z/OS detecting the completion of an async request. Hence the elongated async response times. As I said, the LOCK1 structure went async once it was duplexed.

    As it happens the physical machine wasn’t all that busy: Allowing the LPAR to exceed share – using a soaker job – ensured logical engines remained dispatched on physical engines longer. And, perhaps paradoxically, the async request response times went right down. This, I hope, reassured the customer that in Production (with "longer-engine" coupled z/OS LPARs) async coupling facility response times ought to be OK.
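
To make "short engine" a bit more concrete, here is a rough sketch of the arithmetic, assuming the usual rule of thumb that an LPAR's entitled share of the shared physical engines is its weight divided by the total weight of the LPARs sharing them. The function name and all the numbers are hypothetical, made up for illustration; they are not taken from the customer situation above.

```python
def engine_fraction(weight, total_weight, physical_engines, logical_engines):
    """Fraction of a physical engine each logical engine is entitled to.

    Assumption: entitled share = weight / total_weight * shared physical engines.
    This is the entitlement only; an LPAR can exceed it when the machine
    has spare capacity.
    """
    if total_weight <= 0 or logical_engines <= 0:
        raise ValueError("total_weight and logical_engines must be positive")
    share_in_engines = weight / total_weight * physical_engines
    return share_in_engines / logical_engines


# Hypothetical Test LPAR: low weight, several logical engines online.
test_lpar = engine_fraction(weight=50, total_weight=1000,
                            physical_engines=10, logical_engines=4)

# Hypothetical Production LPAR: most of the weight, "longer" engines.
prod_lpar = engine_fraction(weight=600, total_weight=1000,
                            physical_engines=10, logical_engines=7)

print(f"Test: {test_lpar:.3f} of a physical engine per logical engine")
print(f"Prod: {prod_lpar:.3f} of a physical engine per logical engine")
```

The smaller the fraction, the less often a logical engine is likely to be dispatched when the LPAR is held to its share, and so the longer it might take z/OS to notice that an async request has completed.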

Now, this is just Test, but it could unnecessarily freak people out. Hopefully it’s easy to see why Test Parallel Sysplex environments might perform much worse than Production ones.

(I’m guessing you’re going "duh, I knew Test would be worse than Prod". 🙂 But these two cases are specifics of why Test might be even worse compared to Prod than expected.)

Anyhow, I thought they were interesting. I have seen the first quite a few times now; the second not so much, in fact only once so far.

Published by Martin Packer
