(Originally posted 2010-01-24.)
I’m beginning to look at performance data slightly differently these days…
As well as plotting things by Time Of Day (which our tools have done for 25 years) I’m beginning to plot things more directly against load. (Time Of Day is a sort of proxy for load, though an imperfect one; its advantage is that it tells a story people can relate to more directly.)
The first instance of this "with load" approach was plotting CPU per Coupling Facility request (and also request Response Time) against request rate. That’s proved invaluable (as you can see in previous blog entries).
The second instance is what I want to talk about now…
I plotted – for a single Service Class Period – velocity against CPU consumed. As it happens I had 200 data points, ranging from almost no CPU to 7 engines or so. CPU consumed is on the x axis and velocity is on the y axis. One further wrinkle: Each day was plotted as a differently coloured set of points (with different marker styles as well), enabling me to compare one day against another.
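If you want to try something similar, here is a minimal sketch in Python. It is not the analysis code described in this post: the sample layout, the made-up values, the `goal_velocity` parameter and the output file name are all assumptions for illustration, and the plotting function assumes matplotlib is installed.

```python
from collections import defaultdict
from itertools import cycle

# Hypothetical sample layout: (day label, CPU consumed in engines, velocity %).
# These values are invented purely to exercise the code.
samples = [
    ("Mon", 0.4, 35.0),
    ("Mon", 2.1, 72.0),
    ("Tue", 3.0, 70.0),
    ("Tue", 5.5, 41.0),
    ("Wed", 6.8, 12.0),
]


def series_by_day(samples):
    """Group (day, cpu, velocity) samples into one (xs, ys) series per day."""
    series = defaultdict(lambda: ([], []))
    for day, cpu, velocity in samples:
        xs, ys = series[day]
        xs.append(cpu)
        ys.append(velocity)
    return dict(series)


def plot_series(series, goal_velocity, path="velocity_vs_cpu.png"):
    """Scatter velocity (y) against CPU (x), one colour and marker per day."""
    # Imported here so the grouping code above works without matplotlib.
    import matplotlib
    matplotlib.use("Agg")  # render to a file, no display needed
    import matplotlib.pyplot as plt

    markers = cycle("osD^v*")  # a different marker style for each day
    fig, ax = plt.subplots()
    for day, (xs, ys) in series.items():
        # Each scatter call takes the next colour from the default cycle.
        ax.scatter(xs, ys, marker=next(markers), label=day)
    # goal_velocity is an assumed input: the goal of the Service Class Period.
    ax.axhline(goal_velocity, linestyle="--", color="grey", label="Goal")
    ax.set_xlabel("CPU consumed (engines)")
    ax.set_ylabel("Velocity (%)")
    ax.legend()
    fig.savefig(path)
```

Calling `plot_series(series_by_day(samples), goal_velocity=50.0)` would write the chart to a PNG file. Drawing the goal as a horizontal line makes it easy to see where the points sit above or below it.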
I’m not going to share the graph with you – as it really would be abusing customer confidence. But suffice it to say it was interesting…
As you go from no workload all the way up to 2n engines (2n being the highest CPU consumption in the data) the following happens: The velocity starts out low and rapidly rises to well above the goal velocity, staying there until n engines’ worth of CPU. Then it steadily but slowly declines to well below the goal velocity. At the highest levels of utilisation the velocity achieved is about 20% of the goal. These highest levels of utilisation, though, appear to be "on the curve", albeit outliers in x value terms. I think that’s an interesting dynamic: at some point the achievement drops off and can’t be sustained at or near the goal level.
The second thing that I noticed was that the points get more widely distributed as utilisation increases – most notably around the point where the velocity starts to drop. It’s a most beautiful broadening out. So we get into a position of unstable velocity. Again not a good thing.
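One way to put a number on that broadening, rather than just eyeballing it, is to bucket the points into CPU bands and look at the standard deviation of velocity in each band. This is only a sketch under assumptions: the function name, the one-engine band width and the sample points are all invented for illustration, and standard deviation is just one possible measure of spread.

```python
from collections import defaultdict
from statistics import mean, pstdev


def spread_by_cpu_band(points, band_width=1.0):
    """Return {band_start: (count, mean velocity, std dev)} per CPU band."""
    bands = defaultdict(list)
    for cpu, velocity in points:
        # Floor each CPU value to the start of its band, e.g. 5.6 -> 5.0.
        band_start = int(cpu // band_width) * band_width
        bands[band_start].append(velocity)
    return {
        start: (len(vs), mean(vs), pstdev(vs))
        for start, vs in sorted(bands.items())
    }


# Made-up (cpu, velocity) points: tight at low CPU, scattered at high CPU.
points = [(1.2, 70.0), (1.8, 72.0), (5.1, 60.0), (5.6, 30.0), (5.9, 45.0)]
width = 1.0
for start, (n, avg, sd) in spread_by_cpu_band(points, width).items():
    print(f"{start:.0f}-{start + width:.0f} engines: n={n} mean={avg:.1f} sd={sd:.1f}")
```

A standard deviation that grows from band to band would be the numerical counterpart of the points fanning out on the chart. Bands with very few points deserve some scepticism.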
Finally, let’s consider the days themselves. It turns out they’re all pretty much alike, with two exceptions: All the "2n engine" outliers are from one day – a problem day. Also, on the part of the curve where the velocity is dropping away the "problem day" data points are spread both above and below the others. Again we’re getting instability of outcome.
I really wish I could share this prototype chart with you – it’s got truly beautiful structure. I’m going to "hand create" such a chart a few more times with different customers’ data and then "shrink wrap" it into my analysis code. If you get to see it I think you’ll like it. It could rapidly grow to be my new favourite rhetorical device. 🙂
Of course the above only works for Velocity-based Service Class Periods but I’m sure I could dream up an obvious analogue for the other goal types. (PI might be the unifying concept but it doesn’t, in my view, pass the "keep it real" test, not that Velocity is that connected to delivered reality anyway.)
And I share it with you in case it’s something you’d like to experiment with.