November 23, 2003

Fishbone that bad boy!

Chuq pointed me at the request from the ni3 blog for information on root cause analysis resources.

First, I'd point at anything TQM-ish as a good place to start, or if you're into the math, Deming's work in general. TQM Tools has a nice short explanation on using the fishbone to determine when a process has gone out of spec. Simply, it's asking "why did this happen?" and "so, why did that happen?" until you get back to the cause.

Okay, that's the theory--why doesn't it work as well in practice?

The sociology just doesn't work, that's why. No matter how many times people are told that they won't get whapped, the first time anyone sees a connection between admitting responsibility, even a small part of the responsibility for the overall event, and getting whapped for the whole mess, the game's over.

And this isn't unique to IT organizations. The times I've seen TQM brought into organizations, the boys in the carpeted wing are all hot for the quality improvements, but aren't interested in implementing the organizational changes that have to happen to make this useful.

However, when there's an incident in IT, either the whole corporation gets to "share the love" of the fallout, or at least a very interested and highly placed subset does. So unless there is a concrete atmosphere of "no fear, learn from the experience and move on", there's going to be a lot more energy expended on CYA than on root causes.

It's the same thing with re-engineering, Six Sigma, or any of the other "let's introduce quality into the organization" theories. Any of these could be used as a tool, but without the sociology in place, it's just finding a bigger hammer when what you need is a screwdriver.

Posted by lsefton at November 23, 2003 12:17 PM
Comments

Very keen observations. Right on. I'm going through this process right now with the vendor and no one wants to accept responsibility. They want to fix the symptoms to the point that it is acceptable and move on, avoiding the unpleasantness of understanding what really happened.

I had not thought about this from the TQM perspective before. Nice links. Thanks.

Posted by: Dann Sheridan at November 26, 2003 05:20 AM