Category Archives: Root Cause Analysis

While there are three main reasons organizations typically perform Root Cause Analysis (RCA) following an issue with their assets or equipment, there is a whole host of other indicators that RCA should be performed.

Odds are, you’re recording a lot of valuable information about the performance of your equipment – information that could reveal opportunities to perform an RCA, find causes, and implement solutions that will solve recurring problems and improve operations. But are you using your recorded information to this extent?

First, let’s quickly talk about three reasons why RCA is typically performed:

1. Because you have to

There may be a regulatory requirement to demonstrate that you are doing something about a problem that’s occurred.

2. You have breached a trigger point

Your own company has identified the triggers for significant incidents that warrant root cause analysis.

3. Because you want to

An opportunity has presented itself to make changes for the better. Or perhaps you’ve decided you simply don’t want to lose so much money all the time.

At the core of all industry is the desire to make money. Anything that negatively impacts this goal is usually attacked by performing root cause analysis.

I was having a conversation with a reliability engineer at an oil and gas site, and I asked him what lost opportunity or downtime might cost that company over the course of a year. He said it was in the vicinity of three quarters of a billion dollars ($750,000,000). Is this a good enough reason to perform root cause analysis? Even a 10% improvement would be worth around $75 million a year to the bottom line.

The monetary impact to the business was of course not due to any single event, but to a multitude of events both large and small.

Each event presents an opportunity to learn and to make whatever changes are necessary to prevent its recurrence. Once can be written off as happenstance… things happen, serious or minor, and that’s life. But letting it happen again and again means that something is seriously wrong.

While these are all valid reasons to perform an RCA, there are at least ten more tell-tale equipment-related clues that an RCA needs to happen – most of which can be identified through the information you’re probably already recording.

Here are ten tell-tale signs that your organisation needs to perform Root Cause Analysis:

  1. Increased downtime to plant, equipment or process.
  2. Increase in recurring failures.
  3. Increase in overtime due to unplanned failures.
  4. Increase in the number of trigger events.
  5. Reduced equipment availability.
  6. High level of reactive maintenance.
  7. Lack of time… simply can’t do everything that needs doing.
  8. Increase in the number of serious events… nearing the top of the pyramid.
  9. Longer planned “shut” durations.
  10. More frequent “shut” requirement.
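Several of these signs can be spotted by comparing the same metric across two reporting periods. The sketch below assumes you can export a few period totals from your maintenance system; the `Period` fields, the function names and the 10% threshold are all hypothetical choices for illustration, not a standard. Signs such as lack of time or longer shuts are left out because they rarely live in a single exported number.

```python
from dataclasses import dataclass


# Hypothetical period totals exported from a maintenance system (CMMS).
@dataclass
class Period:
    downtime_hours: float
    recurring_failures: int
    unplanned_overtime_hours: float
    trigger_events: int
    availability_pct: float
    reactive_work_orders: int
    total_work_orders: int


def _rose(old: float, new: float, threshold: float) -> bool:
    # Relative increase larger than the threshold (0.10 = 10%).
    return old > 0 and (new - old) / old > threshold


def _fell(old: float, new: float, threshold: float) -> bool:
    # Relative decrease larger than the threshold.
    return old > 0 and (old - new) / old > threshold


def rca_warning_signs(prev: Period, curr: Period, threshold: float = 0.10) -> list[str]:
    """List the tell-tale signs that worsened between two periods."""
    signs = []
    if _rose(prev.downtime_hours, curr.downtime_hours, threshold):
        signs.append("increased downtime")
    if _rose(prev.recurring_failures, curr.recurring_failures, threshold):
        signs.append("increase in recurring failures")
    if _rose(prev.unplanned_overtime_hours, curr.unplanned_overtime_hours, threshold):
        signs.append("increase in unplanned overtime")
    if _rose(prev.trigger_events, curr.trigger_events, threshold):
        signs.append("increase in trigger events")
    if _fell(prev.availability_pct, curr.availability_pct, threshold):
        signs.append("reduced equipment availability")
    # Share of reactive work; skipped if either period has no work orders.
    if prev.total_work_orders and curr.total_work_orders:
        prev_share = prev.reactive_work_orders / prev.total_work_orders
        curr_share = curr.reactive_work_orders / curr.total_work_orders
        if _rose(prev_share, curr_share, threshold):
            signs.append("higher share of reactive maintenance")
    return signs


# Invented figures for two consecutive years.
last_year = Period(120, 8, 300, 4, 94.0, 400, 1000)
this_year = Period(150, 11, 310, 4, 91.0, 520, 1000)
print(rca_warning_signs(last_year, this_year))
```

A relative threshold keeps the check independent of plant size; in practice you would tune it per metric.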

These indicators imply that we need to be doing more in the realm of root cause analysis before these issues snowball.

If you can identify with some of these pain points, download our eBook “11 Problems With Your RCA Process and How to Fix Them” in which we provide best practice advice on using RCA to help eliminate some of these problems.


Author: Kevin Stewart

At some point, most companies will want to see quantifiable metrics showing that their Root Cause Analysis (RCA) program has resulted in a positive return on investment (ROI).

ROI is relatively easy to calculate as a dollar value when it comes to tangibles such as equipment or production time. Things can seem trickier when trying to assign a dollar value to safety improvements resulting from an RCA program. Try to keep it simple.

This formula –

Cost of the Problem x Likely Recurrence / Cost of the Fix = ROI

is a straightforward way to begin quantifying the ROI of your RCA program, including its effects on safety.
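As a minimal sketch, the formula translates directly into a function. The article does not define "Likely Recurrence" precisely; here it is assumed to be the expected number of recurrences avoided over the period you are evaluating. The function name and the example figures are hypothetical.

```python
def rca_roi(cost_of_problem: float, likely_recurrence: float, cost_of_fix: float) -> float:
    """ROI = Cost of the Problem x Likely Recurrence / Cost of the Fix.

    likely_recurrence: assumed to be the expected number of recurrences
    avoided over the evaluation period. A result above 1 means the fix
    is expected to pay for itself.
    """
    if cost_of_fix <= 0:
        raise ValueError("cost_of_fix must be positive")
    return cost_of_problem * likely_recurrence / cost_of_fix


# Invented example: a $40,000 failure expected to recur 3 times, fixed for $30,000.
print(rca_roi(40_000, 3, 30_000))  # 4.0
```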

Let’s look at how we might calculate these costs.

Cost of the Fix

  • Cost of an RCA investigation. This covers whatever time, resources and people are required to conduct the investigation itself, and may include the initial training, though the training cost should drop off as it is amortized over the program.
  • Cost of whatever resources are needed to implement a solution. Don’t forget to include new equipment, parts, additional training, and anything else that is directly attributable to the implementation.
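The two bullets above can be combined into a single Cost of the Fix. This sketch assumes the training cost is spread evenly over the number of investigations it is expected to support, which is one simple way to amortize it; the function and parameter names are hypothetical.

```python
def cost_of_fix(training_cost: float, investigations_covered: int,
                investigation_cost: float, implementation_cost: float) -> float:
    """Cost of the Fix for one RCA.

    The training cost is amortized evenly over the investigations it is
    expected to support, so its share shrinks as the program grows.
    """
    if investigations_covered < 1:
        raise ValueError("investigations_covered must be at least 1")
    training_share = training_cost / investigations_covered
    return training_share + investigation_cost + implementation_cost


# Invented figures: $20,000 of training spread over 40 RCAs, $3,000 of
# people-time for this investigation, $12,000 for parts and retraining.
print(cost_of_fix(20_000, 40, 3_000, 12_000))  # 15500.0
```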

When you eliminate a problem, calculating what you have saved depends a lot on the problem itself and its rate of recurrence. For instance, if you find out what was causing a particular machine to fail once a year, you won’t see the benefits of your solution for another year. It can take several years, and many different problems solved, to see the total value of an RCA program.

Improved safety isn’t as impossible to quantify as it might seem. While most companies don’t publicly discuss this type of equation because it can seem insensitive, chances are your company does calculate the monetary cost of an injury or death on the job. As one example, the Mine Safety and Health Administration at the US Department of Labor offers an online calculator that takes into account both direct costs (like workers’ comp claims) and indirect costs (like training a new worker and lower morale), though its figures may be a bit outdated.

Cost of the Problem Recurring

This is the cost of the initial problem in terms of equipment, production delays, man-hours, workers’ comp claims, medical costs, absenteeism, turnover, training new employees, lower productivity, decreased morale, legal fees and increased insurance costs.

At first glance the equation doesn’t quite make sense for a safety “near miss.” If it missed, then what did it cost? Nothing? Then the ROI would be 0 x Likely Recurrence / Cost of the Fix = 0. Clearly the answer must include the potential cost: the cost to the business if the incident had been on target instead of missing. That makes it subjective. How do you put a cost on maybes?

It might help to look at the statistics of how incidents occur. Take the cost to the business if a single major accident occurred (every business has this unspoken cost locked away somewhere) and then do the math: one near miss is worth 0.003 of that cost. Tally up your near misses and go back to the formula.

[Figure: Accident pyramid]

As an example, say your data indicates you have 3000 near misses in two years, or 4.1 per day. Then you put a program in place and now you have 3000 near misses in four years, or 2.1 per day. At the old rate you would have expected 6000 near misses over those four years, so the program has avoided 3000 of them. Per the above calculation, this equates to 3000 x 0.003, or nine, fewer major incidents at whatever cost your company assigns to that type of incident. This becomes the savings for your ROI (the Cost of the Problem in our equation) and can be attributed to the safety program of which the RCA process is a part.
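The worked example can be reproduced in a few lines, which also makes it easy to rerun with your own counts. The 0.003 weighting is the one given in the text; the $1,000,000 major-incident cost is a purely hypothetical placeholder for whatever figure your company assigns.

```python
# Weighting from the text: one near miss is worth 0.003 of a major accident.
NEAR_MISS_WEIGHT = 0.003
DAYS_PER_YEAR = 365


def near_misses_avoided(before_count: int, before_years: float,
                        after_count: int, after_years: float) -> float:
    """Near misses avoided in the 'after' window compared with the old rate."""
    expected_at_old_rate = before_count / before_years * after_years
    return expected_at_old_rate - after_count


avoided = near_misses_avoided(3000, 2, 3000, 4)
major_equivalents = avoided * NEAR_MISS_WEIGHT

before_rate = 3000 / (2 * DAYS_PER_YEAR)
after_rate = 3000 / (4 * DAYS_PER_YEAR)
print(f"{before_rate:.1f} per day before, {after_rate:.1f} per day after")
print(f"near misses avoided: {avoided:.0f}")
print(f"major-incident equivalents: {major_equivalents:.1f}")

# Hypothetical: replace with the cost your company assigns to a major incident.
major_incident_cost = 1_000_000
print(f"savings (Cost of the Problem): ${major_equivalents * major_incident_cost:,.0f}")
```

Comparing against the expected count at the old rate, rather than raw totals, is what turns "3000 in four years" into 3000 near misses avoided.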

This formula will assist in calculating an ROI on an individual RCA, which is necessary to show that the process is working and providing value so you can justify the program. However, since most safety programs track TRIR (Total Recordable Injury Rate) or something to that effect, you will also need to show that the RCA program affects this, too. This will be difficult because the safety program is in place and doing other things to prevent safety incidents before they happen. How do you attribute a reduction in near misses to preventive programs versus items put in place from an RCA?

You may never be able to separate these items. Even with detailed records, it is not always clear why people do what they do. The best thing you can do is to record when the RCA program was introduced and then show the subsequent improvement in your safety metrics, such as TRIR or near misses.
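TRIR is conventionally calculated (following OSHA's incidence-rate definition) as recordable injuries and illnesses x 200,000 / total hours worked, where 200,000 hours represents 100 full-time employees working 40 hours a week for 50 weeks. The sketch below compares the rate for a year before and a year after the RCA program started; the figures are invented, and, as noted above, any change cannot be credited to the RCA program alone.

```python
def trir(recordable_injuries: int, hours_worked: float) -> float:
    """Total Recordable Injury Rate per 200,000 hours worked."""
    if hours_worked <= 0:
        raise ValueError("hours_worked must be positive")
    return recordable_injuries * 200_000 / hours_worked


# Invented figures for the year before and the year after the RCA program began.
before = trir(12, 1_200_000)
after = trir(8, 1_250_000)
print(f"TRIR before: {before:.2f}, after: {after:.2f}")
```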

You can use this information to justify the program with the argument that the RCA process is part of the overall safety program, and it really doesn’t matter which gets the credit as long as safety continues to improve. The RCA program should be a small part of the overall safety program costs, since there are usually several full-time safety people involved, committee meetings, safety initiatives, programs, etc.

It doesn’t matter how you slice and dice it, the return on investment for your RCA program boils down to: What will it cost me to fix the problem now? – versus – What is the cost if this problem happens again?

Author: Jack Jager

An effective root cause analysis process can improve business outcomes significantly. Why is it then that few organisations have a functioning root cause analysis process in place?

Here are the top 6 sure-fire ways to kill off a Root Cause Analysis program

1. Don’t use it.


The company commits to the training, creates an expectation of use and then doesn’t follow through with commitment, process and resources! Now come on, how easy is that? It devalues the training and sends the message that the training was just to tick someone’s KPI box and that the process doesn’t really need to be used.

2. Don’t support it.

Success in Root Cause Analysis should be the ultimate goal of every defect elimination program. Achieving it, however, requires more than just training people in how to do it. It requires structures that initially support the training and that mentor and provide feedback on the journey towards excellence in application. Thereafter it requires structures that define exactly when an investigation needs to take place and that deliver clear support, in terms of time and people, to achieve the desired outcome. Without support for the chosen process, the expected outcomes are rarely delivered.

3. Don’t implement solutions.

Doing all of the work involved in an investigation, only to notice that no corrective actions have been implemented and that the problem has recurred because nothing has changed, has got to be one of the easiest ways to kill off a Root Cause Analysis process. What happens when people are asked to take part in RCAs, or to facilitate them, when history shows that nothing comes of the effort? “I’m too busy to waste my time on that stuff!”

4. Take the easy option and implement soft solutions.

Why do we implement soft controls instead of hard controls? Because they are easy, they don’t cost much and we are seen to be doing something about the problem. We have ticked all the boxes. But will this prevent recurrence of the problem? There is certainly no guarantee of that if soft controls are all we implement. We aren’t really serious about problem solving, are we, if this is what we continue to do?

5. Continue to blame people.

The easy way out! Find a scapegoat for any problem that you don’t have time to investigate, or simply can’t be bothered to investigate properly. But will knowing who did it actually prevent recurrence of the problem?

Ask a different question! How do you control what people do? You control them, or more correctly their actions, by training them, by putting in the right procedures and protocols, by providing clear guidelines on what they can or can’t do, by creating standard work instructions for everyone to follow and by clearly establishing the rules that must be adhered to in the workplace.

What sort of controls are these if we measure them against the hierarchy of controls? They are all administrative controls, deemed to be soft controls that will give you no certainty that the problem will not happen again. We know this! So why do we implement these so readily? Because it is the easy way out! It ticks all the boxes, except the one that says “will these corrective actions prevent recurrence of the problem?”

We all understand the hierarchy of controls but do we actually use it to the extent that we should?
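One way to put the hierarchy of controls to work is to tag each corrective action with its level and flag any RCA whose actions are all soft. The five levels below are the standard hierarchy (elimination, substitution, engineering controls, administrative controls, PPE); treating administrative controls and PPE as the "soft" group is an assumption for this sketch, and the function name and example actions are hypothetical.

```python
from enum import IntEnum


class Control(IntEnum):
    """Hierarchy of controls, most effective (1) to least effective (5)."""
    ELIMINATION = 1
    SUBSTITUTION = 2
    ENGINEERING = 3
    ADMINISTRATIVE = 4
    PPE = 5


# Assumption: administrative controls and PPE count as "soft" controls.
SOFT = {Control.ADMINISTRATIVE, Control.PPE}


def only_soft_controls(actions: list[tuple[str, Control]]) -> bool:
    """True if every corrective action sits at a soft level.

    An empty list returns False; an RCA with no actions at all is a
    separate problem that needs its own check.
    """
    return bool(actions) and all(level in SOFT for _, level in actions)


rca_actions = [
    ("Retrain operators on the start-up procedure", Control.ADMINISTRATIVE),
    ("Add a warning sign at the valve", Control.ADMINISTRATIVE),
]
print(only_soft_controls(rca_actions))  # True: flag for review

rca_actions.append(("Fit an interlock so the pump cannot start against a closed valve",
                    Control.ENGINEERING))
print(only_soft_controls(rca_actions))  # False
```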

6. We don’t know if we are succeeding because we don’t measure anything.

You get what you measure! When management doesn’t implement or audit a process for completed RCAs, it sends a strong message that there is little or no interest in the work being done to complete the analysis.

Track KPIs such as: How many RCAs have been raised against the triggers set? How many actions were raised in the month as a result and, of those, how many have been completed? If management is not interested in reviewing these regularly, along with the number of RCAs closed off in the period, then it won’t be long before people notice that no one is interested in the good work being done.

The additional work done to complete RCAs will no longer be seen as necessary, since it isn’t important enough to review, and the effort will drop away until it’s no longer done at all.
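The KPI questions above are simple counts once RCA records are kept in a consistent shape. The `Rca` record and the KPI names below are hypothetical; map them to whatever your tracking system actually stores.

```python
from dataclasses import dataclass


@dataclass
class Rca:
    # Hypothetical record shape; map these fields to your own system.
    raised_against_trigger: bool
    closed: bool
    actions_raised: int = 0
    actions_completed: int = 0


def monthly_rca_kpis(rcas: list[Rca]) -> dict[str, float]:
    """Answer the KPI questions for one month's RCA records."""
    actions_raised = sum(r.actions_raised for r in rcas)
    actions_completed = sum(r.actions_completed for r in rcas)
    return {
        "rcas_raised_against_triggers": sum(r.raised_against_trigger for r in rcas),
        "rcas_closed": sum(r.closed for r in rcas),
        "actions_raised": actions_raised,
        "actions_completed": actions_completed,
        # Guard against months with no actions raised.
        "action_completion_pct": (100 * actions_completed / actions_raised
                                  if actions_raised else 0.0),
    }


march = [Rca(True, True, 4, 4), Rca(True, False, 3, 1), Rca(False, True, 1, 1)]
print(monthly_rca_kpis(march))
```

Reviewing the completion percentage alongside the raw counts shows whether actions are actually being closed out, not just raised.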


Another interesting point: if only the number of investigations is reported, with no check on the quality of the analysis, then anything can be whipped up because no one is looking! Even a random audit of just one of the analyses completed each month signals that the quality of the analysis matters to the organisation.
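Picking the monthly audit sample can be automated so that nobody chooses which analysis gets checked. This is a minimal sketch with hypothetical month labels and RCA IDs; a fixed seed makes the selection reproducible for the audit record.

```python
import random
from collections import defaultdict
from typing import Optional


def pick_monthly_audits(completed: list[tuple[str, str]],
                        seed: Optional[int] = None) -> dict[str, str]:
    """Pick one completed RCA at random from each month for a quality audit.

    completed: (month, rca_id) pairs, e.g. ("2024-03", "RCA-017").
    seed: fix it to make the selection reproducible.
    """
    rng = random.Random(seed)
    by_month = defaultdict(list)
    for month, rca_id in completed:
        by_month[month].append(rca_id)
    return {month: rng.choice(ids) for month, ids in sorted(by_month.items())}


completed_rcas = [
    ("2024-03", "RCA-017"), ("2024-03", "RCA-018"), ("2024-03", "RCA-021"),
    ("2024-04", "RCA-022"), ("2024-04", "RCA-025"),
]
print(pick_monthly_audits(completed_rcas, seed=7))
```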

What message do we send if we don’t measure anything?

In closing, the first step on the road to implementing an effective and sustainable Root Cause Analysis program is to pinpoint what’s holding it back. These six sure-fire ways to kill off a Root Cause Analysis program will help you identify your obstacles and develop a plan to overcome them.

The question “How do you grade an investigation?” was posed to a discussion group, and it got me thinking.
The overall measure of success is whether the solution actually prevents recurrence of the problem. One definition of Root Cause Analysis is: “A structured process used to understand the causes of past events for the purpose of preventing recurrence.” So a reasonable assessment of the quality of the analysis is whether the RCA fixed the problem it set out to fix by ensuring that it never happens again. This may take a long time to prove if the problem’s MTBF is five years, or if it has only happened once.

Are there other tangibles that can help you assess the quality of an RCA? RCAs use some sort of process to accomplish their task, so it stands to reason that there are things you can look for to gauge the quality of the process followed. While this is no guarantee of a correct analysis, evidence that due diligence was followed in the process lends more credibility to the solutions.

What are some of these criteria by which you can judge an analysis?

  • Are the cause statements ‘binary’? By this we mean unambiguous or explicit: a few words only, in precise language, without vague adjectives like “poor”, which can be very subjective.
  • Are the causes free of conjunctions? If a statement contains a conjunction, there may be multiple causes in it. Look for words such as and, if, or, but, because.
  • Is there valid evidence for each cause? Causes without evidence may not belong in the analysis; worse yet, solutions may be tied to them and be ineffective.
  • Does each cause path have a valid, sensible reason for stopping? It is easy to stop too soon, and this is sometimes obvious. For example, if a branch stops at a cause of “no PM” with nothing beneath it, in most cases an analyst would want to know why there was no PM.
  • Does the structure of the chart match the process being used? If it is a principle-based process, it should be easy to check the causal elements to verify that they satisfy those principles. These might be causal logic checks, space-time logic checks or others associated with the particular process.
  • Is the chart or analysis completed? Does it have a lot of unfinished branches, questions that need to be answered or action items to complete?

  • Are the solutions SMART (Specific, Measurable, Actionable, Relevant, and Timely)? Or do they include words like investigate, review, analyze, gather, contact, observe, verify, etc.?
  • Do the solutions meet a set of criteria against which they can be judged?
  • Do the solutions address specific causes, or are they general in nature? Even if they are linked to specific causes, solutions that don’t directly address those causes may still be a SWAG*.
  • If there is a report, is it well written, short and specific, and does it cover just the basics that an executive would be interested in? The requisites are cost, time to implement, completion date, a brief causal description and solutions that will solve the identified problem.

These are some of the things that I currently look at when I review the projects submitted by clients. I’d be interested to know about other things that may be added to the list.

* SWAG = Scientific Wild Ass Guess
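Only the word-based criteria in this checklist lend themselves to automation: conjunctions in cause statements, vague adjectives, and non-actionable verbs in solutions. Evidence, stopping points and chart structure still need a human reviewer. The word lists below are copied from the criteria; "poor" is the only vague adjective the list names, so extending that set is left to you. The function names are hypothetical.

```python
import re

# Word lists taken from the review criteria; extend them to suit your process.
CONJUNCTIONS = {"and", "if", "or", "but", "because"}
VAGUE_ADJECTIVES = {"poor"}
NON_ACTIONABLE = {"investigate", "review", "analyze", "gather", "contact", "observe", "verify"}


def words(text: str) -> set[str]:
    """Lower-cased words in a statement (simple keyword matching only)."""
    return set(re.findall(r"[a-z]+", text.lower()))


def review_cause(statement: str) -> list[str]:
    issues = []
    found = words(statement)
    if found & CONJUNCTIONS:
        issues.append("contains a conjunction; may hold more than one cause")
    if found & VAGUE_ADJECTIVES:
        issues.append("uses a vague adjective")
    return issues


def review_solution(solution: str) -> list[str]:
    if words(solution) & NON_ACTIONABLE:
        return ["uses a non-actionable verb; may not be SMART"]
    return []


print(review_cause("Poor lubrication and high load"))
print(review_cause("Bearing ran dry"))
print(review_solution("Investigate bearing supplier"))
```

Exact-word matching will miss variants such as "reviewing" or "analyse"; treat the output as a prompt for the reviewer, not a grade.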


By Jack Jager

What is “defect elimination”, and what is a “Defect Elimination program”?

“Defect elimination” analyses the defect, and then implements corrective actions to prevent future similar defects.

A “Defect Elimination program” is a structured process companies adopt to become more consistent and reliable in eliminating defects. It forms part of a broader Quality Improvement program. It’s a systematic approach to apply defect elimination consistently across the operations of a company, for any opportunities that present themselves as worthy of the effort.

You’ve investigated an incident, and now it’s time to write up your report. This report should document what you’ve found, and the corrective actions needed to prevent recurrence or mitigate the problem to an acceptable level.

At the heart of a good report is a strong, clear executive summary.

RCA Success

An incident has occurred, and a Root Cause Analysis (RCA) is needed to find an effective solution. How do you ensure that the RCA delivers the best results, arriving quickly and accurately at the cause or causes of the problem?
At the start of any analysis, there are a number of simple things you can do to boost the likelihood of a successful outcome. These tips are not rocket science, yet they are important to get right.

If you are investigating an incident using Root Cause Analysis (RCA), what are the critical skills that you should possess as an RCA facilitator?

Here are five key skills that will help to make you a more effective facilitator.

Watching one of the best Olympic Games ever in London this summer, with some jaw-dropping performances, left me thinking about the reasons behind the athletes’ success. What is it that makes an athlete want to win? What gives them the desire to train every day for a chance of winning an Olympic medal?

I listened to British Cycling’s performance director, Dave Brailsford, discuss the success of the British cycling team. “The whole principle came from the idea that if you broke down everything you could think of that goes into riding a bike, and then improved it by 1%, you will get a significant increase when you put them all together”, he said. According to Dave, it was these ‘marginal gains’ that underpinned the team’s success.

To be, or not to be, that is the question:
Whether ’tis Nobler in the mind to suffer
The Slings and Arrows of outrageous Fortune,
Or to take Arms against a Sea of troubles…

“To be or not to be” is the opening phrase of a soliloquy in William Shakespeare’s play “Hamlet”. It is perhaps the most famous of all literary quotations, but there is deep disagreement on the meaning of both the phrase and the speech. Whilst we won’t be settling that debate in this article, we will discuss the disagreement within the global engineering community over whether the 5 Whys process is sufficient to identify the root causes, and ultimately the solutions, for a particular problem.

Why – Why – Why – Why – Why?