Monthly Archives: July 2015

While there are three main reasons organizations typically perform Root Cause Analysis (RCA) following an issue with an asset or piece of equipment, there is a whole host of other indicators that an RCA should be performed.

Odds are, you’re recording a lot of valuable information about the performance of your equipment – information that could reveal opportunities to perform an RCA, find causes, and implement solutions that will solve recurring problems and improve operations. But are you using your recorded information to this extent?

First, let’s quickly talk about three reasons why RCA is typically performed:

1. Because you have to

There may be a regulatory requirement to demonstrate that you are doing something about a problem that’s occurred.

2. You have breached a trigger point

Your own company has identified the triggers for significant incidents that warrant root cause analysis.

3. Because you want to

An opportunity has presented itself to make changes for the better. Or perhaps you’ve decided you simply don’t want to lose so much money all the time.

At the core of all industry is the desire to make money. Anything that negatively impacts this goal is usually attacked by performing root cause analysis.

I was having a conversation with a reliability engineer at an oil and gas site, and I asked him what lost opportunity or downtime might cost that company over the course of a year. He said it was in the vicinity of three quarters of a billion dollars ($750,000,000). Is this a good enough reason to perform root cause analysis? Even a 10% improvement, roughly $75 million a year, would have a huge impact on the bottom line.

The monetary impact to the business was of course not due to any single event, but to a multitude of events both large and small.

Each event presents itself as an opportunity to learn and to make any changes necessary to prevent its recurrence. A single occurrence can be written off as happenstance… things happen, serious or minor, and that’s life. But letting the same thing happen again and again means that something is seriously wrong.

While these are all valid reasons to perform an RCA, there are at least ten more tell-tale equipment-related clues that an RCA needs to happen – most of which can be identified through the information you’re probably already recording.

Here are ten tell-tale signs that your organization needs to perform Root Cause Analysis:

  1. Increased downtime to plant, equipment or process.
  2. Increase in recurring failures.
  3. Increase in overtime due to unplanned failures.
  4. Increase in the number of trigger events.
  5. Reduced equipment availability.
  6. High level of reactive maintenance.
  7. Lack of time… simply can’t do everything that needs doing.
  8. Increase in the number of serious events… nearing the top of the pyramid.
  9. Longer planned “shut” durations.
  10. More frequent “shut” requirements.

These indicators imply that we need to be doing more in the realm of root cause analysis before these issues snowball.
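Several of these signs can be checked directly against the records you already keep. The sketch below is a minimal illustration in Python, assuming a hypothetical list of work-order records; the field names (month, asset, failure_mode, downtime_hours, reactive), the sample data and the thresholds are all invented for illustration and are not taken from any particular maintenance system.

```python
from collections import Counter, defaultdict

# Hypothetical work-order records; field names and values are illustrative only.
work_orders = [
    {"month": "2015-05", "asset": "Pump A", "failure_mode": "seal leak", "downtime_hours": 4.0, "reactive": True},
    {"month": "2015-05", "asset": "Fan B", "failure_mode": "belt worn", "downtime_hours": 1.0, "reactive": False},
    {"month": "2015-06", "asset": "Pump A", "failure_mode": "seal leak", "downtime_hours": 6.0, "reactive": True},
    {"month": "2015-06", "asset": "Pump C", "failure_mode": "bearing seized", "downtime_hours": 3.0, "reactive": True},
    {"month": "2015-07", "asset": "Pump A", "failure_mode": "seal leak", "downtime_hours": 9.0, "reactive": True},
    {"month": "2015-07", "asset": "Fan B", "failure_mode": "belt worn", "downtime_hours": 2.0, "reactive": False},
]


def downtime_by_month(orders):
    """Total downtime hours per month, sorted by month (sign 1)."""
    totals = defaultdict(float)
    for order in orders:
        totals[order["month"]] += order["downtime_hours"]
    return dict(sorted(totals.items()))


def recurring_failures(orders, min_count=3):
    """Asset/failure-mode pairs seen at least min_count times (sign 2)."""
    counts = Counter((o["asset"], o["failure_mode"]) for o in orders)
    return {pair: n for pair, n in counts.items() if n >= min_count}


def reactive_ratio(orders):
    """Fraction of work orders that were reactive (sign 6)."""
    return sum(o["reactive"] for o in orders) / len(orders)


def rca_signals(orders, reactive_limit=0.5):
    """Collect plain-language flags; the thresholds are arbitrary assumptions."""
    signals = []
    monthly = list(downtime_by_month(orders).values())
    if len(monthly) >= 2 and all(b > a for a, b in zip(monthly, monthly[1:])):
        signals.append("downtime increasing month on month")
    for (asset, mode), n in recurring_failures(orders).items():
        signals.append(f"recurring failure: {asset} / {mode} ({n} times)")
    if reactive_ratio(orders) > reactive_limit:
        signals.append("high level of reactive maintenance")
    return signals


for signal in rca_signals(work_orders):
    print(signal)
```

In practice the thresholds would come from your own trigger points rather than the placeholder values used here.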

If you can identify with some of these pain points, download our eBook “11 Problems With Your RCA Process and How to Fix Them” in which we provide best practice advice on using RCA to help eliminate some of these problems.


Author: Ben Rowland

A colleague and I were discussing how his nine-year-old son had completed his Cub Scouts Cyclist Activity badge. We noticed that some of the bike maintenance tasks that had been identified were, shall we say, less than ‘optimal’.

Now you might say it is a bit unfair to judge a Cub Scout lesson through the eyes of a reliability professional (and you’d be right), but what was interesting is that we often see the same sorts of issues in industry.

[Image: the Cub Scout bike maintenance checklist]



The first thing we noticed is that the tasks aren’t really tasks but a list of components: they tell you what to look at but not what to look for.

In other words, how a task is written is clearly very important. In the example above, “check the back tire” does not tell us what to look for. Is it there? Is it worn? Does it have air in it? Is it damaged? With vague work instructions like these, maintainers are left to decide what to inspect for, which will inevitably lead to inconsistent maintenance.

Some of the examples above are better than others. “Your helmet fits”, for example, is more specific and much better than “check helmet.”

While working with clients to develop their maintenance plans, the RCM process we use ensures that each maintenance task addresses a specific failure mode, or modes. We can run a report that shows this link, which in turn allows the maintainer to understand the purpose of the inspection. The task can also be written in such a way as to focus the maintenance on identifying the potential failure.

Another issue with the tasks above is that they include no data or figures. How much tire wear is acceptable? What is the minimum tread depth? What pressure should the tire be at? Is there a minimum and a maximum?

There also needs to be instruction as to how frequently to do the bicycle checks. Every ride? Every month? Checking that your wheels are fitted tightly might need to be done before every ride, but checking a chain for wear could be done every few months. Not having this information can lead to items being under- or over-maintained, resulting in possibly unsafe equipment or wasted effort.
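The three gaps described above (what to look for, what figures define a pass, and how often to check) can be captured in a simple record. This is a minimal Python sketch under our own assumptions: the class name, field names and the placeholder wording are invented for illustration and do not come from the Cub Scout checklist or from any RCM tool.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MaintenanceTask:
    """One inspection task; every field except component may be missing."""
    component: str
    look_for: Optional[str] = None          # what condition to inspect for
    acceptance_limit: Optional[str] = None  # data or figures that define pass/fail
    interval: Optional[str] = None          # how often the check is done

    def gaps(self):
        """Return the details a maintainer would be left to guess."""
        missing = []
        if not self.look_for:
            missing.append("look_for")
        if not self.acceptance_limit:
            missing.append("acceptance_limit")
        if not self.interval:
            missing.append("interval")
        return missing


# "Check the back tire" as written: a component, not a task.
vague = MaintenanceTask(component="back tire")

# The same check with the gaps filled in (placeholder wording, not real limits).
specific = MaintenanceTask(
    component="back tire",
    look_for="tread wear, cuts and low pressure",
    acceptance_limit="within the limits marked by the tire manufacturer (placeholder)",
    interval="before every ride",
)

print(vague.gaps())
print(specific.gaps())
```

A review of an existing task list could run a check like gaps() over every task to find the instructions that leave the maintainer guessing.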

“Okay then, you do it!”

Well, it’s only fair after criticizing the Cub Scouts’ effort that we have a go ourselves. So below is an example of how we might construct an FMEA and maintenance strategy for a bicycle in the Availability Workbench™ (AWB) RCM-Cost software¹:

[Image: example FMEA and maintenance strategy for a bicycle in AWB RCM-Cost]


We can see that for the failure mode ‘chain worn’ we’ve identified an inspection task to periodically check the chain for wear. We’ve specified the method to use (a wear gauge, as opposed to a simple visual check or taking a measurement) and an acceptable limit (less than 75% worn). This is a clear communication of what is required, minimizing the chances of ineffective maintenance.
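To show why a stated method and limit remove the guesswork, here is a small Python sketch of the ‘chain worn’ task. It is an illustration only: the class, its fields and the “replace chain” outcome are our assumptions, and we assume the gauge reading is expressed as a percentage worn. It is not how the AWB software stores or evaluates tasks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InspectionTask:
    failure_mode: str
    method: str
    limit_percent_worn: float  # a reading must be below this value to pass

    def evaluate(self, percent_worn: float) -> str:
        """Compare a gauge reading with the stated limit."""
        if not 0 <= percent_worn <= 100:
            raise ValueError("percent_worn must be between 0 and 100")
        if percent_worn < self.limit_percent_worn:
            return "acceptable"
        return "replace chain"  # assumed corrective action


chain_check = InspectionTask(
    failure_mode="chain worn",
    method="wear gauge",
    limit_percent_worn=75.0,
)

print(chain_check.evaluate(40.0))
print(chain_check.evaluate(80.0))
```

Two maintainers given the same reading will reach the same decision, which is the point of writing the limit into the task.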

“How do I choose which task to perform?”

In the example above I touched on the fact that there may be a choice of maintenance tasks, as well as the option of performing no maintenance at all. The RCM process helps us choose an appropriate task; essentially it balances the severity of the failure against the cost or effort of performing the maintenance. Severity is often thought of in terms of cost (e.g. lost production), but it also covers safety and operational impact. The operating context of the equipment also affects the severity. The example below shows how we use the AWB software to select an optimal maintenance task interval.

Optimization Curve Image
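The idea behind an optimization curve can be sketched with a simple textbook model. The Python below is not the calculation AWB performs; it swaps in a standard age-based replacement model with a Weibull life distribution, in which a part is replaced either at a planned interval or on failure, whichever comes first. Every number (the Weibull parameters, the costs, the candidate intervals) is a made-up assumption for illustration.

```python
import math


def reliability(t, eta, beta):
    """Weibull survival probability at age t (eta = scale, beta = shape)."""
    return math.exp(-((t / eta) ** beta))


def cost_rate(interval, eta, beta, planned_cost, failure_cost, steps=1000):
    """Expected cost per unit of use when replacing at interval or on failure."""
    dt = interval / steps
    # Trapezoidal integration of R(t) gives the expected length of one cycle.
    expected_cycle = sum(
        0.5 * (reliability(i * dt, eta, beta) + reliability((i + 1) * dt, eta, beta)) * dt
        for i in range(steps)
    )
    survives = reliability(interval, eta, beta)
    expected_cost = planned_cost * survives + failure_cost * (1.0 - survives)
    return expected_cost / expected_cycle


def best_interval(candidates, **params):
    """Pick the candidate interval with the lowest expected cost rate."""
    return min(candidates, key=lambda t: cost_rate(t, **params))


# Hypothetical inputs: usage units could be hours or kilometers.
params = dict(eta=3000.0, beta=3.0, planned_cost=20.0, failure_cost=200.0)
candidates = range(100, 6001, 100)

optimal = best_interval(candidates, **params)
print("lowest-cost interval:", optimal)
print("cost rate at that interval:", round(cost_rate(optimal, **params), 4))
```

Very short intervals waste effort on planned work, and very long intervals approach run-to-failure, so the cost rate is lowest somewhere in between. A higher failure cost (a more severe consequence) pulls the best interval shorter.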

Imagine we only ride our bike around the town we live in for non-essential trips, such as popping to the shops to buy some milk and a newspaper. In this case a punctured tire is not critical, and we might decide not to carry a spare tube and the tools to change it (pump, tire levers etc.) and instead to perform ‘breakdown maintenance’, i.e. walk the bike home and repair it there. If we were instead on vacation touring a remote location, far from any town, this ‘run to fail’ strategy would result in a very long walk and clearly not be suitable!

Hidden Failures

So assuming we were carrying a spare tube, and relying on it in remote locations, what happens if there is a problem with the spare tube? “Did I remember to fix it after my last puncture?” “What if there is a manufacturing defect?” Or “What if I didn’t find the thorn that caused the first puncture, still stuck in the tire, and got a second puncture?” These are called ‘hidden failures’ and require failure-finding tasks in order to mitigate them.
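How often should a failure-finding task be done? One approximation commonly quoted in RCM literature for a single protective item is: interval = 2 × (mean time between failures of the item) × (1 − required availability). The Python sketch below applies it to the spare tube. The formula is brought in from general RCM practice rather than from this article, it only holds when the allowed unavailability is small, and the numbers are invented.

```python
def failure_finding_interval(mean_time_between_failures, required_availability):
    """Approximate check interval for a hidden function.

    Uses the common approximation FFI = 2 * MTBF * (1 - availability),
    which assumes a single protective item and small unavailability.
    The result is in the same time units as the MTBF.
    """
    if mean_time_between_failures <= 0:
        raise ValueError("mean_time_between_failures must be positive")
    if not 0.0 < required_availability < 1.0:
        raise ValueError("required_availability must be between 0 and 1")
    return 2.0 * mean_time_between_failures * (1.0 - required_availability)


# Hypothetical: the spare tube is found unusable about once every 48 months,
# and we want it to be usable 95% of the time we reach for it.
ffi_months = failure_finding_interval(48.0, 0.95)
print(round(ffi_months, 1), "months between spare tube checks")
```

With these made-up inputs the check interval works out to roughly 4.8 months. Demanding a higher availability, as we would for a remote tour, shortens the interval.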

Operator Maintenance

We might also set our bicycle maintenance strategy assuming we do all the checks at home in the garage, but do we also need to consider operating checks?  For our bike this might include using our senses to listen for any abnormal noises, rattles, looseness, creaks or squeaks when riding the bike. We are also checking the operation of the gears and brakes through use, cleaning the bicycle down after use and oiling the chain afterwards to prevent corrosion. This is an example of ‘operator maintenance’.

How do we manage failures during use? If we notice something wrong during use that we can’t fix, we would note it and arrange planned maintenance at the bike shop before the warning becomes an actual failure that puts the bike out of action. Operating failures that occur with little or no warning can be addressed in a number of ways: carrying spares (e.g. a spare inner tube) or tools to repair the failure in the field (a puncture repair kit). We can also introduce redesigns (e.g. sealant in the tire to seal holes as they occur).

So there it is: writing an effective maintenance strategy can be as easy as riding a bike.


¹Availability Workbench™ is authored by Isograph Ltd. ARMS Reliability are authorized global distributors, re-sellers and implementers of the software application.