Monthly Archives: August 2015


Author: Kevin Stewart

I recently wrote an article about auditing root cause analysis (RCA) investigations, and it only seemed appropriate to follow up with advice on auditing your overall RCA program. Let’s go back to the dictionary definition of “audit” — a methodical examination and review. In my mind this definition has two parts: 1) the methodical examination and 2) the review.

It might help to compare this process to a medical examination. In that case, the doctor would examine the patient, trying to find anything he can, either good or bad. This would include blood work, reflex tests, blood pressure, etc. After that examination, he would then review his findings against some standard to help him determine if any action should be taken. Auditing an RCA program is no different: first, we must examine it and find out as much as we can about it, and then we need to review it against some standard or measure.

In my other article, I discussed at length the measures against which an RCA investigation could be judged. Those still apply, and one of the program audit items can and should be the quality of the RCA investigation.

Now we are faced with determining the characteristics of a good program. A list of characteristics is given below:

  • Quality of RCA investigations
  • Trigger Levels are set and adhered to
  • Triggers are reviewed on a regular basis and adjusted as required to drive improvement
  • A program champion has been designated, trained and is functioning
  • Upper management has been trained and provides involved sponsorship of the program
  • Middle management has been trained and provides involved sponsorship of the program
  • The floor employees have been trained and are involved in the process
  • The solutions are implemented and tracked for completion
  • The RCA effectiveness is tracked via looking for repeat incidents
  • Dedicated investigators / facilitators are in place
    • Investigators are qualified and certified on an ongoing basis
  • All program characteristics are reviewed / defined / agreed to by management and include:
    • An audit system that is defined, funded, and adhered to
    • Resource requirements
    • Triggers
    • Training requirements that are in place and funded
    • Sponsorship statements and support
  • The RCA program is incorporated into the onboarding and continuous review training for new and existing employees

The next step in developing an audit is to generate a set of items that your program will be gauged against. This list can come from the items above, your own list, or a combination of the two. Once you have a final list of items to audit against, you need to generate a rating scale. This can be pass/fail or a scale that rates each item from 0 to 5, which lets you give partial credit for items that do not quite meet the full standard. You can also apply weighting if deemed appropriate, so that some items carry more importance in the scoring based on the priorities or culture of your facility. The scale can be anything you wish, but be cautious about making it too large. Can you really tell the difference between a 7 and an 8 on a 10-point scale? Perhaps a 1 to 4 scale would be better.

Next, develop a score sheet with each item listed and a place to put a score for each one. It’s handy to add some guidelines with each item to give the reviewer a gauge on how to score the item. A sample of such guidelines might look like:

0    Does not exist

1    Some are in place but not correct

2    Many are in place and some are correct

3    All are in place but only some are correct

4    All are in place and most are correct

5    All are in place and correct

Don’t forget to leave a space for notes from the reviewer to explain the reasons for partial credit. With this in place either next to each item or easily available as a reference, it helps ensure consistency in the scoring, especially if multiple people will be scoring your RCA program.
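As a rough illustration of the score sheet described above, the following Python sketch records a 0 to 5 score, an optional weight, and reviewer notes for each item, then rolls them up into a weighted percentage. The names `AuditItem` and `weighted_percentage`, the example weights, and the example notes are assumptions made for illustration only; they do not come from any established audit standard.

```python
from dataclasses import dataclass

# Scoring guidelines copied from the sample scale above.
GUIDELINES = {
    0: "Does not exist",
    1: "Some are in place but not correct",
    2: "Many are in place and some are correct",
    3: "All are in place but only some are correct",
    4: "All are in place and most are correct",
    5: "All are in place and correct",
}
MAX_SCORE = max(GUIDELINES)


@dataclass
class AuditItem:
    """One line of the score sheet (hypothetical structure)."""
    name: str
    score: int
    weight: float = 1.0  # assumed default: every item counts equally
    notes: str = ""      # reviewer's reasons for partial credit

    def __post_init__(self):
        if self.score not in GUIDELINES:
            raise ValueError(f"score for {self.name!r} must be one of {sorted(GUIDELINES)}")
        if self.weight <= 0:
            raise ValueError(f"weight for {self.name!r} must be positive")


def weighted_percentage(items):
    """Return the weighted score as a percentage of the maximum possible."""
    if not items:
        raise ValueError("the score sheet has no items")
    earned = sum(item.weight * item.score for item in items)
    possible = sum(item.weight * MAX_SCORE for item in items)
    return 100.0 * earned / possible


# Example sheet with made-up scores; the first item is weighted double.
sheet = [
    AuditItem("Quality of RCA investigations", 5, weight=2.0),
    AuditItem("Trigger levels are set and adhered to", 3,
              notes="Triggers exist but are not followed consistently."),
    AuditItem("Program champion designated and trained", 0),
]
print(f"{weighted_percentage(sheet):.1f}%")
```

Keeping the guideline text next to the validation means a reviewer cannot enter a score that has no written definition, which is one small way to support consistent scoring.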

The goal for a standardized audit process would be that several different people could independently review and score a program and would come up with essentially the same score. This may seem like a simple thing, but it turns out to be the largest issue because everyone interprets the questions slightly differently. There are several things you can do to minimize discrepancies:

  1. Provide the scoring guidelines above to every auditor.
  2. Require the auditors to be trained and certified by the same process/people and then have them provide a sample audit and check it against the standard. Review and adjust any discrepancies until you are sure they will apply the same thinking against the real audit.
  3. Always ensure that if multiple auditors are used in a program review, at least one has significant experience to provide continuity. In other words, don’t allow an audit to be done with all first-time auditors.
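One way to check whether auditors are applying the same thinking is to compare their scores item by item on a sample audit. The sketch below flags any item where the spread between the highest and lowest score exceeds a tolerance. The function name `score_discrepancies`, the default tolerance of one point, and the sample scores are all assumptions for illustration.

```python
def score_discrepancies(scores_by_auditor, tolerance=1):
    """Return {item: spread} for items whose scores differ by more than tolerance.

    scores_by_auditor maps an auditor's name to a dict of {item: score}.
    """
    items = set()
    for scores in scores_by_auditor.values():
        items.update(scores)

    flagged = {}
    for item in sorted(items):
        # Only compare auditors who actually scored this item.
        values = [s[item] for s in scores_by_auditor.values() if item in s]
        spread = max(values) - min(values)
        if spread > tolerance:
            flagged[item] = spread
    return flagged


# Hypothetical sample audit scored by two auditors.
sample = {
    "Auditor A": {"Triggers": 4, "Training": 2},
    "Auditor B": {"Triggers": 3, "Training": 5},
}
print(score_discrepancies(sample))
```

Items that come back flagged are the ones to review and adjust with the auditors before they take on a real audit.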

With these measures in place, all you have to do is review the RCA program against your list, score it, and have some sort of minimum for passing. Likewise, you’ll want to have some sort of findings report where the auditor can provide improvement opportunities against the individual items instead of simply saying: “did not pass.”
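The pass mark and findings report can be sketched in a few lines. In this hedged example, the function `findings_report`, the 70% passing threshold, and the sample notes are assumptions chosen for illustration; your program would set its own minimum.

```python
def findings_report(scores, notes=None, passing_percent=70.0, max_score=5):
    """Summarize an audit: overall percentage, pass/fail, and per-item findings.

    scores maps each audit item to its score; notes maps items to reviewer comments.
    """
    if not scores:
        raise ValueError("no items were scored")
    notes = notes or {}
    percent = 100.0 * sum(scores.values()) / (max_score * len(scores))
    # Every item below full marks becomes an improvement opportunity.
    findings = [
        f"{item}: scored {score}/{max_score}. "
        f"{notes.get(item, 'No reviewer note recorded.')}"
        for item, score in scores.items()
        if score < max_score
    ]
    return {
        "percent": percent,
        "passed": percent >= passing_percent,
        "findings": findings,
    }


# Hypothetical audit result.
report = findings_report(
    {"Triggers": 5, "Training": 2},
    {"Training": "Floor employees have not yet been trained."},
)
for line in report["findings"]:
    print(line)
```

Listing each shortfall alongside the reviewer's note gives the program owner something to act on, even when the overall result is a pass.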

These measures will ensure that the program is gauged against a consistent standard and can be repeated by multiple auditors. There will always be differences if multiple people are auditing an RCA program, but by utilizing the steps above these differences can be minimized to provide the highest level of credibility for the audit.


According to a definition applicable to the insurance industry, an accident is an event which is not deliberately caused and which is not inevitable[1]. A typical insurance policy has a significant number of exclusions that are the “evitable” circumstances.

Logically, any situation which is reasonably evitable and which likely has harmful consequences ought to have been identified.

Those of us who are the safety leads at our organizations have a lot riding on our shoulders. That pressure gives us a constant incentive to improve, because we can never do our job too well. This post highlights some of the questions we ask ourselves that ultimately ladder up to the larger question, “How can we do better?”

For example:

1. How many injuries have been recorded at your location(s) in the past year? 

The often cited adage “you can’t manage what you don’t measure” is pertinent here.

Data is king; knowing how many injuries have been recorded at every location in your enterprise not only enables comparisons between sites and an analysis of common and differing causes, but can also be used to motivate greater improvement at the lower-performing site(s).

2. Does that number include the near misses? Or aren’t they reported?

The expression “near miss” clearly indicates a close call, but all too frequently it occasions relief rather than analysis. People look on the bright side and put the escape down to good luck, and overcoming this complacency is a challenge. The issue for the organization is that all too often these events are simply not reported, or are reported too long after the fact to enable an accurate reconstruction of what happened. This compromises the ability to derive any “lessons learned” that could generate appropriate improvements.

3. Would you know if the near misses hadn’t all been captured? 

The simple fact is that “you don’t know what you don’t know”; this situation calls for a process of acknowledgement, if not reward, so that the incident participants have no fear of punitive measures being applied when they report the circumstances of the near miss. This necessitates the clear communication of a “no blame” philosophy. If employees feel that they will suffer some negative consequence they will be loath to volunteer information about the near-miss incidents.    

4. Is your record improving?

Unless the data is being promptly collected, accurately recorded, and analyzed, trending will not be possible and improvement not apparent. The objective is to have a demonstrable improvement evidenced by the statistical record. The accuracy of this data will depend not only on the creation of the “no blame” culture but also on the refinement of the methodology and tools employed in the investigation of incidents.
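A simple way to see whether the record is trending in the right direction is to fit a straight line through the counts for successive periods and look at its slope. The sketch below does this with an ordinary least-squares slope. The function name `trend_slope` and the yearly counts are hypothetical; a real analysis would also consider hours worked and near-miss reports.

```python
def trend_slope(counts):
    """Least-squares slope of counts over equally spaced periods.

    A negative slope means recorded incidents are falling from one period to the next.
    """
    n = len(counts)
    if n < 2:
        raise ValueError("need at least two periods to see a trend")
    mean_x = (n - 1) / 2
    mean_y = sum(counts) / n
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(counts))
    denominator = sum((x - mean_x) ** 2 for x in range(n))
    return numerator / denominator


# Hypothetical recorded injuries for four consecutive years.
injuries_per_year = [12, 10, 8, 6]
slope = trend_slope(injuries_per_year)
print("improving" if slope < 0 else "not improving", slope)
```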

5. Have you set targets for improvement?

Establishing fresh targets and goals periodically is the only way to ensure the improvement is continuous. Even a site with an almost blemish-free record needs to be totally vigilant about the changes that are being undertaken there. Change is the only constant and, regretfully, is also an opportunity for hazards and harm to arise. The fresh targets ought to be reflected in the key performance indicators (KPIs) applicable to the respective safety roles for your enterprise.

6. Are there any unidentified hazards facing the personnel? 

Only systematic inspection and auditing processes will reveal previously unrecognized hazards. The certainty that you have minimized risks and hazards will grow proportionally as the employees who encounter the hazards demonstrate their ownership of the safety program. They have the ultimate control of the likely causes of their own potential harm. But whether the personnel have accepted ownership of the program or not, it is incumbent on the responsible officer to implement the specific hazard identification process. This will necessitate close engagement with the plant or equipment operators, technicians, or any person with an exposure to their work environment. Yes, that’s everybody.

There are also hazards of the interpersonal type that may never be apparent to the observer; bullying and stress are increasingly the causes of substantial claims for compensation and can only be detected by building a trusting relationship with the personnel and developing confidentiality protocols.

7. How effectively are you learning the lessons from each “accident”?

The phrase “lessons learned” is commonplace but not consistently applied. These words express an intention to make improvements in the organization, but all too often the focus falls on the actors in the event rather than on the systems and processes that are central to the business.

“Human error” is the categorical expression most commonly heard when blame is being attached and represents a plethora of mistakes that humans make. Discovering that precise error in this unique event and the reason(s) for it can add value and lead to preventive measures being implemented — but not in isolation, not as the so-called “root” cause.

Perfect knowledge, perfect understanding, and perfect operation by all humans in the enterprise are a fantasy. Humans are fallible and accidents will happen if the situation exists.


8. Which causes are “evitable”? 

The “evitable” causes are simply the known, designed, or planned components of the situation – the hardware, equipment, systems, and processes that are used in the production of the goods or service in question. These are all possible causes which, with a human interface, can create hazards with potential negative safety consequences. They are the opportunities for establishing controls or installing barriers that prevent harm.

The safety program needs to identify improvements to the systems or equipment that would at least minimize the likelihood of a repeat occurrence, given the fallibility of the human factor. What are the possible failure modes or mis-operations that could occur?

9. Can you demonstrate that you have thoroughly and methodically analyzed every event in order to prevent recurrence?

A thorough and methodical causal analysis is not possible without the creation of a cause map. This is best achieved through a mediated process involving the pertinent stakeholders and subject matter experts and identifying and arranging the proven causes in a logical manner. It needs to be both comprehensive and comprehensible to win the confidence of the decision makers who are looking for recommendations that will effectively modify, substitute, or eliminate the causes.

There are regulatory authorities that have expectations in this sphere and will want to see the assiduous application of a method that has proven to be effective regardless of the industry or problem-type.





While there are three main reasons why organizations typically perform a Root Cause Analysis on their assets or equipment, there are many additional indicators that an RCA should be performed.

Chances are you are recording a lot of valuable information about your equipment’s performance: information that can reveal opportunities to perform a root cause analysis, find causes, and implement solutions that resolve recurring problems and improve operations. But are you really using the information for that purpose?

First, let’s briefly discuss the reasons a root cause analysis is typically performed:

1. Because you have to

There is probably some regulation or requirement to document that you are doing something about the problem that occurred.

2. You hit a trigger threshold

Your own company has identified triggers for major incidents that warrant a root cause analysis.

3. Because you want to

The opportunity has presented itself to make improvements. Or perhaps you have decided it is time to stop the constant loss of money.

The purpose of an industry is to make money. Anything that affects that objective is usually tackled with root cause analysis.

While talking with a reliability engineer at an oil facility, I asked how much the company suffers in lost opportunities and shutdowns in a year. He answered about three quarters of a billion dollars, that is, $750,000,000. Isn’t that an important reason to perform a root cause analysis? Just 10% of that figure would significantly affect the company’s numbers.
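To put the 10% figure in concrete terms, the arithmetic can be written out as a short Python sketch. The function name `share_of_loss` is an assumption for illustration; the $750,000,000 figure is the one quoted in the conversation above.

```python
def share_of_loss(annual_loss_usd, percent):
    """Return the dollar value of a given percentage of an annual loss."""
    return annual_loss_usd * percent / 100


annual_loss_usd = 750_000_000  # figure quoted by the reliability engineer
print(f"${share_of_loss(annual_loss_usd, 10):,.0f}")
```

In other words, 10% of $750,000,000 is $75,000,000 a year.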

The financial impact is not the consequence of a single event, but of a multitude of events, both major and minor.

Each event is in itself an opportunity to learn and to change in order to prevent its recurrence. That’s life: things happen, big or small. But it is a serious mistake to allow the events to keep happening over and over.

While all of the above are valid reasons to perform a root cause analysis, there are at least 10 equipment-related telltale clues that indicate the need for an RCA, many of which can be identified from information you have probably already recorded.

Here are ten telltale signs that indicate the need to perform a Root Cause Analysis:

  1. Increased downtime of plant, equipment, or process.
  2. Increase in recurring failures.
  3. Increased overtime due to unplanned failures.
  4. Increase in trigger events.
  5. Lower equipment availability.
  6. High amount of reactive maintenance.
  7. Lack of time… the necessary tasks simply cannot get done.
  8. Increase in the number of serious events… reaching the top of the pyramid.
  9. Longer planned “shutdowns.”
  10. More frequent required “shutdowns.”

All of this means we need to dig deeper into root cause analysis before these problems snowball.
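As a minimal sketch of how already-recorded data can surface one of these signs (recurring failures), the Python example below counts repeated equipment and failure-mode pairs in a list of failure records. The record fields `equipment` and `failure_mode`, the function name `recurring_failures`, the threshold of two occurrences, and the sample data are all assumptions for illustration; a real maintenance system will have its own record layout.

```python
from collections import Counter


def recurring_failures(records, min_count=2):
    """Return {(equipment, failure_mode): count} for pairs seen at least min_count times."""
    counts = Counter((r["equipment"], r["failure_mode"]) for r in records)
    return {pair: n for pair, n in counts.items() if n >= min_count}


# Hypothetical failure log entries.
failure_log = [
    {"equipment": "P-101", "failure_mode": "seal leak"},
    {"equipment": "K-200", "failure_mode": "bearing failure"},
    {"equipment": "P-101", "failure_mode": "seal leak"},
]
print(recurring_failures(failure_log))
```

Any pair returned here is a candidate for a root cause analysis, since the same failure has already happened more than once.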