Yearly Archives: 2015


In most cases, there is much to gain by working through maintenance strategy optimization, and if the symptoms described in our guide “5 Symptoms Your Maintenance Strategy Needs Optimizing” are evident, there is a strong business case to invest in it. The primary question in diagnosing the health of your maintenance strategy is a simple one: does it need optimizing? Ideally, your maintenance strategy is already optimized. Perhaps it was, but is in need of a tune-up. Or, as is the case in many companies, maybe you are experiencing endemic symptoms that lead to:

  • Recurring problems with equipment.
  • Budget blow-outs from costly fixes to broken equipment.
  • Unplanned downtime that has a flow-on effect on production.
  • Using equipment that is not performing at 100 percent.
  • Risk of safety and environmental incidents.
  • Risk of catastrophic failure and major events.

To identify where your company’s maintenance strategy sits on the spectrum, you can perform a simple self-assessment that looks for the most common symptoms.

  1. Increase in unplanned maintenance – A sure sign that your maintenance strategy is not working is that you are performing more unplanned maintenance, caused by an increase in breakdowns.
  2. Rising maintenance costs – In companies that apply best practice maintenance strategy optimization, total maintenance costs are flat or slightly decreasing month-on-month. These optimized strategies combine preventative tasks with various inspection and root cause elimination tasks, which in turn produces the lowest cost solution.
  3. Excessive variation in output – A simple definition of the reliability of any process is that it does the same thing every day. In other words, equipment should run at nameplate capacity day in and day out. When it doesn’t, this is an indication that some portion of the maintenance strategy is misaligned and not fully effective.
  4. Strategy sticks to OEM recommendation – Sticking to the maintenance schedule prescribed by Original Equipment Manufacturers (OEMs) may seem like a good starting point for new equipment. But it’s only that: a starting point. There are many reasons why you should create your own optimized maintenance strategy soon after implementation.
  5. An inconsistent approach – Consistency implies lack of deviation, and this implies standardization. When it comes to maintenance strategies, standardization is essential.

For an in-depth look at these symptoms, download the complete guide “5 Symptoms Your Maintenance Strategy Needs Optimizing.”

If a large proportion of your maintenance activities are reactive repairs, the cost of maintaining your assets is larger than it needs to be, because unplanned maintenance typically costs three times as much as the same work performed in a planned manner. Furthermore, a reactive system is a sign that you are not managing failures. Your biggest costs may be catastrophic failure, systemic failure or equipment defects.

These major meltdowns or one-off events can cost millions of dollars in reactive repairs, lost production and major safety or environmental impacts. If you need to lower the cost of maintenance, this is an area where you can make a significant impact on the P&L.

Proactive maintenance – which is aimed at avoiding such scenarios – is a much more cost-effective approach.
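To make the cost gap concrete, here is a minimal sketch that applies the "typically three times" rule of thumb quoted above to a mix of planned and reactive jobs. The function name, the job counts and the cost per job are illustrative assumptions, not figures from any real site, and the model ignores flow-on costs such as lost production.

```python
def direct_maintenance_cost(total_jobs, reactive_share, planned_cost_per_job,
                            unplanned_multiplier=3.0):
    """Estimate direct maintenance cost for a mix of planned and reactive jobs.

    unplanned_multiplier=3.0 encodes the rule of thumb that unplanned work
    costs about three times as much as planned work (an assumption to tune).
    """
    if not 0.0 <= reactive_share <= 1.0:
        raise ValueError("reactive_share must be between 0 and 1")
    reactive_jobs = total_jobs * reactive_share
    planned_jobs = total_jobs - reactive_jobs
    unplanned_cost_per_job = planned_cost_per_job * unplanned_multiplier
    return planned_jobs * planned_cost_per_job + reactive_jobs * unplanned_cost_per_job


# Hypothetical site: 1,000 jobs a year, 500 per planned job.
mostly_reactive = direct_maintenance_cost(1000, 0.60, 500.0)
mostly_planned = direct_maintenance_cost(1000, 0.20, 500.0)

print(f"60% reactive: {mostly_reactive:,.0f}")
print(f"20% reactive: {mostly_planned:,.0f}")
print(f"Difference:   {mostly_reactive - mostly_planned:,.0f}")
```

Under these assumed numbers, shifting work from reactive to planned lowers the direct cost even though the number of jobs stays the same.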

First, what is reactive maintenance? Put simply, it is any maintenance or repair done to a piece of equipment after a failure event. If a gearbox grinds to a halt and your maintenance team rushes to repair it, they are engaging in reactive maintenance.

While the immediate cost of such maintenance may seem low – a day of labor and the purchase of a new part for the machine – the flow-on costs associated with downtime and lost production can be much higher, and there is a greater risk of safety and environmental incidents while equipment is being shut down or started up.

In companies where reactive maintenance is a large proportion of the work performed, the business carries many hidden costs, such as higher inventories; premium rates for purchasing spare parts; higher stocking levels for critical spares; more time wasted queuing for tools, materials, and labor; higher overtime levels; more plant downtime; interruption to customer orders; stockouts; and off-spec quality. The organization and management system has a short-term, busy focus, often under budget pressure, with variations in production and lots of “things to do”.


On the flip side, proactive maintenance takes a preventative approach. It involves making assets work more efficiently and effectively so that downtime and unexpected failures become a thing of the past. It’s also about trimming unnecessary expenditure from asset management budgets. From a bottom line perspective, it’s about boosting the assets’ contribution to earnings before interest and tax (EBIT).

Strategies associated with proactive maintenance involve understanding and managing the likelihood of failures. Some of the common analytical methods used to understand the impact of failures on the business include:

  • System Analysis – to understand the way equipment failures can impact the availability and production capacity of a system; it allows the analyst to identify and eliminate potential bottlenecks in a system, and thus increase plant capacity.
  • Criticality Analysis – to rank equipment by the likelihood and severity of failure impact on key business objectives, so you can then channel maintenance resources into the more critical pieces of equipment.
  • Maintenance Benefit Analysis – to evaluate a maintenance plan and identify areas where maintenance is either not needed or not optimal.
  • Spares Optimization – to find the optimum level of spares to hold in stock, which balances the cost of not having spares available against the cost of taking up storage space on-site.
  • Repair vs Replace Analysis – to predict or track the cost of repairs against the cost of replacement, so it becomes clear when to replace assets for best value.
  • Root Cause Analysis – to analyze the root cause of failures and focus resources on eliminating their recurrence, not just fixing the symptoms time and time again.
  • Vulnerability Analysis – to systematically review all aspects of the operation in order to discover tomorrow’s failure, so it can be eliminated in a planned fashion.
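As an illustration of the criticality analysis item in the list above, the following minimal sketch ranks equipment by likelihood multiplied by severity. The 1 to 5 scales, the asset names and the scores are hypothetical; a real criticality study would define its own scales against the business objectives that matter.

```python
from dataclasses import dataclass


@dataclass
class Asset:
    name: str
    likelihood: int  # assumed scale: 1 (rare) to 5 (frequent)
    severity: int    # assumed scale: 1 (minor) to 5 (catastrophic)

    @property
    def criticality(self) -> int:
        # Simple product of the two scores; other weightings are possible.
        return self.likelihood * self.severity


def rank_by_criticality(assets):
    """Return assets ordered from most to least critical."""
    return sorted(assets, key=lambda a: a.criticality, reverse=True)


# Illustrative data only.
assets = [
    Asset("Conveyor gearbox", likelihood=4, severity=4),
    Asset("Office HVAC", likelihood=2, severity=1),
    Asset("Primary crusher", likelihood=2, severity=5),
    Asset("Slurry pump", likelihood=5, severity=2),
]

ranking = rank_by_criticality(assets)
for asset in ranking:
    print(f"{asset.name}: {asset.criticality}")
```

Assets with equal scores keep their original order, so ties should be broken by an explicit secondary criterion if the order matters.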

As these strategies attest, proactive maintenance is about much more than building a schedule of ongoing maintenance tasks. By understanding and managing failure, you can direct maintenance resources in a planned manner to the areas that require attention, and save significant amounts of money over the long term.

And, above all, it is important to remember that a culture of reactive maintenance is not ideal. In fact, unplanned reactive maintenance is one of the key symptoms that your maintenance strategy isn’t working.

Learn more by downloading our guide: 5 Symptoms Your Maintenance Strategy Needs Optimizing

While there are three main reasons organizations typically perform Root Cause Analysis (RCA) after a problem with an asset or piece of equipment, a whole range of other indicators also signal that an RCA should be performed.

Chances are you are recording a lot of valuable information about how your equipment performs – information that could reveal opportunities to carry out root cause analysis, find the causes, and implement solutions that resolve recurring problems and improve operations. But are you using your recorded information to that extent?

First, let's quickly go over the three reasons root cause analysis is typically performed.

  1. Because you have to

There may be a regulatory requirement to demonstrate that you are doing something about a problem that has occurred.

  2. You have crossed a trigger point

Your own company has identified the triggers for significant incidents that warrant Root Cause Analysis.

  3. Because you want to

An opportunity has presented itself to make changes for the better. Or perhaps you have decided that you simply do not want to lose so much money all the time.

At the heart of all industry is the desire to make money. Anything that negatively affects that goal is usually attacked by performing Root Cause Analysis.

I was talking with a reliability engineer at an oil and gas site, and I asked him what lost opportunity or downtime might cost that company over a year. He said it was in the neighborhood of three quarters of a billion dollars: $750,000,000. Is that a good enough reason to perform Root Cause Analysis? Even a 10% change would have an enormous impact on the bottom line.

The monetary impact on the business was, of course, not due to any single event but to a multitude of events, large and small.

Each event presents an opportunity to learn and to make the changes needed to prevent it from happening again. A single occurrence can be put down to chance… things happen, serious or minor, and that is life. But letting it happen continually means something is seriously wrong.

Although all of these are valid reasons to perform Root Cause Analysis, there are at least ten more tell-tale, equipment-related signs that an RCA needs to happen, most of which can be identified from the information you are probably already recording.

Here are ten tell-tale signs that your organization needs to perform Root Cause Analysis:

  1. Increased downtime of plant, equipment or process.
  2. Increase in recurring failures.
  3. Increase in overtime due to unplanned failures.
  4. Increase in the number of trigger events.
  5. Lower equipment availability.
  6. High level of reactive maintenance.
  7. Lack of time… you simply cannot do everything that needs to be done.
  8. Increase in the number of serious events… approaching the top of the pyramid.
  9. “Shutdowns” lasting longer than planned.
  10. More frequent “shutdowns” required.

These indicators imply that we need to do more in the field of Root Cause Analysis before these issues snowball.

Can you quantify the financial impact of your maintenance program on your business? Do you take into account not only the direct costs of maintaining equipment, such as labour and parts, but also the costs of not maintaining equipment effectively, such as unplanned downtime, equipment failures and production losses?

The total financial impact of maintenance can be difficult to measure, yet it is a very valuable task to undertake. It is the first step in finding ways to improve profit and loss. In other words, it is the first step towards an optimised maintenance strategy.

In a 2001 study of maintenance costs for six open pit mines in Chile [1], maintenance costs were found to average 44% of mining costs. It’s a significant figure, and it highlights the direct relationship between maintenance and the financial performance of mines. More recently, a 2013 industry Mining Intelligence and Benchmarking study [2] reported that mining equipment productivity has decreased 18% since 2007, and that it fell 5% in 2013 alone. Besides payload, operating time was a key factor.

So how do you know if you are spending too much or too little on maintenance? Certainly, industry benchmarks provide a guide. In manufacturing, best-practice benchmarks put maintenance costs at less than 10% of total manufacturing costs, or less than 3% of asset replacement value [3].
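The two benchmark ratios above are simple to compute once the figures are known. The following is a minimal sketch; the function name and the plant figures are hypothetical, and the 10% and 3% thresholds are the manufacturing benchmarks quoted above, which may not suit other industries.

```python
def benchmark_check(maintenance_cost, total_manufacturing_cost, asset_replacement_value):
    """Compare annual maintenance spend with the two benchmarks quoted in the text."""
    share_of_cost = maintenance_cost / total_manufacturing_cost
    share_of_rav = maintenance_cost / asset_replacement_value
    return {
        "share_of_manufacturing_cost": share_of_cost,
        "share_of_replacement_value": share_of_rav,
        "within_cost_benchmark": share_of_cost < 0.10,  # less than 10% of total cost
        "within_rav_benchmark": share_of_rav < 0.03,    # less than 3% of replacement value
    }


# Illustrative plant: 4M maintenance spend, 50M total cost, 100M replacement value.
result = benchmark_check(4_000_000, 50_000_000, 100_000_000)
for key, value in result.items():
    print(key, value)
```

In this made-up case the plant meets one benchmark and misses the other, which is one reason a single ratio should not be read in isolation.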

While these benchmarks may be useful, a more effective way to answer the question is to look at the symptoms of over- or under-spending in maintenance. After all, benchmarks cannot take into account your unique history and circumstance.

Symptoms of under-spending on maintenance include:

  • Rising ‘hidden failure costs’ due to lost production
  • Safety or environmental risks and events
  • Equipment damage
  • Reputation damage
  • Waiting time for spares
  • Higher spares logistics cost
  • Lower labour utilisation
  • Delays to product shipments
  • Stockpile depletion or stock outs

Other symptoms are explored in more detail in our guide: 5 Symptoms Your Maintenance Strategy Needs Optimizing.


Figure 1

In most cases, it is these ‘hidden failure costs’ that have the most impact on your bottom line. These costs can be many times higher than the direct cost of maintenance – causing significant and unanticipated business disruption. As such, it is very important to find ways to measure the effects of not spending enough on maintaining equipment.

Various tools and software exist to help simulate the scenarios that can play out when equipment is damaged, fails or, conversely, is proactively maintained. A Failure Modes Effects and Criticality Analysis (FMECA) is a proven methodology for evaluating all the likely failure modes for a piece of equipment, along with the consequences of those failure modes.

Extending the FMECA to Reliability Centred Maintenance (RCM) provides guidance on the optimum choice of maintenance task. Combining RCM with a simulation engine allows rapid feedback on the worth of maintenance and the financial impact of not performing maintenance.
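The idea of weighing the worth of a maintenance task against the cost of not performing it can be sketched with a simple expected-cost comparison per failure mode. This is not an FMECA, an RCM study or a simulation engine, only a back-of-envelope stand-in; the class, the failure rates and the costs are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    description: str
    failures_per_year_unmaintained: float
    failures_per_year_maintained: float
    cost_per_failure: float         # repair plus lost production, per event
    annual_maintenance_cost: float  # cost of the preventive task per year

    def annual_cost_run_to_failure(self) -> float:
        return self.failures_per_year_unmaintained * self.cost_per_failure

    def annual_cost_with_maintenance(self) -> float:
        failure_cost = self.failures_per_year_maintained * self.cost_per_failure
        return failure_cost + self.annual_maintenance_cost

    def annual_benefit(self) -> float:
        # Positive: the task pays for itself. Negative: the task costs more than it saves.
        return self.annual_cost_run_to_failure() - self.annual_cost_with_maintenance()


# Illustrative failure modes only.
modes = [
    FailureMode("Bearing seizure", 2.0, 0.25, 40_000.0, 6_000.0),
    FailureMode("Guard paint fades", 0.5, 0.5, 200.0, 1_500.0),
]

for mode in modes:
    print(f"{mode.description}: annual benefit {mode.annual_benefit():,.0f}")
```

A negative benefit, as for the second mode here, is the kind of result that flags a task as a candidate for removal.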

Armed with the information gathered in these analyses, you will gain a clear picture of the optimum costs of maintenance for particular equipment – and can use the data to test different ways to reduce costs. It may be that there are redundant maintenance plans that can be removed; or a maintenance schedule that can become more efficient and effective; or opportunity costs associated with a particular turnaround frequency and duration. Perhaps it is more beneficial to replace equipment rather than continue to maintain it.

It’s all about optimising plant performance for peak production while minimising the risk of failure for key pieces of equipment. Get it right, and overall business costs will fall.

Want to read on? Download our guide: 5 Symptoms Your Maintenance Strategy Needs Optimizing.


[1] Knights, P.F. and Oyanander, P. (2005, Jun) “Best-in-class maintenance benchmarks in Chilean open pit mines”, The CIM Bulletin, p 93

[2] PwC (2013, Dec) “PwC’s Mining Intelligence and Benchmarking, Service Overview”


Figure 1: This image shows Isograph’s RCMCost™ software module, which is part of their Availability Workbench™. Availability Workbench, Reliability Workbench, FaultTree+, Hazop+ and NAP are registered trademarks of Isograph Software. ARMS Reliability are authorized distributors, trainers and implementors.

Author: Kevin Stewart

The dictionary defines an audit as “a methodical examination and review.” When we talk about auditing your Root Cause Analysis (RCA) investigations, that is exactly what we mean: a methodical examination and review. This is easier said than done, especially without a specific measure to compare against. If we establish a standard against which the quality of an RCA can be measured, the audit simply becomes a review of the RCA against the accepted standard, followed by a determination of how well it meets that standard. This article aims to help you develop a standard, and it also offers a free template for scoring your current process to help you get off to a successful start.

Can you have the worst RCA program in the world, one that meets none of the criteria mentioned, and still have an effective solution that:

  • Prevents recurrence,
  • Meets the goals and objectives,
  • Is within your control, and
  • Does not cause other problems?

Sure, and it is hard to argue with success. I doubt anyone would say: “Although this solution prevents the problem from recurring, it comes from an RCA that does not meet our rigorous, high-quality metrics, so we cannot use it.” This scenario is entirely possible, although the probability is tiny. If we have a set of measures to compare our RCAs against so that they meet a quality standard, the probability that an effective solution comes out of an RCA increases considerably.

So, which characteristics of an RCA are important?

Here are some questions to consider:

(In case you need a refresher on some of these points, I have included the relevant page number from the e-book “RealityCharting: Seven Steps to Effective Problem-Solving and Strategies for Personal Success” by Dean L. Gano.)

  • Do the causes pass the noun-verb test? (page 83)
  • Do the causes contain too many words or unnecessary descriptions?
  • Do the causal elements pass all the logic tests? (page 108)
    • Space-time logic check
      • Do the causes of this effect exist at the same time?
      • Do the causes of this effect exist in the same place?
    • Causal logic check
      • If you remove the cause, does the effect still exist? If the answer is no, the cause is necessary to the causal relationship and should remain on the chart. If the answer is yes, it should be removed or repositioned.
  • Are there any rule violations? If so, what are they, and do they still meet the minimum standard? Rules to include:
    • Are any cause boxes empty?
    • Are there unconnected causes on the chart?
    • Has each cause been identified as an action or a condition?
    • Does each effect satisfy the second principle (causes exist in an infinite continuum; there is an action and a condition for each effect)?
    • Have all conjunctions been eliminated? Remember that “and” is often interpreted as “caused,” which leads to misinterpretation and error. (pages 67-68)
    • Does each cause have adequate evidence to justify its inclusion on the chart?
    • Does each branch have an identified stop? Potential stops are listed below: (pages 88-89)
      • Question Mark – more information is needed; an Action Item is created.
      • Desired Condition – there is no need to keep asking why.
      • Lack of Control – something over which you or your organization has no control, for example, “the laws of physics.”
      • New Primary Effect – a separate analysis is required.
      • Other Cause Paths More Productive – continuing down this path would be a waste of time.
  • Does the solution matrix follow a typical case, such as:
    • Have the solutions been checked against standard criteria, with a standard ranking, to minimize the chance that favorite solutions get chosen? (pages 118-120)
  • Has each solution been assigned to a team member and given a due date?
  • Does the chart comply with the four principles of causation? (page 36)
    • Causes and effects are the same thing.
    • Causes exist in an infinite continuum.
    • Each effect has at least two causes, in the form of actions and conditions.
    • An effect exists only if its causes exist at the same point in time and space.
  • Does the problem definition state a clear and meaningful financial value that will allow management to make appropriate decisions and approvals?
    • If a financial value is not appropriate (for safety or a potential fatality), does the problem definition state some other meaningful value?
  • Have all action items been resolved? (Action items can include areas where more information is needed, evidence problems, or items that were added manually and need to be resolved and cleared.)
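Several of the rule checks in the list above are mechanical enough to automate. The following minimal sketch assumes a hypothetical in-memory representation of a cause chart; it is not RealityCharting's data model or API, and the chart contents are made up. It checks for empty cause boxes, unconnected causes, missing action or condition labels, conjunctions, and missing evidence.

```python
from dataclasses import dataclass, field


@dataclass
class Cause:
    cause_id: str
    text: str
    kind: str = ""       # expected to be "action" or "condition"
    evidence: str = ""
    caused_by: list = field(default_factory=list)  # ids of the causes of this effect


def rule_violations(causes, primary_effect_id):
    """Return (cause_id, message) pairs for a handful of chart rules."""
    violations = []
    linked = {cid for cause in causes for cid in cause.caused_by}
    for cause in causes:
        is_primary = cause.cause_id == primary_effect_id
        if not cause.text.strip():
            violations.append((cause.cause_id, "empty cause box"))
        if not is_primary and cause.cause_id not in linked:
            violations.append((cause.cause_id, "not connected to the chart"))
        if not is_primary and cause.kind not in ("action", "condition"):
            violations.append((cause.cause_id, "not labelled as action or condition"))
        if "and" in cause.text.lower().split():
            violations.append((cause.cause_id, "contains a conjunction"))
        if not cause.evidence.strip():
            violations.append((cause.cause_id, "no supporting evidence"))
    return violations


# Illustrative chart only.
chart = [
    Cause("E1", "Pump failed", evidence="Failure report", caused_by=["C1", "C2"]),
    Cause("C1", "Bearing seized", "action", "Teardown photos", ["C3"]),
    Cause("C2", "Lubricant low and contaminated", "condition"),
    Cause("C3", ""),
    Cause("C4", "Operator absent", "action", "Shift log"),
]

found = rule_violations(chart, primary_effect_id="E1")
for cause_id, message in found:
    print(cause_id, message)
```

Checks like these only cover form. Whether a cause is actually correct still needs a human reviewer.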

The next step in developing an audit is to generate a list against which your RCA will be compared.

This list can come from the items above, your own list, or some combination of the two. Once you have a list of items to audit, you will need to create a rating scale. This can be a pass/fail arrangement or a scale that rates each item from 0 to 5. The latter lets you give partial credit for items that do not fully meet the standard.

Develop a score sheet covering every item and score each one. Remember to leave space for notes so the auditor can explain the reasons for awarding partial credit. It helps to add a guideline for each item so the auditor has a criterion for scoring it. An example guideline might look like this:

0 = Does not exist
1 = Some are in place, but they are not correct
2 = Several are in place and some are correct
3 = All are in place, but only some are correct
4 = All are in place and most are correct
5 = All are in place and all are correct

Having guidelines like these readily available as a reference on your score sheet helps ensure consistent scoring, especially if several people will be scoring an RCA.

Now all you have to do is review your RCA against your list, score it, and define a minimum passing score.
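The scoring step can be sketched in a few lines. This is a minimal example, assuming a 0 to 5 scale per item as in the guideline above; the audit items, the scores and the 80% passing threshold are hypothetical and should be replaced with your own list and minimum.

```python
def audit_result(scores, pass_percent=80):
    """Total a 0-5 score sheet and compare it with a minimum passing percentage."""
    for item, score in scores.items():
        if score not in range(6):
            raise ValueError(f"score for {item!r} must be an integer from 0 to 5")
    total = sum(scores.values())
    maximum = 5 * len(scores)
    # Integer comparison avoids floating-point rounding at the threshold.
    passed = total * 100 >= pass_percent * maximum
    return total, maximum, passed


# Illustrative score sheet only.
score_sheet = {
    "Causes pass the noun-verb test": 4,
    "No rule violations on the chart": 3,
    "Evidence supports each cause": 5,
    "Solutions assigned with due dates": 4,
}

total, maximum, passed = audit_result(score_sheet)
print(f"{total}/{maximum}, passed: {passed}")
```

Keeping the per-item scores rather than only the total preserves the detail an auditor needs to explain partial credit.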

This will ensure that each RCA is compared against a consistent standard that several people can apply repeatably, although there will always be some differences when multiple people audit RCAs. Those differences can be minimized by having only one person conduct the audits, by calibrating the audit, or by bringing the auditors together to score several RCAs as a group so that everyone understands the nuances of scoring.

While I have provided a fairly comprehensive list of what to check when auditing an RCA, my experience is that an RCA can meet all of the above requirements and still have problems. The biggest problem is that the logic can be correct while the causes are not, so the RCA can pass the tests but will not actually solve the problem. The fact is that human beings are involved, and we make mistakes. Sometimes the mistakes come from inexperienced investigators who need more practice. Other sources of error are some of the filters we have talked about, such as time constraints, preconceived notions or biases, language problems, and so on. This means there is still a component that needs to be reviewed by a person, for overall integrity and for the things a computer cannot check. That person can be someone external from corporate, a contractor, or an internal resource.

RealityCharting® provides tools to help the reviewer critique the analysis, such as a rules check, the action items report, the causal element view and, most importantly, an interactive dashboard.

While there are three main reasons organizations typically carry out a Root Cause Analysis on their assets or equipment, there are a great many additional indicators that an RCA should be performed.

Chances are you are recording a lot of valuable information about the performance of your equipment – information that can reveal opportunities to carry out a root cause analysis, find causes, and implement solutions that resolve recurring problems and improve operations. But are you really using the information for that purpose?

First, let's briefly discuss the reasons a root cause analysis is typically carried out:

1. Out of obligation

There is likely to be some regulation or requirement to record that you are doing something about the problem that occurred.

2. You reached a trigger threshold

Your own company has identified triggers for major incidents that warrant a root cause analysis.

3. Because you want to

An opportunity has presented itself to make changes for the better. Or perhaps you have decided it is time to stop the constant loss of money.

The purpose of an industry is to make money. Anything that affects this goal is usually attacked through root cause analysis.

While talking with a reliability engineer at an oil facility, I asked him how much lost opportunities and downtime cost the company in a year. He answered that it was around three quarters of a billion dollars, that is, $750,000,000. Is that not an important reason to carry out a root cause analysis? Even a 10% change has a significant impact on the company's figures.

The financial impact is not the consequence of a single event, but of a multitude of events, both major and minor.

Each event is in itself an opportunity to learn and to change in order to prevent its recurrence. That's life… things happen, small or large. But it is a serious mistake to allow the events to keep happening continuously.

While all of the above are valid reasons to carry out a root cause analysis, there are at least 10 tell-tale, equipment-related clues that indicate the need for an RCA, many of which can be identified from the information you have probably already recorded.

Here are ten tell-tale signs that indicate the need to carry out a Root Cause Analysis:

  1. Increase in plant, equipment or process downtime.
  2. Increase in recurring failures.
  3. Increase in overtime due to unplanned failures.
  4. Increase in trigger events.
  5. Lower equipment availability.
  6. High amount of reactive maintenance.
  7. Lack of time… the necessary tasks simply cannot be done.
  8. Increase in the number of serious events… reaching the top of the pyramid.
  9. Longer planned “shutdowns.”
  10. More frequent “shutdowns” required.

All of this implies that we need to go deeper into root cause analysis before these problems snowball.

While there are three main reasons organizations typically perform Root Cause Analysis (RCA) following an issue with their asset or equipment, there are a whole host of other indicators that RCA should be performed.

Odds are, you’re recording a lot of valuable information about the performance of your equipment – information that could reveal opportunities to perform an RCA, find causes, and implement solutions that will solve recurring problems and improve operations. But are you using your recorded information to this extent?

First, let’s quickly talk about three reasons why RCA is typically performed:

1. Because you have to

There may be a regulatory requirement to demonstrate that you are doing something about a problem that’s occurred.

2. You have breached a trigger point

Your own company has identified the triggers for significant incidents that warrant root cause analysis.

3. Because you want to

An opportunity has presented itself to make changes for the better. Or perhaps you’ve decided you simply don’t want to lose so much money all the time.

At the core of all industry is the desire to make money. Anything that negatively impacts this goal is usually attacked by performing root cause analysis.

I was having a conversation with a reliability engineer at an oil and gas site, and I asked him what lost opportunity or downtime might cost that company over the course of a year. He said it was in the vicinity of three quarters of a billion dollars – $750,000,000. Is this a good enough reason to perform root cause analysis? Even a 10% change would have a huge impact on bottom line figures.

The monetary impact to the business was of course not due to any single event, but to a multitude of events both large and small.

Each event presents itself as an opportunity to learn and to make any changes necessary to prevent its reoccurrence. Once can be written off as happenstance… things happen, serious or minor, and that’s life. But to let it happen continuously means that something is seriously wrong.

While these are all valid reasons to perform an RCA, there are at least ten more tell-tale equipment-related clues that an RCA needs to happen – most of which can be identified through the information you’re probably already recording.

Here are ten tell-tale signs that your organisation needs to perform Root Cause Analysis:

  1. Increased downtime to plant, equipment or process.
  2. Increase in recurring failures.
  3. Increase in overtime due to unplanned failures.
  4. Increase in the number of trigger events.
  5. Less availability of equipment.
  6. High level of reactive maintenance.
  7. Lack of time… simply can’t do everything that needs doing.
  8. Increase in the number of serious events… nearing the top of the pyramid.
  9. Longer planned “shut” durations.
  10. More frequent “shut” requirement.

These indicators imply that we need to be doing more in the realm of root cause analysis before these issues snowball.
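A couple of these signs can be pulled straight from information that is already being recorded. The following is a minimal sketch, assuming a hypothetical work-order log held as a list of dictionaries; the field names and records are made up, and a real maintenance system would need its own export step. It counts recurring failures and totals downtime by month.

```python
from collections import Counter


def recurring_failures(work_orders, min_repeats=2):
    """Count failures per (equipment, failure) pair and keep the repeat offenders."""
    counts = Counter((wo["equipment"], wo["failure"]) for wo in work_orders)
    return {key: n for key, n in counts.items() if n >= min_repeats}


def downtime_by_month(work_orders):
    """Total downtime hours per month, sorted by month label."""
    totals = {}
    for wo in work_orders:
        totals[wo["month"]] = totals.get(wo["month"], 0.0) + wo["downtime_hours"]
    return dict(sorted(totals.items()))


# Illustrative log only.
log = [
    {"month": "2015-01", "equipment": "Pump P-101", "failure": "seal leak", "downtime_hours": 4.0},
    {"month": "2015-02", "equipment": "Pump P-101", "failure": "seal leak", "downtime_hours": 6.0},
    {"month": "2015-02", "equipment": "Conveyor C-2", "failure": "belt tear", "downtime_hours": 3.0},
    {"month": "2015-03", "equipment": "Pump P-101", "failure": "seal leak", "downtime_hours": 9.0},
    {"month": "2015-03", "equipment": "Fan F-7", "failure": "bearing noise", "downtime_hours": 2.5},
]

repeats = recurring_failures(log)
monthly = downtime_by_month(log)
hours = list(monthly.values())
downtime_rising = all(later > earlier for earlier, later in zip(hours, hours[1:]))

print(repeats)
print(monthly)
print("Downtime rising every month:", downtime_rising)
```

A repeat count or a rising trend is only a prompt to investigate; the RCA itself still has to establish the causes.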

If you can identify with some of these pain points, download our eBook “11 Problems With Your RCA Process and How to Fix Them” in which we provide best practice advice on using RCA to help eliminate some of these problems.


Author: Ben Rowland

A colleague and I were discussing how his nine-year-old son had completed his Cub Scouts Cyclist Activity badge. We noticed how some of the bike maintenance tasks that had been identified were, shall we say, less than ‘optimal’.

Now you might say this is a bit unfair to judge a Cub Scout lesson through the eyes of a reliability professional (and you’d be right) but what was interesting is that we often see the same sorts of issues within the industry.




The first thing we noticed is the tasks aren’t really tasks, but a list of components; i.e. they tell you what to look at but not what to look for.

In other words, how a task is written is clearly very important. In the example above, “check the back tire” does not help us know what to look for. Is it there? Is it worn? Does it have air in it? Is it damaged? With vague work instructions like these, maintainers are left to decide what to inspect for, which will inevitably lead to inconsistent maintenance.

Some of the examples above are better than others: “your helmet fits”, for example, is more specific and much better than “check helmet.”

While working with clients to develop their maintenance plans, the RCM process we use ensures that each maintenance task addresses a specific failure mode, or modes. We can run a report that shows this link, which in turn allows the maintainer to understand the purpose of the inspection. The task can also be written in such a way as to focus the maintenance on identifying the potential failure.

Another issue with the tasks above is that no data or figures are included in the task. How much tire wear is acceptable? What is the minimum tread depth? What pressure should the tire be at? Is there a minimum and maximum?

There also needs to be instruction as to how frequently to do the bicycle checks. Every ride? Every month? Checking that your wheels are fitted tightly might need to be done prior to every ride, but checking a chain for wear could be done every few months. Not having this information can lead to items being under- or over-maintained, resulting in either an unsafe equipment condition or wasted effort.

“Okay then, you do it!”

Well, it’s only fair after criticizing the Cub Scout’s effort that we have a go ourselves. So below is an example of how we might construct an FMEA and maintenance strategy for a bicycle in the Availability Workbench™ (AWB) RCM-Cost software¹:



We can see that for the failure mode ‘chain worn’ we’ve identified an inspection task to periodically check the chain for wear to address that failure mode. We’ve specified the method to use (a wear gauge, as opposed to a simple visual check or performing a measurement) and an acceptable limit (less than 75% worn).  This is a clear communication of what is required, minimizing the chances of ineffective maintenance.
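The same idea can be sketched outside any particular tool. The snippet below is a minimal illustration, not the AWB data model: the `InspectionTask` class and its field names are hypothetical, and the 90-day interval is an assumed value standing in for “every few months”. Only the failure mode, the wear gauge method and the 75% limit come from the example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InspectionTask:
    """A maintenance task tied to the failure mode it addresses (hypothetical structure)."""
    component: str
    failure_mode: str
    method: str            # how to inspect, e.g. a gauge rather than a visual check
    limit_percent: float   # wear at or above this level is not acceptable
    interval_days: int     # assumed interval, not taken from the article

    def instruction(self) -> str:
        """Render the task as a work instruction that says what to look for."""
        return (
            f"Every {self.interval_days} days: check {self.component} for "
            f"'{self.failure_mode}' using {self.method}; "
            f"acceptable if less than {self.limit_percent:g}% worn."
        )

    def is_acceptable(self, measured_percent: float) -> bool:
        """Return True when the measured wear is below the acceptable limit."""
        return measured_percent < self.limit_percent


chain_task = InspectionTask(
    component="chain",
    failure_mode="chain worn",
    method="a wear gauge",
    limit_percent=75.0,
    interval_days=90,  # assumption for illustration only
)

print(chain_task.instruction())
print(chain_task.is_acceptable(60.0))
print(chain_task.is_acceptable(80.0))
```

Writing the task this way forces the author to state the method, the limit and the interval, which are exactly the details missing from “check the chain”.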

“How do I choose which task to perform?”

In the example above I touched on the point that there may be a choice of maintenance tasks that could be performed, as well as whether or not to perform any maintenance at all. The RCM process also helps us to choose an appropriate maintenance task, which is essentially a balance between the severity of the failure and the cost or effort of performing the maintenance. Severity is often thought of in terms of cost (e.g. lost production), but it also covers safety and operational impact. The operating context of the equipment also affects the severity. The example below shows how we use the AWB software to select an optimal maintenance task interval.


Optimization Curve Image

Imagine we only ride our bike for getting around the town we live in for non-essential tasks, such as popping to the shops to buy some milk and a newspaper. In this case a punctured tire is not critical and we might decide not to carry a spare tube and tools to change it (pump, tire levers etc.) and instead to perform ‘breakdown maintenance’ i.e. walk the bike home and repair it there.  Now if we were instead on a vacation touring a remote location, far from any nearby towns, this ‘run to fail’ strategy would result in a very long walk and clearly not be suitable!
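To make the effect of operating context concrete, here is a rough sketch that compares the expected yearly cost of ‘run to fail’ with the cost of carrying a spare tube and tools. It is a simplified illustration rather than the optimization performed in AWB, and every number in it (punctures per year, cost units for a walk home, cost of carrying spares) is an assumption made up for the example.

```python
def run_to_failure_cost(failures_per_year: float, consequence_cost: float) -> float:
    """Expected yearly cost when nothing is done until the failure occurs."""
    return failures_per_year * consequence_cost


def mitigated_cost(failures_per_year: float,
                   mitigation_cost_per_year: float,
                   residual_consequence_cost: float) -> float:
    """Expected yearly cost when we pay for a mitigation that reduces the consequence."""
    return mitigation_cost_per_year + failures_per_year * residual_consequence_cost


def choose_strategy(failures_per_year: float,
                    consequence_cost: float,
                    mitigation_cost_per_year: float,
                    residual_consequence_cost: float) -> str:
    """Pick whichever strategy has the lower expected yearly cost."""
    rtf = run_to_failure_cost(failures_per_year, consequence_cost)
    mitigated = mitigated_cost(
        failures_per_year, mitigation_cost_per_year, residual_consequence_cost
    )
    return "run to fail" if rtf <= mitigated else "carry spare tube and tools"


# Assumed values: 2 punctures a year, carrying spares "costs" 20 units a year,
# and a roadside tube change costs 1 unit of inconvenience.
town = choose_strategy(2, consequence_cost=5,
                       mitigation_cost_per_year=20, residual_consequence_cost=1)
remote = choose_strategy(2, consequence_cost=200,
                         mitigation_cost_per_year=20, residual_consequence_cost=1)

print("Around town:", town)
print("Remote touring:", remote)
```

The failure mode and its frequency are identical in both cases; only the consequence changes, and that alone is enough to flip the preferred strategy.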

 Hidden Failures

So assuming we were carrying a spare tube, and relying on it in remote locations, what happens if there is a problem with the spare tube? “Did I remember to fix it after my last puncture?” “What if there is a manufacturing defect?” Or “What if I didn’t find the thorn that caused the first puncture, still stuck in the tire, and got a second puncture?” These are called ‘hidden failures’ and require failure finding tasks in order to mitigate them.

 Operator Maintenance

We might also set our bicycle maintenance strategy assuming we do all the checks at home in the garage, but do we also need to consider operating checks?  For our bike this might include using our senses to listen for any abnormal noises, rattles, looseness, creaks or squeaks when riding the bike. We are also checking the operation of the gears and brakes through use, cleaning the bicycle down after use and oiling the chain afterwards to prevent corrosion. This is an example of ‘operator maintenance’.

How do we manage failures during use? If we notice something is wrong during use that we can’t fix, we would note it and arrange some planned maintenance at the bike shop before the warning becomes an actual failure that puts the bike out of action. Operating failures that occur with little or no warning can be addressed in a number of ways: carrying spares (e.g. a spare inner tube) or tools to repair the failure out in the field (a puncture repair kit). We can also introduce re-designs (sealant in the tire to seal holes as they occur).

So there it is: writing an effective maintenance strategy can be as easy as riding a bike.


¹Availability Workbench™ is authored by Isograph Ltd. ARMS Reliability are authorized global distributors, re-sellers and implementers of the software application.

Author: Ben Rowland

Surely if some is good, more is better? Like many things in life, there can be too much of a good thing when it comes to detail in an RCM study, and finding the right balance can be tricky. Too little detail and you may miss things; too much and you could suffer from ‘analysis paralysis!’

So how do we know when we’ve ‘drilled down’ far enough to be thorough but not too far?

John Moubray summarised it nicely in his RCM 2 textbook:

“Failure Modes should be defined in enough detail for it to be possible to select a suitable failure management policy” (Moubray, 2007)

So what is a suitable failure management policy? The failure management policy is the approach chosen in order to mitigate the consequences of failure to an acceptable level.

Let’s consider two pumps; one is a large, complex gas compression pump and the other is a small air conditioning pump on a fork lift.

When trying to understand what the ‘suitable failure management policy’ is, it is necessary to take into account the ‘bigger picture’ of the equipment under consideration:


What is the function of the machine? What is its purpose? Understanding this will help to understand the consequences of the failure, which in turn will help define the criticality.


How critical is it if the failure occurs? Criticality is the product of the severity of the consequences of a failure and its frequency of occurrence.

In the case of the large gas compression pump, a failure could result in product not being delivered, costing thousands of dollars per hour of downtime. For the forklift a/c pump, the consequence could be as minor as returning the forklift to be swapped for another in the fleet.
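As a small worked illustration of severity multiplied by frequency, the sketch below ranks the two pumps. The severity scores (on an assumed 1 to 10 scale) and the failure frequencies are invented for the example and do not come from any real study; a real analysis would use the site’s own risk matrix.

```python
def criticality(severity: float, failures_per_year: float) -> float:
    """Criticality as the product of consequence severity and failure frequency."""
    return severity * failures_per_year


# Assumed scores: both pumps fail equally often, but the consequences differ.
pumps = {
    "gas compression pump": criticality(severity=9, failures_per_year=0.5),
    "forklift a/c pump": criticality(severity=2, failures_per_year=0.5),
}

# Rank equipment from most to least critical.
ranked = sorted(pumps, key=pumps.get, reverse=True)

for name in ranked:
    print(f"{name}: criticality {pumps[name]:.1f}")
```

With identical failure frequencies, the difference in ranking comes entirely from the consequence of failure, which is why understanding the function of the machine comes first.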

Repair vs. replace policy

Another aspect to consider is the corrective action. Is it feasible and cost effective to stock the spares and perform a repair in-situ, or is it better to simply replace the item with a new unit?

For a large, expensive pump it would be more expensive to replace the entire unit than to replace a worn seal, whereas for a small a/c pump it would be more cost effective to discard it and replace it with a new one.

Hidden failure

Are the failures evident in normal operation, or do they require fault finding to be performed? Can the seals be seen to check for signs of leakage?

Operating context

How accessible is the equipment? Is scaffolding required? Is the plant required to be shut down? Does the equipment need to be partially dismantled e.g. removing guards etc? Is there any redundancy in place? Is the equipment in a remote location, or a challenging environment?

These are just some of the things to consider when deciding what a ‘suitable failure management policy’ might be for your particular piece of equipment.

Back to our pump examples:
For the large gas compression pump, it is expensive to replace, critical if it fails and is accessible for in-situ repair during scheduled shut downs. In this case the FMEA would be far more detailed, including several failure modes, each with its own inspection or planned maintenance tasks, which would combine to form the ‘Failure Management Policy’ for this pump.

Image 1 How much detail

For the small AC pump on a forklift, let’s say it’s inaccessible for inspection, not critical if it fails and would be replaced rather than repaired. Our FMEA might only include a small number of failure modes, such as ‘Seal worn’, ‘Impeller worn’ and ‘Motor burnt out’. Our corresponding ‘Failure Management Policy’ would be ‘No scheduled maintenance’ and the corrective action would be to ‘Replace AC pump’.

Image 2 How much detail

In conclusion, it can be a challenge to know how much detail to go into when performing an FMEA, but the aim is to go into enough detail to determine a suitable failure management policy. Considering the ‘bigger picture’ of the equipment you are analysing will help guide you as to the level of detail required.

Our differences can often be a source of conflict or confusion, but in this article I would like to explore how they can be harnessed to solve problems rather than create them.

“Everything will be fine if you do it my way.” At some point we have all probably said or thought something like this. Or perhaps you have heard it from someone else (quite likely your partner). What is the underlying sentiment, and what is the problem here? What we are really saying is: “If everyone were just like me and thought the same way, everything would be fine.” Of course, this is impossible. Neuroscience research tells us that no two brains are exactly alike. As a Scientific American article on the topic puts it, if the apparatus that senses the world differs between two individuals, then the conscious experience of the brains wired to those sensors cannot be the same either.

Good problem solvers need to be aware of this so that they do not fall into the trap of assuming that everyone knows the same things, or that everyone interprets information in the same way. I was flipping through television channels one day and came across an interesting show about conjoined twins who share a single body and most of their organs, but have completely separate heads (and therefore brains). When the interviewer asked a question, each twin answered in turn, and each gave a different answer. This led to a disagreement between them.

These two people shared an identical upbringing, were exposed to life and environmental factors like any other human, and yet they still thought differently. If that doesn’t convince you that it is impossible for two different brains to share the same perspective, then I’m not sure what will!

Here is an example of how two people can be having a conversation about the same topic and yet not be talking about the same thing at all. During a common exercise in one of my classes, an enthusiastic discussion started about how to clean fish. Everyone except one bright student seemed to be on the same page. Everyone had the feeling that this person was being difficult, but something inside me reminded me to get more information. After a couple of probing questions we discovered that we did not share the same view of the topic. This person had never been fishing and did not understand that “cleaning fish” meant gutting them and preparing them to be eaten. She could not make sense of the exercise because, to her, cleaning meant washing and generally cleaning the outside of something, which is why she was asking: what does a knife have to do with any of this?!

A more personal observation to support this point is a discussion I had with colleagues about the “Hierarchy of Controls” (shown in the picture below for reference).

One colleague said that the other should understand this concept because he has an engineering background, and all engineers would know it. I had to tell them that I also have a 30-year background in engineering, maintenance and reliability, but had never actually come across the term either. So once again, the impossibility of everyone’s perspective being identical comes to the fore.


While carrying out your Root Cause Analysis, keep the issue of perspective in mind. Be sure to frame the problem definition so that every perspective has a chance to be heard and the problem statement reflects all of the team’s perspectives. During the analysis, some people may not speak up in the meeting, so as Facilitator it is your job to draw out their perspectives and make sure they are heard.

In my experience, this can have a significant impact on the team’s understanding of a particular cause. Although at times we wish everyone saw things exactly as we do, taking other people’s alternate realities into account is key to building a more complete picture of your problem, which in turn allows you to find the best solution.