Author: David Wilbur, CEO – Vetergy Group

To begin, we must draw the distinction between error and failure. Error describes something that is not correct or a mistake; operationally this would be a wrong decision or action. Failure is the lack of success; operationally this is a measurable output where objectives were not met. Failures audit our operational performance, unfortunately quite often with catastrophic consequences: irredeemable financial impact, loss of equipment, irreversible environmental impact or loss of life. Failure occurs when an unrecognized and uninterrupted error becomes an incident that disrupts operations.

Individual-Centered Approach

The traditional approach to achieving reliable human performance centers on individuals and the elimination of error and waste. Human error is the basis of study with the belief that in order to prevent failures we must eliminate human error or the potential for it. Systems are designed to create predictability and reliability through skills training, equipment design, automation, supervision and process controls.

The fundamental assumptions are that people are erratic and unpredictable, that highly trained and experienced operators do not make mistakes and that tightly coupled complex systems with prescribed operations will keep performance within acceptable tolerances to eliminate error and create safety and viability.

This approach can only produce a limited return on investment. As a result, many organizations experience a plateau in performance and seek enhanced methods to improve and close gaps in performance.

An Alternative Philosophy

Error is embraced rather than evaded; sources of error are minimized and programs focus on recognition of error in order to disturb the pathway of error to becoming failure. 

Slight exception notwithstanding, we must understand people do not set out to cause failure, rather their desire is to succeed. People are a component of an integrated, multi-dimensional operating framework. In fact, human beings are the spring of resiliency in operations. Operators have an irreplaceable capacity to recognize and correct for error and adapt to changes in operating conditions, design variances and unanticipated circumstances.

In this approach, human error is accepted as ubiquitous and cannot be categorically eliminated through engineering, automation or process controls. Error is embraced as a system product rather than an obstacle; sources of error are minimized and programs focus on recognition of error in order to disturb its pathway to becoming failure. System complexity does not assure safety. While system safety components mitigate risk, as systems become more complex, error becomes obscure and difficult to recognize and manage.

Concentrating on individuals creates a culture of protectionism and blame, which worsens the obscurity of error. A better philosophy distributes accountability for variance and promotes a culture of transparency, problem solving and improvement. Leading this shift can only begin at the organizational level through leadership and example.

The Operational Juncture™

In contrast to the individual-centered view, a better approach to creating Operational Resilience is formed around the smallest unit of Human Factors Analysis, called the Operational Juncture™. The Operational Juncture describes the concurrence of people who are given a task to operate tools and equipment, guided by conflicting objectives, within an operational setting that includes physical, technological and regulatory pressures, and provided with information. It is where choices are made that lead to outcomes, both desirable and undesirable.

It is within this multidimensional concurrence that we can influence the reliability of human performance. Understanding this concurrence directs us away from blaming individuals and towards determining why the system responded the way it did, so that we can modify the structure. Starting at this juncture, we can preemptively design operational systems and reactively probe causes of failure. We take a holistic view of accountability, shifting it away from merely the actions of individuals and towards all of the components that make up the Operational Juncture. This is not a wholesale change in the way safety systems function, but an enhanced viewpoint that captures deeper, more meaningful and more effective ways to generate profitable and safe operations.

A practical approach to analyzing human factors in designing and evaluating performance creates both reliability and resilience. Reliability is achieved by exposing system weaknesses and vulnerabilities that can be corrected to enhance reliability in future and adjacent operations. Resilience emerges when we expose and correct deep organizational philosophy and behaviors.

Resilience is born in the organizational culture where individuals feel supported and regarded. Teams operate with deep ownership of organizational values, recognize and respect the tension between productivity and protection, and seek to make right choices. Communication occurs with trust and transparency. Leadership respects and gives careful attention to insight and observation from all levels of the organization. In this culture, people will self-assess, teams will synergize and cooperate to develop new and creative solutions when unanticipated circumstances arise. Individuals will hold each other accountable.

Safety within Operational Resilience is something an organization does, not something that is created or attained. A successful program will deliver a top-down institutionalization of culture that produces a bottom-up emergence of resilience.


These days, many enterprise-level organizations are likely to have similar operations in multiple locations regionally or even worldwide. When a piece of equipment fails or a safety incident occurs at one site, the company investigates the problem and identifies solutions or corrective actions. Naturally, the team wants to capture the lessons learned and share them with other sites that have similar equipment, processes and potential incidents.

Advanced tools like the RealityCharting® software enable teams to share results of an Apollo Root Cause Analysis (RCA) across multiple layers of stakeholders. However, a large multinational enterprise might have dozens of different investigations going on at any given time. At the highest levels, decision-makers don’t necessarily want to see granular information about specific causes at any given plant. They need a top-down perspective of problems and patterns that are affecting the entire organization.

At ARMS Reliability, many of our clients have expressed a similar need. Our solution? Using classification tags to create and apply a consistent taxonomy to all root cause analyses performed for a given organization. Rolled up into a composite report, these tags reveal enterprise-wide trends and issues, allowing management to create action plans for tackling these systemic issues. For example, classification tags might uncover a large number of problems related to a lack of preventative maintenance on a certain type of pump, or a systemic non-compliance with a required safety process.

A classification taxonomy can be scalable and configured to an organization’s goals and processes. Think of these classifications like buckets that can be applied at any level of the RCA — e.g., to the root causes or solutions, to individual contributing causes, or simply to the RCA investigation in general.
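As a rough illustration of how tags applied to individual causes might roll up into a composite view, here is a minimal Python sketch. The record layout, site names, causes and tag names are all hypothetical and are not taken from RealityCharting or any ARMS Reliability tooling; the point is only that consistent tags make enterprise-wide counts trivial to produce.

```python
from collections import Counter

# Hypothetical RCA records: sites, causes and tags are invented for illustration.
rcas = [
    {"site": "Plant A", "causes": [
        {"text": "Pump seal not inspected", "tags": ["preventative maintenance"]},
        {"text": "Permit step skipped", "tags": ["safety process non-compliance"]},
    ]},
    {"site": "Plant B", "causes": [
        {"text": "Pump bearing ran dry", "tags": ["preventative maintenance"]},
    ]},
]

def roll_up(records):
    """Count how many tagged causes fall under each classification tag."""
    counts = Counter()
    for rca in records:
        for cause in rca["causes"]:
            counts.update(cause["tags"])
    return counts

report = roll_up(rcas)
for tag, n in report.most_common():
    print(f"{tag}: {n}")
```

The same counting could be done at the solution level or the investigation level by tagging those records instead.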

Keep in mind: The Apollo Root Cause Analysis method is centered around a free-thinking approach to solving problems. That’s what makes the methodology so powerful — it doesn’t lead you down any generic predetermined pathways by asking leading questions or categorizing various causes or effects in any way. At ARMS Reliability, we advocate applying classification tags only after the root cause analysis investigation is completed, so you keep the free-thinking causal analysis and organize it later, for the purpose of rolling the findings up into a deeper systemic view.

Taxonomies can range from 5 to 20 categories up into the hundreds. For example, here we’ve used a human factors taxonomy to tag causes as organizational influences and other people-centric issues.

[Screenshot: causes tagged using a human factors taxonomy]

Reports can provide a summary of how many causes were classified under the various tags:

[Screenshot: report summarizing the number of causes classified under each tag]

In another example, an organization bases its taxonomy of reliability issues on the ISO 14224 – Collection and exchange of reliability and maintenance data for equipment.


[Screenshot: taxonomy of reliability issues based on ISO 14224]

The taxonomy options are endless. Most organizations we work with have their own unique systems of classifications. It’s really all about codifying the types of information your organization most needs to capture.

If adding classifications to your Root Cause Analyses would be useful for your organization, contact ARMS Reliability. We’d be glad to show you more about what we’re doing with other clients and help you develop a taxonomy that works best for your needs.

Author: Dan DeGrendel

Regardless of industry or discipline, we can probably all agree that routine maintenance — sometimes referred to as preventative, predictive, or even scheduled maintenance — is a good thing. Unfortunately, through the years I’ve found that most companies don’t have the robust strategies they need.

Typical issues and the kinds of trouble they can create:

1. Lack of structure and schedule

In many cases, routine tasks are just entries on a to-do list of work that needs to be performed, with nothing within the work pack to drive compliance. In particular, a list of tasks beginning with “Check” that give no guidance on an acceptable limit can have limited value. The result can be a “tick and flick” style routine maintenance program that fails to identify the warning signs of impending failure.

2. Similar assets, similar duty, different strategies

Oftentimes, maintenance views each piece of equipment as a standalone object, with its own unique maintenance strategy. As a result, one organization could have dozens of maintenance strategies to manage, eating up time and resources. In extreme cases, this can lead to similar assets having completely different recorded failure mechanisms and routine tasks, worded differently, grouped differently and structured differently within the CMMS.

3. Operational focus 

Operations might be reluctant to take equipment out of service for maintenance, so they delay or even cancel the appropriate scheduled maintenance. At times this decision is driven by the belief that the repair work is the same whether it is planned or reactive. But experience tells us that without maintenance, the risk is even longer downtime and more expensive repairs when something fails.

4. Reactive routines

Sometimes, when an organization has been burned in the past by a preventable failure, it overcompensates by performing maintenance tasks more often than necessary. The problem is, the team might be wasting time doing unnecessary work. Worse still, it might even increase the likelihood of future problems, simply because unnecessary intrusive maintenance can increase the risk of failure.

5. Over-reliance on past experience 

There’s no substitute for direct experience and expertise. But when tasks and frequencies are based too heavily on opinions and “what we’ve always done”, rather than on sound assumptions, maintenance teams can run into trouble through either over- or under-maintaining. Without documented assumptions, business decisions are based on little more than a hunch. “Doing what we’ve always done” might not be the right approach for the current equipment, with the current duty, in the current business environment (and it certainly makes future review difficult).

6. Failure to address infrequent but high consequence failures 

Naturally, routine tasks account for the most common failure modes. However, they should also address failures that happen less frequently but may have a significant impact on the business. Developing a maintenance plan that addresses both types prevents unnecessary risk. For example, a bearing may be set up on a lubrication schedule, but if there’s no plan to detect performance degradation due to a lubrication deficiency, misalignment, material defect, etc., then undetected high-consequence failures can occur.

7. Inadequate task instructions

Developing maintenance guidelines and best practices takes time and effort. Yet, all too often, the maintenance organization fails to capture all that hard-won knowledge by creating clear, detailed instructions. Instead, they fall back on the maintenance person’s knowledge — only to lose it when a person leaves the team. Over time, incomplete instructions can lead to poorly executed, “bandaid-style” tasks that get worse as the months go by.

8. Assuming new equipment will operate without failure for a period of time

There’s a unique situation that often occurs when new equipment is brought online. Maintenance teams assume they have to operate the new equipment first to see how it fails before they can identify and create the appropriate maintenance tasks. It’s easy to overlook the fact that they likely have similar equipment with similar points of failure. Their data from related equipment provides a basic foundation for constructing effective routine maintenance.

9. Missing opportunity to improve

If completed tasks aren’t reviewed regularly to gather feedback on instructions, tools needed, spare parts needed, and frequency, the maintenance process never gets better. The quality or effectiveness of the tasks then degrades over time and, with it, so does the equipment.

10. Doing what we can and not what we should 

Too often, maintenance teams decide which tasks to perform based on their present skill sets — rather than equipment requirements. Technical competency gaps can be addressed with a training plan and/or new hires, as necessary, but the tasks should be driven by what the equipment needs.

Without a robust routine maintenance plan, you’re nearly always in reactive mode, conducting ad-hoc maintenance that takes more time, uses more resources, and could incur more downtime than simply taking care of things more proactively. What’s worse, it’s a vicious cycle. The more time maintenance personnel spend fighting fires, the more their morale, productivity, and budget erode. The less effective routine work is performed, the more equipment uptime and business profitability suffer. At a certain point, it takes a herculean effort simply to regain stability and prevent further performance declines.

Here’s the good news: An optimized maintenance strategy, constructed with the right structure, is simpler and easier to sustain. By fine-tuning your approach, you make sure your team is executing the right number and type of maintenance tasks, at the right intervals, in the right way, using an appropriate amount of resources and spare parts. And with a framework for continuous improvement, you can ultimately drive towards higher reliability, availability and more efficient use of your production equipment.

Want to learn more? Check out our next blog in this series, Plans Can Always Be Improved:  Top 5 Reasons to Optimize Your Maintenance Strategy.


Author: Dan DeGrendel

Maintenance optimization doesn’t have to be time-consuming or difficult. Really, it doesn’t. Yet many organizations simply can’t get their maintenance teams out of a reactive “firefighting mode” so they can focus on improving their overall maintenance strategy.

Stepping back to evaluate and optimize does take time and resources, which is why some organizations struggle to justify the project. They lack the data and/or the framework to demonstrate the real, concrete business value that can be gained.

And even when organizations do start to work on optimization, sometimes their efforts stall when priorities shift, results are not immediate and the overall objectives fade from sight.

If any of these challenges sound familiar, there are some very convincing reasons to forge ahead with maintenance optimization:

1. You can make sure every maintenance task adds value to the business

Through the optimization process, you can eliminate redundant and unnecessary maintenance activities, and make sure your team is focused on what’s really important. You’ll outline the proper maintenance tasks, schedules and personnel assignments; then incorporate everything into the overall equipment utilization schedule and departmental plans to help drive compliance. Over time, an optimized maintenance strategy will save time and resources — including reducing the hidden costs of insufficient maintenance (production downtime, scrap product, risks to personnel or equipment and expediting and warehousing of spare parts, etc.).

2. You’ll be able to plan better

Through the optimization process, you’ll be allocating resources to various tasks and scheduling them throughout the year. This gives you the ability to forecast resource needs, by trade, along with spare parts and outside services. It also helps you create plans for training and personnel development based on concrete needs.

3. You’ll have a solid framework for a realistic maintenance budget

The plans you establish through the optimization process give you a real-world outline of what’s needed in your maintenance department, why it’s needed, and how it will impact your organization. You can use this framework to establish a realistic budget with strong supporting rationales to help you get it approved. Any challenges to the budget can be assessed and a response prepared to indicate the impact on performance that any changes might make.

4. You’ll just keep improving

Optimization is a project that turns into an ongoing cycle of performing tasks, collecting feedback and data, reviewing performance, and tweaking maintenance strategies based on current performance and business drivers.

5. You’ll help the whole business be more productive and profitable

Better maintenance strategies keep your production equipment aligned to performance requirements, with fewer interruptions. That means people can get more done, more of the time. That’s the whole point, isn’t it?

Hopefully, this article has convinced you of the benefits of optimizing your maintenance strategies. Ready to get started or re-energize your maintenance optimization project? Check out our next blog article, How To Optimize Your Maintenance Strategy: A 1,000-Foot View.


Author: Dan DeGrendel

Optimizing your maintenance strategy doesn’t have to be a huge undertaking. The key is to follow core steps and best practices using a structured approach. If you’re struggling to improve your maintenance strategy — or just want to make sure you’ve checked all the boxes — here’s a 1000-foot view of the process.

1. Sync up

  • Identify key stakeholders from maintenance, engineering, production, and operations — plus the actual hands-on members of your optimization team.
  • Get everybody on board with the process and trained in the steps you’re planning to take. A mix of short awareness sessions and detailed education sessions for the right people is vital for success.
  • Make sure you fully understand how your optimized maintenance strategies will be loaded and executed from your Computerized Maintenance Management System (CMMS).

2. Organize

  • Review/revise the site’s asset hierarchy for accuracy and completeness. Standardize the structure if possible.
  • Gather all relevant information for each piece of equipment.
    • Empirical data sources: CMMS, FMEA (Failure Mode and Effects Analysis) studies, industry standards, OEM recommended maintenance
    • Qualitative data sources: Team knowledge and past records

3. Prioritize

  • Assign a criticality level for each piece of equipment; align this to any existing risk management framework
  • Consider performing a Pareto analysis to identify equipment causing the most production downtime, highest maintenance costs, etc.
  • Determine the level of analysis to perform on each resulting criticality level
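
The Pareto analysis mentioned above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the equipment names and downtime hours are invented, and the 80% cut-off is the conventional Pareto threshold rather than anything prescribed here.

```python
def pareto(downtime_hours, threshold=0.8):
    """Return the equipment that together accounts for `threshold` of total downtime.

    Items are ranked from worst to best and added until the cumulative
    share of downtime reaches the threshold.
    """
    total = sum(downtime_hours.values())
    if total == 0:
        return []
    ranked = sorted(downtime_hours.items(), key=lambda kv: kv[1], reverse=True)
    selected, cumulative = [], 0.0
    for name, hours in ranked:
        if cumulative / total >= threshold:
            break
        selected.append(name)
        cumulative += hours
    return selected

# Hypothetical annual downtime per asset, in hours.
downtime = {
    "Pump P-101": 120,
    "Conveyor C-3": 45,
    "Compressor K-2": 20,
    "Fan F-7": 10,
    "Mixer M-1": 5,
}
top = pareto(downtime)
print(top)
```

The same function works for maintenance cost or any other measure you want to rank equipment by; only the input dictionary changes.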

4. Strategize

  • Using the information you’ve gathered, define the failure modes, or apply an existing library template. Determine existing and potential modes for each piece of equipment
  • Assign tasks to mitigate the failure modes.
  • Assign resources to each task (e.g., the time, number of mechanics, tools, spare parts needed, etc.)
  • Compare various options to determine the most cost-effective strategy
  • Bundle selected activities to develop an ideal maintenance task schedule (considering shutdown opportunities). Use standard grouping rules if available.

This is your proposed new maintenance strategy.
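
To make the "compare various options" step concrete, here is a deliberately simple Python sketch. It assumes a toy cost model (planned task cost plus expected failure cost per year) and uses invented figures; real RCM simulation software models failure behavior in far more detail, so treat this only as an illustration of the comparison itself.

```python
from dataclasses import dataclass

@dataclass
class Option:
    """One candidate strategy under a simplified, hypothetical cost model."""
    name: str
    tasks_per_year: float
    cost_per_task: float
    expected_failures_per_year: float
    cost_per_failure: float

    def annual_cost(self) -> float:
        # Planned maintenance cost plus the expected cost of failures.
        planned = self.tasks_per_year * self.cost_per_task
        unplanned = self.expected_failures_per_year * self.cost_per_failure
        return planned + unplanned

# All figures below are assumptions chosen for illustration.
options = [
    Option("Run to failure", 0, 0, 2.0, 10_000),
    Option("Quarterly inspection", 4, 500, 0.5, 10_000),
    Option("Monthly inspection", 12, 500, 0.4, 10_000),
]

best = min(options, key=Option.annual_cost)
for opt in options:
    print(f"{opt.name}: {opt.annual_cost():,.0f} per year")
print("Most cost-effective:", best.name)
```

With these assumed numbers, inspecting more often than quarterly costs more than it saves, which is the over-maintenance trap a structured comparison helps you avoid.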

5. Re-sync

  • Review the proposed maintenance strategy with the stakeholders you identified above, then get their buy-in and/or feedback (and adjust as needed)

6. Go!

  • Implement the approved maintenance strategy by loading all of the associated tasks into your CMMS — ideally through direct integration with your RCM simulation software, manually, or via Excel sheet loader.

7. Keep getting better

  • Continue to collect information from work orders and other empirical and qualitative data sources.
  • Periodically review maintenance tasks so you can make continual improvements.
  • Monitor equipment maintenance activity for unanticipated defects, new equipment and changing plant conditions. Update your maintenance strategy accordingly.
  • Build a library of maintenance strategies for your equipment.
  • Take what you’ve learned and the strategies and best practices you’ve developed and share them across the entire organization, wherever they are relevant.

Of course, this list provides only a very high-level view of the optimization process.

If you’re looking for support in optimizing your maintenance strategies, or want to understand how to drive ongoing optimization, ARMS Reliability is here to help.


One of the four basic principles in the Apollo Root Cause Analysis methodology is that for each effect there are at least two causes and these causes are either actions or conditions.

This principle causes you to think more critically, challenge causal relationships more consistently, and to understand that things are rarely as simple as they may seem.

One implication of this principle is that there should never be a straight line, or even a partial straight line of causes within a cause and effect chart. A straight line tells us that there are other causes that still need to be found or identified, and more questions must be asked.

In each causal connection we should see at least one action cause and one condition cause.

So what are actions and conditions?

Conditions exist—they refer to the current state of things. Take gravity for instance—it is there all the time. Gravity exists. So this cause would be a conditional cause.

Conditions must exist. They always exist alongside any action.

An action cause is a cause which makes use of the available conditions. If the conditions didn’t exist, then the action would have no effect at all. The action cause is that moment in time when something happens. It is the thing that is different—the instigator or the catalyst of the effect that occurs.

Typically, there is one action and several conditions. Many of the action causes are also related to the things that people do. Action causes are readily seen and tend to be easily identified. When people tell the story of what happened they often list a series of actions, and relatively few conditions. When we create a timeline or sequence of events, the initial straight line will be constructed mostly of actions.

The Apollo Root Cause Analysis methodology demands an exhaustive search for both condition causes and action causes. If you only see half of the problem, will you really understand it? If you only find half of the causes, you will also only have half of the opportunities for controlling or mitigating the problem to an acceptable level.

Let’s take a look at an example – “An Object Fell Off a Platform”

“What happened to make the object fall?” would be a good question to ask. Let’s say someone kicked it off the platform. This is the direct cause of why the object fell, so this is considered an action cause. It is the ‘something’ that happened. An action cause will typically be described using a noun/verb connection, as in ‘object/kicked.’

But it’s not always that simple. There are other causes that have played a role in this scenario.  At this point in time it is important to challenge the concept of the linear connection of causes and keep searching for more.

The “Every Time Statement”

A useful tool to apply in this scenario is an “Every Time Statement.” The statement itself should be absolute in the sense that all causes in the connection need to be present. The same effect should happen each time the action occurs.

So, every time you kick the object off the platform it will fall? No, not every time.

Why not? Because the object in question must be elevated. If you kick it while it is on the ground it will not fall.

So is this an action cause or a condition cause?

It is a state of where the object was at the time it was kicked. So in this instance this cause would be labelled as a condition.

Now that another cause has been identified, you can repeat the “Every Time Statement.”

Every time you kick the object off the platform and the platform is elevated, the object will fall. Every time?  Well, it will only be true if there is gravity in play. If there is no gravity present, then this statement will not be true.

Is gravity an action or a condition? It’s not an event, it just exists. It was there when the problem occurred. This means that we would label this cause as a condition.

There are now three causes in this causal relationship, but have we identified every cause in that causal connection? At this point we have:

  1. Kicked object off platform
  2. The platform was elevated
  3. Gravity was present

Will the object fall every time? Only if the object is denser than air. If it were lighter than air, then it would not fall.

Is this cause an action or a condition? Again we observe that the object’s density didn’t change. It was what it was before the incident and had been so for some time. This makes this cause a condition.

Encourage people in the RCA group to actively look for the exception that makes a lie out of the “Every Time Statement.”  Every time you find an exception to this statement you have effectively identified another cause. Add it to your list of causes and repeat the “Every Time Statement.” When you can’t identify any other exceptions then you should have effectively identified every cause in that causal connection. The statement should now be absolute.
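
The loop described above (state the "Every Time Statement", look for an exception, add the missing cause, repeat) can be sketched in Python. This is only a toy model of the falling-object example: the `REQUIRED` table stands in for what the RCA group would discover through questioning, and all names are hypothetical.

```python
# Causes from the falling-object example, labelled as action or condition.
# In a real RCA the group discovers these; here they are listed up front.
REQUIRED = {
    "object kicked": "action",
    "platform elevated": "condition",
    "gravity present": "condition",
    "object heavier than air": "condition",
}

def effect_occurs(present):
    """The effect happens only when every required cause is present."""
    return all(cause in present for cause in REQUIRED)

def exceptions(statement):
    """Required causes the current 'Every Time Statement' does not yet name."""
    return [cause for cause in REQUIRED if cause not in statement]

statement = ["object kicked"]  # the initial, action-only story
while True:
    missing = exceptions(statement)
    if not missing:
        break  # no exception left: the statement is now absolute
    statement.append(missing[0])  # each exception found is another cause

for cause in statement:
    print(f"{cause} ({REQUIRED[cause]})")
```

Note how the starting statement contains only the action, and every cause added by the loop is a condition, which mirrors how the conditions tend to be the causes people overlook.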

So what we have identified here is that there are at least four causes in this causal relationship that will influence whether an object will fall or not. In fact, every time something falls the same types of causes will be in play. The action cause will still need to occur, but this may come in different forms. The action can be different but it will still make use of the available conditions.


To Sum Things Up

It is valuable to be able to label causes as either actions or conditions. The process of labelling causes demands that you find multiple causes for each connection. This in itself will challenge your understanding of the problem.

Understanding what the conditional causes are will also lead you to finding the most effective solutions for your problem – the hard controls. By actively engaging in challenging the logic of each and every connection within the cause and effect chart consistently, many more conditional causes will be found and more options of control will present themselves. When you have the ability to eliminate a conditional cause, substitute it, or engineer it out, then your solutions and their outcomes will be more consistent, reliable, and predictable. You can therefore, with a fair degree of certainty, declare that the problem will not recur.


Can you quantify the financial impact of your maintenance program on your business? Do your calculations include not only direct maintenance costs, such as labor and spare parts, but also the costs of not maintaining your equipment effectively, such as unplanned downtime, equipment failures and lost production?

Measuring the financial impact of maintenance can be difficult, but it is a highly valuable task. It is the first step toward finding ways to improve your profits; in other words, the first step toward an optimized maintenance strategy.

A maintenance study of six open-pit mines in Chile [1] found that maintenance costs amount to approximately 44% of the cost of operating a mine. This is a significant figure, and it highlights the relationship between maintenance and a mine’s financial performance. More recently, a 2013 mining benchmarking study [2] reported that the productivity of mining equipment had fallen 18% since 2007, dropping 5% in 2013 alone. Besides payload, operating time is a key factor.

So how do you know whether you are spending too much or too little on maintenance? Industry benchmarks certainly provide a guide. Manufacturing best practice indicates that maintenance costs should be less than 10% of total manufacturing costs, or less than 3% of the equipment’s replacement cost.
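
As a quick way to check your own figures against those two benchmarks, here is a minimal Python sketch. The function names and the sample amounts are hypothetical; only the 10% and 3% thresholds come from the text above.

```python
def maintenance_ratios(maintenance_cost, total_manufacturing_cost, replacement_value):
    """Express annual maintenance cost as a share of the two benchmark bases."""
    return {
        "share_of_manufacturing_cost": maintenance_cost / total_manufacturing_cost,
        "share_of_replacement_value": maintenance_cost / replacement_value,
    }

def within_benchmarks(ratios, max_cost_share=0.10, max_replacement_share=0.03):
    """Compare each ratio with its benchmark: under 10% and under 3% respectively."""
    return {
        "share_of_manufacturing_cost": ratios["share_of_manufacturing_cost"] < max_cost_share,
        "share_of_replacement_value": ratios["share_of_replacement_value"] < max_replacement_share,
    }

# Hypothetical annual figures, in the same currency.
ratios = maintenance_ratios(
    maintenance_cost=1_200_000,
    total_manufacturing_cost=10_000_000,
    replacement_value=50_000_000,
)
checks = within_benchmarks(ratios)
for name, value in ratios.items():
    status = "within" if checks[name] else "above"
    print(f"{name}: {value:.1%} ({status} benchmark)")
```

In this made-up case the two benchmarks disagree, which is one reason a benchmark alone cannot tell you whether your spending is right.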

While these benchmarks can be useful, a more effective way to answer the question is to look at the symptoms of spending too little or too much on maintenance. After all, benchmarks do not take into account your particular history or operating circumstances.

Symptoms of spending too little on maintenance include:

  • Increased ‘hidden failure costs’ due to lost production
  • Safety and environmental risks and events
  • Equipment damage
  • Reputational damage
  • Waiting time for spare parts
  • High spare-parts logistics costs
  • Lower labor utilization
  • Delayed product shipments
  • Stock-outs

Other symptoms are explored in more detail in our guide, 5 Symptoms That Indicate Your Maintenance Strategy Needs Optimizing.

Figure 1

En la mayoría de los casos, son estos ‘costos de falla ocultos’ los que tienen el mayor impacto en el resultado final. Estos costos pueden ser varias veces más altos que el costo directo de mantenimiento causando paradas no anticipadas y significativas al negocio. Es por esto que es importante encontrar maneras para medir los efectos de no gastar lo suficiente en mantener los equipos.

Varias herramientas y software existen para ayudar a simular los escenarios que pueden ocurrir cuando un equipo se avería, falla o al contrario es mantenido de manera proactiva. Un análisis de modos de falla, efectos y criticidad (FMECA por sus siglas en inglés) es una metodología comprobada para evaluar todos los modos de falla probables para una pieza de equipo y sus consecuencias.

Extender un FMECA a Mantenimiento Centrado en Confiabilidad (RCM por sus siglas en inglés) provee una guía para escoger la tarea óptima de mantenimiento. Combinar RCM con un motor de simulación genera una respuesta veloz del valor de mantenimiento y el impacto financiero de no realizarlo.

Armed with the information obtained from these analyses, you will have a clear picture of the optimal maintenance costs for a particular piece of equipment, and you can use this data in different ways to reduce operating costs. There may be redundant maintenance plans that can be removed, a maintenance program that would be more efficient and effective, or opportunity costs associated with a specific shutdown frequency and duration. It may even be more beneficial to replace the equipment than to continue maintaining it.

The idea is to optimize plant performance for maximum production while minimizing the risk of failure of key pieces of equipment. Get this right and business costs will start to come down.

Want to keep reading? Download our guide: 5 Symptoms That Indicate Your Maintenance Strategy Needs Optimizing.

[1] Knights, P.F. and Oyanander, P (2005, Jun) “Best-in-class maintenance benchmarks in Chilean open pit mines”, The CIM Bulletin, p 93

[2] PwC (2013, Dec) “PwC’s Mining Intelligence and Benchmarking, Service Overview”,


Figure 1. This image shows Isograph's RCMCost™ module, which is part of its Availability Workbench™ software. Availability Workbench, Reliability Workbench, FaultTree+, Hazop+ and NAP are registered trademarks of Isograph software. ARMS Reliability is an authorized distributor, trainer and implementer.


“How long should an RCA take?”

This question is much like asking, “How long is a piece of string?”

I have heard of one plant manager who stipulates a maximum of two hours for an RCA conducted in his organization. Another expects at least a brainstorm of solutions before the end of the first day, that is, within 6 or 7 hours. It is not unusual for a draft report to be required within 48 hours of the RCA starting.

The following three tips will help you meet the deadlines and expectations set when time is short. One advantage of the Apollo Root Cause Analysis method is that it is a fast process, but it requires an effective facilitator to obtain the desired results, namely effective solutions.

Tip #1 You Define the Problem

Imagine the RCA has been triggered by an unplanned incident or event that falls into any of the safety, environmental, production, quality, equipment failure or similar categories. You have been appointed as facilitator by a superior or manager who is responding to that particular event. Your superior or manager may understand the triggering mechanism and may well state the problem title.

For example, “laceration of the upper arm”, “ammonia spill”, “production delay” and so on could be what you offer the team as the starting point for the analysis. Typically, as facilitator you will have gathered some of the “facts” from first-responder reports, interviews, data sheets, photos and so on. A good first step is therefore to draft a problem definition statement, including the significance as reflected by the consequences or impacts. The team then has a starting point for the analysis, although the problem statement may change as more details emerge.

Ideally, you will already have created a file in RealityCharting™, and the Problem Definition table can be projected onto a screen or even onto the clear wall where your charting will be done with Post-It™ notes. Team member information should already have been entered and can be quickly confirmed on this display. You can even show the Incident Report format and highlight the Disclaimer option you deliberately selected: Purpose: To prevent recurrence, not place blame.

This preparatory work could save at least 20 minutes of the team members' time and allow an immediate launch into the analysis phase.

Important: Save yourself hours of rework and potential embarrassment by saving the file as soon as this first step is complete, if you have not already done so, and regularly thereafter. Maintain some form of version control so that the evolution of the chart over the following days can be traced if necessary.

If you are particularly well resourced, the chart's development can be recorded in the software at the same time as the hard copy is created on the wall space. A small group may choose to create the chart directly in the software with a decent means of projection.

Tip #2 Direct the Analysis

It is vital that your initiative in drafting the problem definition is not seen by team members as undermining them. The analysis step, in which everyone has the opportunity to contribute, should ensure they feel they have “ownership” of the problem.

To reinforce this, it is advisable to choose a sequence for addressing each member, typically left to right or vice versa, depending on the seating. This establishes, first, that only one person speaks at a time; second, that each and every statement will be documented; and third, that every person has an equal opportunity. Your rapid and accurate recording of each piece of information will provide the discipline needed to minimize idle chatter, which wastes time because it distracts focus. When you get a run of “no comment” responses from team members because the process has exhausted their immediate knowledge of events, begin creating the chart.

It is worth reminding the team that each item of information recorded and posted in the parking area may not appear on the chart in its original form, or in some cases may not appear at all. Because the information gathering casts a wide net to capture as much knowledge as possible about what happened, when and why, it has no particular focus. But because the items come from people with experience and expertise, or with intimate knowledge of the events and circumstances, they have some value. The exact value will be determined by where the information sits in the cause-and-effect logic that starts at the problem and is linked by “caused by” relationships.

Important: The Cause text should be written in CAPITAL LETTERS. It will be easier for the team to read and decipher at the time, and perhaps from photographs of the chart later. Likewise, using capitals in the software itself makes projection of the chart more effective and improves the printing of its various views.

Tip #3 The “How and If” of Creating a Reality Chart

Many practitioners explore the existing understanding of the event by capturing as many action causes as possible. These can arise through a 5 Whys process, for example, which starts at the Primary Effect.

Plant Stopped (Problem or Primary Effect)

Why? Feed Pump Not Pumping

Why? Coupling Broken

Why? Motor Bearing Damaged

Why? Bearing Race Collapsed

Why? Fatigue

The Apollo RCA method requires the use of the expression “caused by?” to connect cause-and-effect relationships. Understanding that there must be at least one action and one condition helps to reveal the “hidden” causes, and especially the condition causes that do not come to mind initially.
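The “at least one action and one condition” rule lends itself to a simple check. The Python sketch below is not part of the Apollo method or the RealityCharting™ software; it is a hypothetical helper that stores the 5 Whys chain above as effect-to-cause links and lists the effects that still lack an action or a condition. Labelling every 5 Whys answer as an action is an assumption, and the “NO STANDBY PUMP” condition is invented purely to show one completed element.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cause:
    text: str
    kind: str  # "action" or "condition"


def missing_cause_kinds(chart):
    """Return {effect: [kinds still missing]} for effects lacking an action or a condition."""
    gaps = {}
    for effect, causes in chart.items():
        kinds = {cause.kind for cause in causes}
        missing = [kind for kind in ("action", "condition") if kind not in kinds]
        if missing:
            gaps[effect] = missing
    return gaps


# The 5 Whys chain, with each answer treated as an action cause (assumption).
chart = {
    "PLANT STOPPED": [
        Cause("FEED PUMP NOT PUMPING", "action"),
        Cause("NO STANDBY PUMP", "condition"),  # hypothetical, not from the example
    ],
    "FEED PUMP NOT PUMPING": [Cause("COUPLING BROKEN", "action")],
    "COUPLING BROKEN": [Cause("MOTOR BEARING DAMAGED", "action")],
    "MOTOR BEARING DAMAGED": [Cause("BEARING RACE COLLAPSED", "action")],
    "BEARING RACE COLLAPSED": [Cause("FATIGUE", "action")],
}

gaps = missing_cause_kinds(chart)
for effect, missing in gaps.items():
    print(f"{effect}: still needs {', '.join(missing)}")
```

Every effect reported by the check is a prompt for the team to ask “caused by?” again and look for the missing condition, which is exactly where a plain 5 Whys chain tends to stop short.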

To support this expression and the essential “why”, it is advisable to ask “how”. This can be used initially by the most impartial member of your team, who has been engaged specifically because of his or her lack of association with the problem and can sincerely ask the supposedly “dumb” questions. Invariably these questions generate more causes or a more accurate arrangement of the existing causes. The question “How does that happen, exactly?” can lead the team to take the necessary “baby steps”. It also often exposes differences between “experts”, and resolving these differences is always enlightening.

The facilitator needs to gently “challenge” the team's understanding while ensuring that enough rigor is applied to generate the best representation of the causal relationships. This can be done in a neutral way by using the “IF” proposition.

Given that every effect requires at least two causes, you can put the proposition to the team: “If ‘one exists’ and ‘three exists’ (two conditions), then with ‘four added’ (the action), will the effect be ‘eight’ every time?” Using this technique on each causal element will generate the clarity and confidence being sought in understanding the causes of the problem. If each “equation” (causal element) on the chart is “real” and the causes themselves are “real” (supported by evidence), then the team is well placed to consider the kinds of controls it could implement to prevent recurrence of the problem.

The more causes that are revealed, the more opportunities the team has to identify possible solutions.


To speed up the RCA process:

Step 1 – The facilitator gathers information about the event and fills in the Problem Definition Statement.

Step 2 – The facilitator directs the information gathering by casting a wide net and systematically soliciting information from the participants.

Step 3 – Use the information gathered to build a RealityChart™ with actions based on what happened, then look for other causes, such as conditions that may be initially hidden. Use How and If to help validate that the causal relationships are logical.

With a completed chart, the solution-finding step can begin.

Our Root Cause Analysis (RCA) Facilitators course teaches students to lead an investigation with confidence and to find practical solutions to their problems. Public training courses are offered in major cities around the world throughout the year. Learn more about the advantages of attending a public training course, or see our worldwide training calendar for upcoming courses and book online.

Many of us have them. The invisible “graveyard” where good intentions (AKA – corrective actions from your root cause analysis investigation) went to die.

How do they end up there?

We all know that all the time and money spent on a root cause analysis investigation and identifying solutions are worthless if the solutions are not implemented. An investigation can usually be done within a week but solutions can take much longer to implement. They sometimes require the involvement of multiple teams or departments, regulatory agencies, engineering, planning, budgeting, and the list goes on and on. For these reasons, it can be challenging to stay on top of all the corrective actions you identified in your investigation, who’s responsible, and the status of an action item at any given time.

We can offer a few basic tips that will give you a head start in tracking action items effectively:

  • Be clear about who is responsible for each corrective action. You don’t want to create the opportunity for people to be able to pass the buck with “I thought Bob was going to do it”.
  • Have a mechanism in place by which the implementation of corrective actions can be tracked.
  • Give ownership of a solution to an individual, not a group or department.
  • Assign a due-date for each corrective action.
  • Support people in their efforts to implement corrective actions.
  • Make sure you follow up on each corrective action – check back with the individual responsible to make sure that progress is being made.
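The basics above amount to a small data model: one named owner, one due date and one status per action, plus a way to see what is overdue. The Python sketch below is only an illustration of that idea, not a feature of any RCA product; the class, the function and the sample actions are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class CorrectiveAction:
    description: str
    owner: str                      # a named individual, not a group or department
    due: date
    completed_on: Optional[date] = None

    def is_overdue(self, today: date) -> bool:
        return self.completed_on is None and today > self.due


def overdue_by_owner(actions, today):
    """Group open, past-due actions by owner so someone can follow up."""
    report = {}
    for action in actions:
        if action.is_overdue(today):
            report.setdefault(action.owner, []).append(action.description)
    return report


# Invented sample data for the example.
actions = [
    CorrectiveAction("Replace worn coupling guard", "Bob", date(2024, 3, 1)),
    CorrectiveAction("Update lockout procedure", "Alice", date(2024, 3, 15),
                     completed_on=date(2024, 3, 10)),
    CorrectiveAction("Retrain night shift on start-up checks", "Bob", date(2024, 4, 30)),
]

follow_up = overdue_by_owner(actions, today=date(2024, 4, 1))
print(follow_up)  # {'Bob': ['Replace worn coupling guard']}
```

Because each action carries exactly one owner, the report leaves no room for “I thought Bob was going to do it”.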

But even these “basics” are easier said than done.

In reality, most likely you come out of your root cause analysis investigation with a list of action items for which various people are responsible. Then everyone goes about their regular workdays and may or may not remember to follow through on any additional tasks they were assigned. Even if you have an appointed person to follow up with the action items and make sure they’re on track, it can be difficult to keep up with who has done what. Many managers rely on an Excel spreadsheet to manually track what has and hasn’t been done, due dates, and so forth. But this puts a lot of pressure on one person to keep up with everything – to manually send reminders to folks who haven’t completed their tasks and to enter the information properly when it has been done.

Even when the Excel file has been carefully kept up-to-date, it often lives locally on the manager’s hard drive, and other members of the team don’t have any visibility as to what has and hasn’t been done.

Sound familiar?

If your RCA program is starting to mature it may be time to consider an enterprise solution to help you better manage all your investigations.

Corrective action tracking inside of an enterprise RCA tool can help you maintain visibility and accountability by tracking the status of action items and assigned solutions. Team members get sent automatic reminders of incomplete or overdue action items and they can easily update the status of their assigned tasks, instantly informing everyone when a task has been completed. You can also create personalized dashboards with reports showing open, completed, or overdue corrective actions.


In addition to effective action tracking, an enterprise RCA solution can more broadly help your company implement and manage an effective overall root cause analysis program.

Here are some of the main features to look for:

  • Enterprise-wide visibility of your RCA program
    • Expand the RCA knowledge base and accessibility across an organization.
  • Search across the database for past RCAs, solutions, causes, equipment items, etc.
    • Leverage information from previous investigations in your current investigation.
  • Classify problem-types by company or industry standards or by a pre-set list
    • Classify and tag files for easy search-ability. Create custom tags incorporating company or industry standards.
  • Create and share interactive KPI reports
    • Build reports on your chosen metrics and visually display key performance indicators in tables, charts and graphics.
  • Create personalized dashboards
    • Specify which reports are most important to you for immediate dashboard display on your homepage.
  • Save and embed reference files such as photos, equipment failure data, interviews, etc.
    • Preserve integrity by securely collecting and storing evidence and important reference files.
  • House internal company resource documents and tools
    • Store company corporate standards or reference files such as frequently referenced industry documents in a central location for immediate access when facilitating an RCA.
  • Progress updates
    • Communicate with all users through on-page messaging that lets you quickly share information, receive feedback and record comments.

Keeping your RCA investigation corrective actions out of the graveyard is a very common challenge in maturing RCA programs, but it’s just one of many. To see what you may be up against in the future, check out our free eBook, 7 Challenges to Implementing Root Cause Analysis Enterprise-Wide and How to Overcome Them.

Remember, in order to resurrect your RCA investigation corrective actions, start with the basics that we listed at the beginning of this article. But also keep in mind: the more mature your RCA program becomes, or the larger and more complex your organization, the larger and more complex your problems become. So when you’re ready to alleviate this pain point altogether, consider whether an enterprise RCA solution might be the next step in your program’s development.

By: Gary Tyne CMRP, CRL

Engineering Manager – ARMS Reliability Europe

Working for a global organization has taken me to some weird and wonderful places around the world. Different cultures, traditions, religions and people certainly open your eyes to the wonderful and colorful place we all call home.

I would say in most of these countries I have at some stage taken a taxi or at least been chauffeured by a driver in a customer’s company vehicle. These experiences have led to some interesting conversations on life, travel, politics, and football with some very knowledgeable and diverse taxi drivers. On the other hand, I have had drivers who have not spoken a word and have simply delivered me to my destination in silence, even after I tried to engage them in conversation.

A recent taxi encounter occurred just after I had left my customer. I was about to call for a taxi when I spotted someone being dropped off at my location. I asked the driver if he could take me to Dublin airport and he obliged.

This is when I met Mohammed, an immigrant from Kenya who had moved to Ireland 17 years ago. He was smiling and cheerful and had a generally happy persona about him. We discussed weather in Ireland versus Mombasa, we mentioned football briefly, and then we started to discuss cars. This occurred when a brand new Mercedes went past us in the fast lane and I passed comment on what a beautiful car that was.

Mohammed started to discuss the Toyota Corolla in which we were driving and how he loved his car for its level of reliability. I asked how many miles his vehicle had driven and he pointed out that he had covered over 300,000 miles since he purchased the car brand new in Northern Ireland. He went on to explain how he ensured that it was regularly maintained to a high standard, with the best quality oil and genuine OEM parts being used when any replacements were required. The engine and gearbox were original and, as he put it, provided ‘you look after your car, it will look after you.’ Mohammed was proud of the length of service he had achieved from his vehicle and that the car had never let him down. As the vehicle operator, he recognized the importance of regular maintenance and the use of the right quality parts. He also said that he only allowed one mechanic to work on his vehicle, because that mechanic was very skilled and competent at his job and he could not trust others to work on his taxi.

Mohammed was also proud to be a taxi driver in Ireland, and that pride, combined with his ‘Reliability’ story, certainly made the trip to Dublin airport a memorable one. Mohammed did not know my job role or that I had spent over 30 years in Maintenance and Reliability, but he gave me a textbook account of what ‘Reliability’ is! I said goodbye to Mohammed after he let me take a picture of his mileage and car. I wished him luck and many more years of happy motoring in his reliable Toyota.

As I sat in the departure lounge, my trip to the airport and conversation with Mohammed certainly made me think:

  • Do we see this level of passion and ownership amongst today’s industrial operators?
  • Should operators take more care of their assets, ensuring high reliability through a program of basic care?
  • How do we ensure the right levels of competence in our technicians?
  • How do we ensure that the correct specification and quality of parts are being purchased?
  • How do we ensure that maintenance is being performed at the right frequency on the right asset?

This ‘Reliability Tale from the Taxi’ may also have raised further questions in your own mind. For me, it provided another great ‘Reliability’ story that I can share during one of our global reliability training courses.