Yearly Archives: 2017

Author: Dane Boers, Senior Reliability Engineer

The risk matrix has served its purpose but falls well short of the data-driven business requirements of today. Enter the Value Framework.


For more than a decade, the risk matrix has been the go-to decision-making tool for assessing risk, and for good reason. The risk matrix is practical, easy to use and flexible enough to apply to various risk types and situations, including:

  • Assessing risks of a particular asset
  • Deciding which investments or projects have the highest importance
  • Choosing which risk controls to implement

Figure 1 Example of a Risk Matrix

The purpose of the risk matrix is to simplify the assessment process while still providing meaningful results. Technology and data processing tools now allow for complex assessments using simple interfaces – this plays a major role in supporting the increasing need for improved risk based decision making.

Shortfalls of the risk matrix

Granularity and/or resolution

Risk is not discrete but continuous, and many risks can be similar. Thus, the first shortfall of the risk matrix is that similar risks cannot be separated even though there are known differences. This reduced granularity can result in sub-optimal decisions and missed opportunities for improvement, because subtle differences in likelihood or consequence will likely result in the same ‘risk box’ selection.

If you were to prioritize two similar risks with all else being equal, it would make sense to address the slightly higher risk before the lower risk, even though the assigned risk level is the same according to the matrix.

Consider a reduction in the likelihood of an event by 50%, from once in 50 years to once in 100 years i.e. doubling the life of an asset. This is a huge improvement and may mitigate a significant amount of risk, especially if this is then applied at scale. According to some risk matrixes, the likelihood before and after would be ‘rare’, showing no improvement in risk exposure.
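
To make the loss of resolution concrete, here is a minimal sketch in Python. The likelihood bands and the $1,000,000 consequence cost are assumptions chosen purely for illustration; a real risk matrix defines its own bands.

```python
# Hypothetical likelihood bands, expressed as upper bounds in events per year.
# Real risk matrices define their own bands; these are for illustration only.
LIKELIHOOD_BANDS = [
    (1 / 20, "rare"),      # less often than once in 20 years
    (1 / 5, "unlikely"),   # less often than once in 5 years
    (1.0, "possible"),     # less often than once a year
]


def likelihood_label(events_per_year):
    """Return the matrix label for a continuous event frequency."""
    for upper_bound, label in LIKELIHOOD_BANDS:
        if events_per_year < upper_bound:
            return label
    return "likely"


def annual_risk(events_per_year, consequence_cost):
    """Expected yearly cost: frequency multiplied by consequence."""
    return events_per_year * consequence_cost


CONSEQUENCE_COST = 1_000_000  # assumed cost of one event, in dollars

before = 1 / 50   # once in 50 years
after = 1 / 100   # once in 100 years

labels = (likelihood_label(before), likelihood_label(after))
risks = (
    annual_risk(before, CONSEQUENCE_COST),
    annual_risk(after, CONSEQUENCE_COST),
)

print(labels)  # both frequencies fall into the same band
print(risks)   # the expected annual cost is halved
```

The label is identical before and after the improvement, while the continuous figure shows the expected annual cost being cut in half.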

Businesses with large volumes of risk data need to be able to resolve very similar risks to make the best decisions possible, especially when constrained on expenditure or resources. This is even more evident when dealing with large fleets of similar assets and risks.

Multiple risks transparency

The next shortfall of the typical risk matrix is its inability to handle and interpret events that cause multiple similar consequences.

Consider an equipment failure that causes a large amount of smoke in a building. The result may be that 50 people require medical treatment for smoke inhalation. If a single medical treatment injury is assessed as a ‘moderate’ safety consequence, at what point does the sum of these injuries constitute the equivalent of a ‘critical’ or ‘catastrophic’ consequence? E.g. 10, 50, 100 persons?

Without the ability to summate or determine a total risk for an event, low impact but high-volume consequences could leave your organization exposed.
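
One way to totalize an event is to give each consequence a unit value and add the values up. The sketch below uses the $12,500 medical treatment injury cost worked out later in this article; the $500,000 'critical' threshold is a hypothetical figure, not one taken from any real matrix.

```python
import math

# Unit cost per consequence, in dollars. The medical treatment figure comes
# from the worked example later in the article.
UNIT_COSTS = {"medical_treatment": 12_500}

# Hypothetical dollar value at which an event would count as 'critical'.
CRITICAL_THRESHOLD = 500_000


def total_consequence(counts, unit_costs):
    """Sum the value of every consequence caused by a single event."""
    return sum(unit_costs[kind] * number for kind, number in counts.items())


# The smoke event from the example above: 50 people need medical treatment.
smoke_event = {"medical_treatment": 50}
total = total_consequence(smoke_event, UNIT_COSTS)

# Number of medical treatment injuries that would reach the threshold.
injuries_to_reach_critical = math.ceil(
    CRITICAL_THRESHOLD / UNIT_COSTS["medical_treatment"]
)

print(total)
print(injuries_to_reach_critical)
```

Under these assumed numbers, the question "at what point do many moderate injuries become critical?" has an explicit answer rather than a judgment call.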

Cost benefit

Assessing risk based purely on outcome risk levels is only one-half of the equation for making effective decisions. The usual risk matrix methodology prioritizes the highest risk levels first, with little regard for the cost to achieve the mitigation. Because organizations have limited resources, determining the best way to utilize these resources is key to remaining competitive in the marketplace. The missing component is cost (monetary or otherwise), and without a cost component, we are unable to answer the following question.

If I can mitigate one of two ‘moderate’ safety risks with the same likelihood, which one should I mitigate?

Once you identify that mitigation of the first risk costs 50% less in dollars, time and resources, the decision becomes clearer. The answer to this critical question is missing from most risk matrixes and risk frameworks.
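
As a rough sketch, adding cost turns the question into a ranking by risk reduction per dollar spent. The names and dollar figures below are hypothetical; the only thing taken from the example is that the first mitigation costs 50% less than the second.

```python
from dataclasses import dataclass


@dataclass
class Mitigation:
    name: str
    risk_reduction: float  # assumed annual risk removed, in dollars
    cost: float            # assumed cost of the mitigation, in dollars

    @property
    def benefit_per_dollar(self):
        """Risk removed for every dollar spent."""
        return self.risk_reduction / self.cost


# Two 'moderate' safety risks with the same likelihood and the same benefit.
options = [
    Mitigation("second risk", risk_reduction=20_000, cost=10_000),
    Mitigation("first risk", risk_reduction=20_000, cost=5_000),
]

# Highest benefit per dollar first.
ranked = sorted(options, key=lambda m: m.benefit_per_dollar, reverse=True)

for option in ranked:
    print(option.name, option.benefit_per_dollar)
```

A matrix would show both risks in the same box; the ratio separates them.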

Trade-off between risk types


To make effective investment decisions around risk mitigation and exposure, an organization must be able to compare and trade off the value of different risk types (e.g. stakeholder risk vs. environmental risk). In a budget- or resource-constrained environment, this is especially important. An organization must understand which consequences are more important relative to others. A risk matrix partially does this by grouping the consequences into ‘negligible’ or ‘moderate’ groupings; however, this does not answer the question:

If I can spend $1000 and mitigate either a ‘moderate’ stakeholder risk or a ‘moderate’ environmental risk with the same likelihood, which one should I do?

The matrix type framework is not flexible enough for most organizations to achieve exact alignment of risk types.

X by Y grid and descriptions

When thinking about consequences, the risk ‘levels’ must be meaningful to be consistently applied. This is why safety risks are often thought about in terms of ‘first aid’, ‘medical treatment’, ‘disabling’ or ‘fatal’ injuries. These can be measured and conceptually linked to an event as the most likely outcome. The ‘negligible’ and ‘moderate’ descriptions aren’t meaningful enough.

In the safety risk example above, there are four consequence levels. What if an environmental risk type is introduced into the matrix, and it only has three consequence levels (e.g. ‘<100L spill’, ‘100L-500L spill’ and ‘>500L spill’)? Because the number of meaningful levels can be different between risk types, they cannot fit into an X by Y matrix without distortion.

The solution: ‘The Value Framework’

Identify what is important to your organization (value measures)

The first step in creating a value framework is to identify the things that your organization values or considers to be important. An existing risk management framework or risk matrix is a good place to start. Risk types (e.g. safety, environment, stakeholder, legal and compliance etc.) are common values that can be measured and are found in most value frameworks.

Benefits such as financial returns, increases in employee efficiency and so on are also important and should be included. Another common inclusion in a value framework is strategic targets, KPIs or other measures. Everything identified in this step is known as a ‘value measure’.

Identify the common levels and calculations

Each ‘value measure’ obviously needs to be measured! The next step is to determine the discrete levels for each measure (e.g. for safety, they could be ‘first aid’, ‘medical treatment’, ‘disabling’ or ‘fatal’ injuries). Then add calculations for KPIs or values like ‘employee efficiency’ where an exact value can be obtained. For example:

Employee efficiency = Number of employees affected x hourly rate x hours saved per employee
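
The calculation above translates directly into a small function. The input figures in this sketch (40 employees, $35 per hour, 10 hours saved each) are hypothetical.

```python
def employee_efficiency(employees_affected, hourly_rate, hours_saved_per_employee):
    """Dollar value of time saved, following the formula above."""
    return employees_affected * hourly_rate * hours_saved_per_employee


# Hypothetical inputs: 40 employees at $35/hour, each saving 10 hours.
value = employee_efficiency(
    employees_affected=40,
    hourly_rate=35.0,
    hours_saved_per_employee=10,
)
print(value)
```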


Once the value measures and their calculations have been identified, they need to be aligned to a common scale. This is to allow a non-biased tradeoff between any of the measures in the framework. Typically, this common scale is dollars or a dollar equivalent unit. Every level and calculation of every value measure needs to be quantified. For most risk types, this is calculated as the direct cost or benefit to the organisation.

For example, the cost to the organisation for a safety medical treatment injury (MTI) would be:

$10,000 penalty cost + $1,000 legal cost + $1,500 compensation cost = TOTAL $12,500


Now that we have a rational and consistent way to assign a value to every risk, benefit, cost and other measure that an organisation values, the value framework can be used to assess every investment the same way.
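
A value framework of this kind can be sketched as a lookup of dollar values per measure and level, plus one function that scores any investment the same way. Only the medical treatment figure comes from the example above; every other value, and the structure itself, is an assumption for illustration.

```python
# Cost of one medical treatment injury, from the worked example above.
MTI_COST = 10_000 + 1_000 + 1_500

# Dollar value of each level of each value measure.
VALUE_FRAMEWORK = {
    ("safety", "medical treatment"): MTI_COST,
    ("safety", "first aid"): 500,            # hypothetical value
    ("environment", "<100L spill"): 2_000,   # hypothetical value
}


def avoided_cost(measure, level, events_avoided_per_year):
    """Annual value of avoiding events at a given level of a measure."""
    return VALUE_FRAMEWORK[(measure, level)] * events_avoided_per_year


def net_value(avoided, benefits, cost):
    """Score an investment: avoided risk plus benefits minus cost.

    `avoided` is a list of (measure, level, events avoided per year).
    """
    total_avoided = sum(avoided_cost(m, lvl, n) for m, lvl, n in avoided)
    return total_avoided + benefits - cost


# A hypothetical investment assessed against the framework.
investment_value = net_value(
    avoided=[
        ("safety", "medical treatment", 2),
        ("environment", "<100L spill", 1),
    ],
    benefits=14_000,   # e.g. an employee efficiency gain
    cost=20_000,
)
print(investment_value)
```

Because every measure is expressed on the same dollar scale, a safety improvement and an environmental improvement can be compared directly.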

Figure 2 What a Value Framework could look like



The risk matrix is a great tool for rapid risk qualification, but it cannot be used effectively to make risk and value based decisions. More information is required.

Organisations today need to:

  • Differentiate large volumes of risk, and risks with extremely small likelihoods
  • Evaluate and totalize multiple risks
  • Incorporate costs into risk-based decision-making processes
  • Trade off one risk type for another achieving a better overall economic outcome
  • Have a framework that accomplishes all the above with consistent application and transparency

Creating a value framework meets these requirements and allows organizations to make effective value based and risk informed decisions.

To learn more about creating a value framework download the executive whitepaper ‘Value Based Decision Making’


The previous article in our blog series described the recommended training strategy for your RCA program development. The next step in achieving a successful RCA program is to ensure leadership understands their role and has the tools in place to ensure the longevity of the program and its effectiveness.

To ensure the success of your root cause analysis program, leadership must have a vested interest and take responsibility not only for developing and overseeing the functions of the RCA effort, but also for monitoring the status of the individual analyses and associated solutions. This monitoring is typically done by the Steering Committee in conjunction with its other strategic responsibilities.

The critical elements to track in relation to conducting the root cause analysis include:

  • Incident date
  • RCA assignment date and lead
  • Estimated RCA completion date
  • Days past due
  • Escalation activity
  • Actual completion date

The critical elements to track in relation to the solutions that are to be implemented include:

  • RCA completion date
  • Solution assignment date and lead
  • Estimated solution completion date
  • Days past due
  • Escalation activity
  • Actual completion date
  • Frequency of incident recurrence
  • Annual savings/HSE incident reduction
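
The tracking elements listed above could be held in a simple record, with days past due derived from the dates rather than entered by hand. This is a minimal sketch; the class and field names are hypothetical and not taken from any particular tracking tool.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class TrackedItem:
    """One RCA or one solution being tracked (hypothetical structure)."""
    description: str
    lead: str
    assigned: date
    estimated_completion: date
    actual_completion: Optional[date] = None

    def days_past_due(self, today):
        """Days beyond the estimated completion date, never negative."""
        end = self.actual_completion or today
        return max(0, (end - self.estimated_completion).days)


open_item = TrackedItem(
    description="Replace pump seal",
    lead="J. Doe",
    assigned=date(2017, 5, 1),
    estimated_completion=date(2017, 6, 1),
)
print(open_item.days_past_due(today=date(2017, 6, 15)))
```

An item that is still open is measured against today's date, while a completed item is measured against its actual completion date.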

Once a root cause analysis has been completed, a list of potential solutions will be developed by the RCA team and submitted to the Steering Committee via the Program Champion or his/her designee for approval. The Steering Committee then assigns these solutions to individuals for completion and puts them into an action plan format with assigned due dates. These actions should be completed in the shortest time possible, otherwise the process will quickly fade away. The Steering Committee must track the status of open RCAs, the progress of implementing the solutions to ensure timely completion, and the effectiveness of previously implemented solutions (as measured by recurrences of the original incidents). New analyses should not be started if a large number of solutions remain to be implemented.

An appropriate person needs to be assigned the responsibility of tracking progress and recurrence.  The right person for this responsibility may be different for different organizations. Progress is tracked by showing the number or percentage of completed solutions. Recurrence will be tracked by measuring repetition of the incident.
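
Both measures are simple to compute once the data is recorded. The following sketch assumes each solution is stored as a completed/not-completed flag and each incident as a date; the function names are hypothetical.

```python
from datetime import date


def percent_solutions_complete(completed_flags):
    """Share of solutions completed, as a percentage."""
    if not completed_flags:
        return 0.0
    return 100.0 * sum(completed_flags) / len(completed_flags)


def recurrences_since(incident_dates, solution_date):
    """Count repeats of the incident after the solution was implemented."""
    return sum(1 for d in incident_dates if d > solution_date)


progress = percent_solutions_complete([True, True, False, True])
repeats = recurrences_since(
    [date(2017, 1, 10), date(2017, 4, 2), date(2017, 9, 20)],
    solution_date=date(2017, 5, 1),
)
print(progress, repeats)
```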

Some organizations will already have software and methods for tracking tasks, such as a CMMS. If this is the case, it can be considered for RCA and solution tracking as well. However, if a system does not currently exist or does not fulfill all the organization’s RCA tracking needs, then we would recommend considering RCPro™ enterprise RCA software. It allows an action list, due dates, and comments for each analysis to be generated and shared with team members. It also provides detailed reports on current investigation status, action tracking and outstanding items, and gives a view of systemic issues across the organization.

This is where the Steering Committee review and support really comes into play. The leadership team should review RCA status, solution implementation and final results as a regular part of Steering Committee business. The Steering Committee’s main role is to ensure that RCAs are completed in a timely fashion and that resulting solutions are implemented and tracked for effectiveness.

So far, this blog series has covered:

The Key Steps of Designing Your Program

Defining Goals and Current Status

Setting KPIs and Establishing Trigger Thresholds

RCA and Solution Tracking and Roles and Responsibilities

Recommended RCA Team Structure

Responsibilities of the Six Roles

RCA Program Development Training Strategy

And, Oversight and Management.

Stay tuned for our next installment on RCA Process Mapping.

Author: Jason Ballentine, VP of Engineering for ARMS Reliability

High Season Means Higher Stakes

This summer, the heat is shattering records around the United States. In Arizona, 119°F (48°C) days mean dozens of plane flights have been grounded and air conditioners are demanding an unprecedented number of megawatts from utilities. With average temperatures rising every summer and energy demand following suit, utilities have recognized the need to be more proactive about reducing their risk of outages.

Recently, the Chief Operating Officer of one energy generation company sought our help to fend off any issues that could result in a summer outage. Not only would an outage mean unhappy customers, but it would also mean financial losses if the utility couldn’t run at maximum capacity during its most lucrative season.

Throughout the winter, this utility saw a few small issues here and there. While nothing too dramatic happened, the COO recognized that he wouldn’t be able to afford something bigger going wrong during the busy season. He approached us to conduct a Vulnerability Assessment and Analysis (VAA) that would help identify his company’s most critical issues and reduce the likelihood of a service interruption.

*A VAA can be conducted on any type of operation in any industry.

Shedding Light on Potential Vulnerabilities

The analysis began with one power plant. This utility was like many other operations—they had several vulnerabilities on their radar in some form but no central repository for tracking them all. There might be a machine operator who knew about one issue, there might be an email chain about another issue, a few deferred work orders hanging around, but no way of making all issues known to all parties.

We began collecting information about the plant’s vulnerabilities—conducting individual interviews and brainstorming sessions with small groups of engineering and operating staff. We also reviewed event logs and work order histories to determine whether past events were likely to reoccur. We wanted to know: what issues had they been living with for a while? Where were they deferring maintenance? What spare parts were they missing? What workarounds were in place? Over the course of about a week, we reviewed all the vulnerabilities that could slow down or stop production on 40,000 pieces of equipment.

Concentrating on the Critical

Out of this process, about 200 vulnerabilities were identified. Next, we scored each vulnerability in terms of likelihood and consequence and then ranked them “low,” “medium,” or “high” according to the corporate risk matrix. While there were about 25 vulnerabilities that we identified as being in the “high” category, we determined that 16 of them comprised approximately 80 percent of the risk to production.
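
A Pareto-style cut like this can be sketched in a few lines: score each vulnerability, sort by score, and keep adding items until the chosen share of total risk is reached. The scoring scheme (likelihood score times consequence score) and the sample data are assumptions for illustration, not the figures from this assessment.

```python
def top_contributors(vulnerabilities, share=0.8):
    """Return the highest-risk items that together reach `share` of total risk.

    `vulnerabilities` is a list of (name, likelihood_score, consequence_score).
    """
    scored = sorted(
        ((name, likelihood * consequence)
         for name, likelihood, consequence in vulnerabilities),
        key=lambda item: item[1],
        reverse=True,
    )
    total = sum(score for _, score in scored)
    selected, cumulative = [], 0
    for name, score in scored:
        # Stop once the items already selected cover the requested share.
        if total == 0 or cumulative / total >= share:
            break
        selected.append(name)
        cumulative += score
    return selected


# Hypothetical vulnerabilities: (name, likelihood score, consequence score).
sample = [
    ("missing spare for feed pump", 6, 10),
    ("deferred valve maintenance", 5, 5),
    ("temporary cooling workaround", 2, 5),
    ("worn conveyor belt", 1, 5),
]
critical = top_contributors(sample)
print(critical)
```

In this sample, two of the four items carry 85 percent of the total score, so they would be addressed first.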

If the utility focused on resolving these 16 issues first, they would see the greatest results in the shortest amount of time. We were also able to show the utility which type of vulnerability was most prevalent (wear and tear) and which systems were most in need of attention.

The final step was to assign a high-level action to each of the most critical vulnerabilities (examples might be “order spare parts” or “seek approval for design change from fire marshal”). Now the utility had a clear plan for which vulnerabilities to address first, where to begin resolving each vulnerability, who was responsible for each action item, and a recommended time frame for taking action.


Like most organizations, this utility wasn’t surprised by the vulnerabilities we identified. Chances are, these issues had been looming in the background making everyone somewhat uneasy due to the lack of clear prioritization or path to resolution.

Over the course of just three weeks, our Vulnerability Assessment and Analysis captured all the potential vulnerabilities, prioritized them according to criticality, and provided a clear path of action. By following this plan, the utility could dramatically reduce the chances of a catastrophic slow down or stoppage, eliminating much of the stress that usually accompanies the high season.

The utility’s COO was so pleased with the results at the first plant that he immediately scheduled a Vulnerability Assessment and Analysis for the next power plant, with plans to eventually cover them all.

It’s important to conduct a Vulnerability Assessment and Analysis before a period of high production, but it’s also a useful process in advance of a scheduled work stoppage. This way any fixes that are identified can be completed without incurring additional downtime.

Find out more about our Vulnerability Assessment and Analysis process.

Author: Jason Ballentine, VP of Engineering for ARMS Reliability

Starting From Scratch With Spares

As anyone with a hand in running a household knows, it’s important to keep a stockpile of key items. You certainly don’t want to find out the hard way that you’re on your last square of toilet paper. But in the case of a facility like a power plant, a missing spare part could be more than just a nuisance; it could be downright expensive.

Determining the appropriate spare parts to have on hand in a large facility, however, can be tricky. This is especially true after building a facility from the ground up, when you don’t have a frame of reference for which spare parts you’re most likely to need first.

Most organizations deal with this in one of two ways: 1) they guess or 2) they purchase according to a spares list provided by an equipment vendor.

A Reliability-Focused Purchase List

There are obvious limitations when it comes to guesswork—making the wrong guess can result in huge expenses either in unnecessary spare parts or in costly downtime. A vendor-suggested list is probably somewhat more accurate, but such suggestions are unlikely to take into account the specific needs of your organization. We approach spare part holding recommendations through the lens of reliability as it applies to each specific operation. As factors change, it’s important to re-evaluate, making sure to take into account everything that could influence purchase priorities.

Recently, a utility company approached us to review the list of spare parts their equipment vendor had recommended. According to the vendor, this utility needed to purchase $4.9 million worth of spare parts up front. The utility wanted a second opinion before making such a sizable investment.

Our Approach

We started the spare parts analysis by looking at the list provided by the equipment vendor, but then we dug much deeper. We explored a series of questions, including: How often is this part likely to fail? What is the cost of the downtime if the part is attached to a critical piece of equipment? What is the unit cost of the spare part? What is the lead time to obtain a spare? Is this part likely to fail at any time throughout its lifecycle, or is it only likely to fail at the end of its life? There is no point in purchasing a spare today if you are unlikely to need it for another 20 years.
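
The questions above can be combined into a rough comparison: the expected annual cost of waiting for a part versus the annual cost of keeping one on the shelf. The sketch below is a deliberate simplification and not our full analysis method. It assumes a stocked spare removes all lead-time downtime, and the 20% holding rate and all input figures are hypothetical.

```python
def expected_annual_downtime_cost(failures_per_year, lead_time_days,
                                  downtime_cost_per_day):
    """Expected yearly cost of waiting for a part that is not in stock."""
    return failures_per_year * lead_time_days * downtime_cost_per_day


def annual_holding_cost(unit_cost, holding_rate=0.2):
    """Yearly cost of keeping one spare; holding_rate is an assumed fraction."""
    return unit_cost * holding_rate


def should_hold_spare(failures_per_year, lead_time_days,
                      downtime_cost_per_day, unit_cost):
    """Hold the spare if the downtime it avoids outweighs the holding cost."""
    avoided = expected_annual_downtime_cost(
        failures_per_year, lead_time_days, downtime_cost_per_day
    )
    return avoided > annual_holding_cost(unit_cost)


# A part that can fail at any time: one failure every 10 years on average.
random_failure = should_hold_spare(0.1, 30, 10_000, 50_000)

# A wear-out part that is very unlikely to fail in the coming years.
wear_out_early_life = should_hold_spare(0.001, 30, 10_000, 50_000)

print(random_failure, wear_out_early_life)
```

The second case reflects the point about parts that only fail at the end of their life: with a near-zero failure rate today, the holding cost dominates and the purchase can wait.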

In all, we reviewed about 1,500 pieces of equipment over 40 days before providing a recommended list of spares. The final list included some of what the vendor had recommended, left off many of the vendor’s recommended parts, and suggested a few additional parts that weren’t in the original list.

The final critical spares list that was recommended included a total of $2.2 million in spare parts—a savings of $2.7 million over what the vendor had originally recommended.

Built to Adapt

Our recommended spares list is intended to be responsive to changing needs and new information. When the utility took a second look at its downtime cost and calculated that it was actually $10/megawatt and not the $23/megawatt they had initially determined, we re-evaluated the spares list, reducing the utility’s recommended purchases by another $200,000.


If your organization is like most, you probably run into trouble when it comes to having the right spares on hand. Either you’re missing the right parts when something breaks down, or you have expensive spares gathering dust and potentially going bad in storage. ARMS Reliability takes the guesswork out of developing a critical spares list, taking into account item costs, the likelihood of failure, lead times, downtime costs, and all other relevant factors.

The investment this utility made to conduct the analysis with our help ultimately reduced their bottom line equipment costs by $2.7 million, a savings of 50 to 1 relative to the cost of the analysis. Beyond the monetary benefit, the utility’s Reliability Engineer felt much more confident in the approach taken. He was also relieved to avoid grossly overspending on spares.

Find out more about ARMS Reliability’s Spare Part Holding Analysis


The previous article in our blog series on RCA Program Development described the responsibilities of the six roles required for your program to function well. The next step in achieving a successful RCA program is to develop a comprehensive RCA training strategy that ensures the sustainability of the effort to reduce the frequency and severity of undesirable incidents in the future.

A common mistake is to under-train, primarily RCA facilitators, in relation to the trigger diagram we discussed in an earlier blog post. You may recall that in developing the trigger diagram, a 3-year baseline of triggering events is reviewed to ensure an adequate balance of events and trained facilitators. Assuming the RCA facilitation responsibilities will be an addition to the existing responsibilities of the position, one would expect a facilitator to lead a triggered RCA on average once per month, with a series of less important ad hoc RCAs in between to stay current on skills.

There should be at least one Super User at each facility and at the corporate level. Likewise, all personnel who may be a first responder to a triggered event, or who are expected to participate in the RCA, should be trained to an appropriate level as outlined in the table below (click the photo to enlarge).


A “Refresher” course should also be undertaken as needed. An often overlooked element of a training program is the need to maintain optimum skill levels over time, especially if there are prolonged periods where the skills cannot be practiced. This can often be the case with triggered events because of their random nature. This creates a danger that skills and knowledge may not improve at the same rate as other job skills. This is especially true for Facilitators, First Responders, and Participants. Records that track the number of RCAs facilitated, first responses, and RCA participations should be maintained and reviewed at least once a year. If Facilitators have gone six months, or Participants/First Responders one year, without using these skills, then refresher training should be considered.

Training is an essential component for all personnel involved with an organization’s RCA program. Having the right roles with the right skills, and ensuring those skills stay fresh over time, is important to the overall success of your RCA effort.

ARMS Reliability leads world-class training to assist you at every stage of your RCA journey. Learn more about the required and recommended courses, and contact us for more information.

So far, this blog series has covered:

The Key Steps of Designing Your Program

Defining Goals and Current Status

RCA and Solution Tracking and Roles and Responsibilities

Recommended RCA Team Structure

Descriptions of Each of the Six Roles Within an RCA Program

And, RCA Program Training Strategy



The previous article in our blog series on RCA Program Development described the responsibilities of the six roles required for your program to function well. The next step in achieving a successful RCA program is to develop a comprehensive RCA training strategy that will ensure sustainability of the effort to reduce the frequency and severity of undesirable incidents into the future.

A common mistake is to under-train, primarily RCA facilitators, in relation to the trigger diagram which we discussed in an earlier blog post.  You may recall that in developing the trigger diagram, a 3-year baseline of triggering events is reviewed to ensure an adequate balance of events and trained facilitators. Assuming the RCA facilitation responsibilities will be an addition to the existing position responsibilities, one would expect a facilitator would be leading a triggered RCA on average once per month with a series of less important ad hoc RCAs in between to stay current on skills.

There should be at least one Super User at each facility and at the corporate level. Likewise, all personnel that may be either a first responder to a triggered event or expected to participate in the RCA should be trained to an appropriate level as outlined in the table below (click photo to enlarge).


A “Refresher” course should also be undertaken as needed. An often overlooked element of a training program is the need to maintain optimum skill levels over time, especially if there are prolonged periods where the skill may not be practiced. This can often be the case with triggered events because of their random nature. This creates a danger that skills and knowledge may not improve at the same rate as other job skills. This is especially true for Facilitators, First Responders, and Participants. Records that track the number of RCAs facilitated, first responses, and RCA participations should be maintained and reviewed at least annually. If Facilitators have gone six months, or Participants/First Responders one year, without using these skills, then refresher training should be considered.
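
The refresher rule lends itself to an automated check against the training records. This is a minimal sketch: the role names are hypothetical labels, and six months is approximated as 182 days.

```python
from datetime import date, timedelta

# Maximum time without practice before refresher training is considered.
REFRESHER_AFTER = {
    "facilitator": timedelta(days=182),      # about six months (assumed day count)
    "participant": timedelta(days=365),      # one year
    "first_responder": timedelta(days=365),  # one year
}


def needs_refresher(role, last_used, today):
    """True if the skill has gone unused for longer than the role allows."""
    return today - last_used > REFRESHER_AFTER[role]


today = date(2017, 8, 1)
facilitator_due = needs_refresher("facilitator", date(2017, 1, 1), today)
participant_due = needs_refresher("participant", date(2017, 1, 1), today)
print(facilitator_due, participant_due)
```

Running a check like this as part of the annual records review would flag who should be considered for the refresher course.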

Training is an essential component for all personnel involved with an organization’s RCA program. Having the right roles with the right skills and ensuring those skills stay fresh over time is important to the overall success of your RCA effort. ARMS Reliability leads world-class training to assist you in every stage of your RCA journey. Learn more about the required and recommended courses, and contact us for more information.

So far, this blog series has covered:

The Key Steps of Designing Your Program

Defining Goals and Current Status

Setting KPIs and Establishing Trigger Thresholds

RCA and Solution Tracking and Roles and Responsibilities

Recommended RCA Team Structure

Responsibilities of the Six Roles

And, RCA Program Development Training Strategy.

Stay tuned for our next installment on RCA Effort Oversight and Management. 

This is the sixth installment in our blog series on Root Cause Analysis Program Development. Earlier in the series we introduced the six roles needed within an RCA program and then outlined a recommended team structure to fulfill those roles. In this article, we provide a detailed description of the part each of the six roles plays within an effective RCA program.


The RCA Steering Committee

It is the responsibility of the Steering Committee to develop and oversee the strategic functions of the RCA effort. Steering Committee members must collectively have the authority to assign RCA roles and responsibilities, allocate resources to the effort and, most importantly, have a collective interest in the success of the program.

Leadership teams of this nature usually already exist in almost every organization and, as such, can serve as the Steering Committee simply by adding a review of RCA program status as an agenda item at regularly scheduled routine meetings. This avoids creating more bureaucracy and duplicate meetings. The responsibilities and functions of the Steering Committee are as follows.

Strategic Responsibilities:

  • Define the scope and breadth of the RCA effort
  • Assign RCA roles and responsibilities
  • Review and approve RCA resource requirements for annual budgeting purposes
  • Set and adjust program KPIs (at least annually) to reflect improvements in the organization’s performance
  • Set and review threshold criteria (at least annually) to ensure an appropriate balance between trained RCA facilitators and formal RCA demand
  • Sponsor a human change management plan where necessary to secure the support of affected positions and departments

Tactical Responsibilities:

  • Approve (or not) the recommended RCA solutions for implementation, including prioritization and resource allocation
  • Monitor the status of open RCAs
  • Monitor and ensure the timely implementation of approved RCA solutions
  • Monitor the effectiveness of implemented solutions


The RCA Champion

The Champion serves as the primary sponsor of the RCA effort. The Champion’s key functions are to promote the RCA effort through steady support and to make recommendations to the Steering Committee on the resources needed for the tactical upkeep of the program. The main responsibilities are as follows.


  • Monitor the impact of RCA solutions on program KPIs and regularly communicate these results throughout the organization
  • Foster a spirit of success by recognizing and celebrating the RCA team’s specific achievements and contributions to the organization
  • Attract individual interest in being part of the RCA team by communicating the added value to one’s career and the benefits to the organization

Tactical Resource Recommendations:

  • Make recommendations to the Steering Committee on the RCA program resource requirements for the budgeting process (man-hours, positions, training, etc.) needed to sustain a successful RCA effort
  • Manage the organization’s overall RCA training schedule
  • Frequently track the status of open RCAs and solution implementation
  • Ensure the Steering Committee has current and accurate data on RCA and solution status to make informed decisions on resource allocation and action prioritization


The Super User

Depending on the size and structure of the organization, the Super User role can be beneficial. The Super User is the level above Facilitator in terms of formal RCA training, experience, and skill with the RCA methodology and any supporting software. Super Users will also have advanced group facilitation skills and should be the most experienced facilitator on site. It is not uncommon to combine the Super User and RCA Champion roles. Super User responsibilities include the following:

  • Facilitate, or help facilitate, the most significant events and complicated analyses for the organization
  • Perform critical quality assurance reviews of other facilitators’ analyses
  • Act as a mentor to new facilitators
  • Ensure that all facilitators use the proper RCA methodology
  • Decide, together with the Champion, when to recommend impartial facilitation assistance from third-party experts


The RCA Facilitator

RCA Facilitators, or practitioners of the methodology, are central to leading the RCA team to uncover the causes of the incident in question and the associated preventive measures or solutions. The Facilitator must be neutral in the process. Very often, facilitation duties are assigned to positions or individuals because of their technical skills, which can be a mistake. Facilitation skills and technical expertise are two clearly different talents. Quality RCA facilitators have a naturally friendly personality, logical thought processes, superior group management techniques, and command of the RCA methodology. They must create an effective RCA team dynamic by ensuring that all team participants are heard, keeping dominant personalities in check, and heading off problems within the group. Technical responsibilities include making sure the RCA methodology is followed accurately, which includes:

  • Creating a clear and precise problem definition
  • Ensuring that all incident causes are identified and documented in the appropriate causal relationships
  • Encouraging “outside the box” thinking from team members during the solution identification process
  • Summarizing the RCA results, with the recommended solutions, for presentation to the appropriate personnel


The RCA Participant

The Participant's primary function is to serve as a member of the RCA team when requested. Participants can be hourly employees, senior managers, or people in any other position. They need enough training in the RCA methodology to understand the terminology and the steps involved. Although the level of training is significantly lower than that of Facilitators, participants often make up the majority of trainees because of the diversity of personnel required for the various RCA incidents. Responsibilities often include the following:

  • Maintain a fundamental understanding of the RCA methodology in use
  • Participate as a member of the RCA team when requested
  • Apply their own specific knowledge, skills, and experience to identify incident causes
  • Assist in identifying and collecting incident evidence as directed
  • Participate in the solution identification process


The First Responders

The role of a First Responder is to gather information and preserve as much evidence as possible when an incident occurs. This role is especially important for 24/7 manufacturing operations. Because response time is often critical to preserving evidence, it is important that First Responders be available to serve in this function. For such operations, First Responders are typically supervisors, floor personnel, maintenance personnel, etc. First Responder activities mainly include:

  • Notifying the appropriate personnel that an event triggering an RCA has occurred
  • Collecting incident information and evidence for the RCA team, which may include:
    • Taking photographs if necessary
    • Identifying eyewitnesses to be interviewed by the RCA team
    • Collecting electronic data related to the incident in question
    • Preserving any physical evidence related to the incident

By understanding the responsibilities of each of these roles, you can ensure that the appropriate personnel are assigned to the right functions while, at the same time, existing positions are balanced against any additional RCA responsibilities.


So far, this blog series has covered:

The Key Steps to Designing Your Program

Defining Goals and Current State

RCA and Solution Tracking, and Roles and Responsibilities

Recommended RCA Team Structure.

And, Descriptions of Each of the Six Roles Within an RCA Program.

Next: Training Strategy, and Oversight and Management of RCA Efforts. Stay tuned for more.



Author: Amir Datoo

Microsoft Excel is an amazing tool. Yet it has its limitations and flaws for engineers who aren’t trained in computer programming.

The main problem with Excel for managing maintenance programs is simple, yet largely unavoidable: human error. No matter how fastidious you are when creating a spreadsheet, a single line of data that is entered incorrectly (or, worse, an inaccurate user-defined formula) can have huge implications down the track.

In fact, a study by Raymond Panko has found that 88% of spreadsheets contain errors. He warns:

These error rates are completely consistent with error rates found in other human activities. With such high cell error rates, most large spreadsheets will have multiple errors, and even relatively small “scratchpad” spreadsheets will have a significant probability of error.
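To make the quoted point concrete, here is a minimal sketch of how a per-cell error rate compounds with spreadsheet size. It assumes every cell errs independently with the same probability, and the 1% rate used in the loop is purely illustrative; neither assumption is a figure taken from the study.

```python
def prob_at_least_one_error(cell_error_rate: float, n_cells: int) -> float:
    """Chance that a sheet with n_cells has one or more errors,
    assuming independent cells with an identical error rate."""
    if not 0.0 <= cell_error_rate <= 1.0:
        raise ValueError("cell_error_rate must be between 0 and 1")
    if n_cells < 0:
        raise ValueError("n_cells must not be negative")
    # P(no errors) = (1 - p)^n, so P(at least one) is the complement.
    return 1.0 - (1.0 - cell_error_rate) ** n_cells


def expected_error_count(cell_error_rate: float, n_cells: int) -> float:
    """Average number of erroneous cells under the same assumptions."""
    return cell_error_rate * n_cells


# Illustrative only: a hypothetical 1% cell error rate at three sheet sizes.
for cells in (10, 100, 1000):
    p_any = prob_at_least_one_error(0.01, cells)
    print(f"{cells:>5} cells: P(any error) = {p_any:.3f}, "
          f"expected errors = {expected_error_count(0.01, cells):.1f}")
```

Under these assumptions the chance of a clean sheet shrinks geometrically as cells are added, which is why large spreadsheets tend to contain several errors rather than one.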

When it comes to maintenance, these small errors can quickly add up.

Think of a multi-million-dollar maintenance project. A maintenance manager unwittingly enters a few incorrect cost estimates. Decisions are made based on the calculations resulting from this incorrect data, and machinery is not maintained when it should be.

Or, the equation for failure probability is not quite right. According to the spreadsheet, a major piece of equipment isn’t likely to fail anytime soon, so you delay maintenance. Whoops. The equipment fails and the whole plant needs to be shut down. The downtime costs tens of thousands a day.

Yes, Excel can be used to create links between different sheets, develop hierarchical relations and create simple pivots. It can even run complex Monte Carlo simulations for determining probabilistic likelihoods of asset failure. It’s flexible and easily adaptable. But can your organization afford the risk of compounding errors due to incorrectly entered data or a flawed formula?
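One way to guard against a flawed failure-probability formula is to cross-check it against a simulation. The sketch below does this outside a spreadsheet. It assumes asset life follows a Weibull distribution, and the scale, shape and horizon values are hypothetical; the article does not specify any particular failure model.

```python
import math
import random


def analytic_failure_probability(scale_years: float, shape: float,
                                 horizon_years: float) -> float:
    """Weibull CDF: probability the asset fails on or before the horizon."""
    return 1.0 - math.exp(-((horizon_years / scale_years) ** shape))


def simulated_failure_probability(scale_years: float, shape: float,
                                  horizon_years: float,
                                  trials: int = 100_000,
                                  seed: int = 42) -> float:
    """Monte Carlo estimate: draw random lifetimes and count early failures."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    failures = sum(
        1 for _ in range(trials)
        if rng.weibullvariate(scale_years, shape) <= horizon_years
    )
    return failures / trials


# Hypothetical asset: characteristic life 20 years, wear-out shape 2.5,
# and we ask how likely a failure is within the next 10 years.
formula = analytic_failure_probability(20.0, 2.5, 10.0)
simulation = simulated_failure_probability(20.0, 2.5, 10.0)
print(f"formula: {formula:.4f}  simulation: {simulation:.4f}")
```

If the two numbers disagree by more than sampling noise, either the formula or the simulation is wrong, which is the kind of check that is easy to skip in a hand-built spreadsheet.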

Making sense of work management

As any maintenance engineer or manager will know, work management is a critical piece of the maintenance puzzle. It’s all about evaluating your equipment, deciding what you need to do with it, scheduling the work in, completing the work and finally reviewing your actions.

You’d be hard-pressed to find an organization that doesn’t have a good work management process in place. And a raft of enterprise software systems exists to help manage the activity (think SAP PM or Maximo).

Yet these enterprise systems fall down in one crucial area: Asset Strategy Management. Reliability analysis is not built into the tools, and so organizations fall back on spreadsheets to manage things like predictive failure analysis, failure mode and effects analysis, and reliability simulations.

The good news? Implementing an Asset Strategy Management (ASM) solution removes the inconsistent outcomes from asset strategies and drives continuous reliability improvement. Asset Strategy Management helps to answer the ‘what’ and ‘when’ of maintenance, and is proving to save money, dodge downtime and improve overall business performance.

Key benefits of Asset Strategy Management

The use of an enterprise ASM solution over spreadsheets offers huge value to any organization.

First, because it is a structured solution, you know that it has gone through rigorous rounds of testing by experienced programmers. Formula errors simply don’t exist.

What about human error? An ASM solution helps you avoid user input errors through data validation and verification. You set up business rules and logic that immediately flag when an error has been made. For example, there’s a common field called a ‘system condition’. You can set the field as mandatory, so that a user must enter a number to progress to the next field. You can even stipulate which numbers are allowed.
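As an illustration of this kind of business rule, here is a minimal sketch of a mandatory, restricted-value check on a ‘system condition’ field. The field key `system_condition`, the allowed 1 to 5 scale and the function name are all hypothetical; they are not taken from any particular ASM product.

```python
from typing import List

# Hypothetical rule: system condition is rated on a 1-5 scale.
ALLOWED_SYSTEM_CONDITIONS = {1, 2, 3, 4, 5}


def validate_system_condition(record: dict) -> List[str]:
    """Return a list of rule violations; an empty list means the entry is valid."""
    value = record.get("system_condition")

    # Rule 1: the field is mandatory.
    if value is None or value == "":
        return ["system_condition is mandatory"]

    # Rule 2: the entry must be a whole number (bool is rejected explicitly
    # because Python treats True/False as integers).
    if isinstance(value, bool) or not isinstance(value, int):
        return ["system_condition must be a whole number"]

    # Rule 3: only the stipulated values are accepted.
    if value not in ALLOWED_SYSTEM_CONDITIONS:
        allowed = sorted(ALLOWED_SYSTEM_CONDITIONS)
        return [f"system_condition must be one of {allowed}"]

    return []


print(validate_system_condition({"asset": "P-101"}))
print(validate_system_condition({"asset": "P-101", "system_condition": 3}))
```

The point of the design is that the check runs at entry time, so a bad value is rejected before it can feed any downstream calculation.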

ASM delivers huge efficiency gains.  We have seen it take almost three years to develop a reliability management strategy using Excel spreadsheets. Using an enterprise ASM software tool, complex reliability strategies were up and running in six months.

Efficiency is also found in the reduction of the number of files being used. If you’re using spreadsheets to manage maintenance schedules, it’s common to have a different spreadsheet at each site. A change that needs to be deployed globally requires huge effort and carries risk of error. When data is consolidated into one ASM system, a change can be made once and applied globally. Reliability studies seamlessly interact with the CMMS without version issues or loss of data.

Perhaps the most significant benefit of an ASM solution is its ability to facilitate risk-based decision making. Spreadsheets do not provide real-time analytics to guide informed decisions. With the right Asset Strategy Management system in place, all the key metrics you need to make those business-critical decisions are at your fingertips.

To learn more about Asset Strategy Management watch this webinar on-demand “Harnessing Technology, Innovation, and Big Data to Reshape Asset Strategy Management and Unlock Unrealized Value.”

Author: Jason Apps

Use the content and equipment expertise you already have to drive performance improvement

Do you get the sense that your organization is unable to deploy the best maintenance strategies to all assets, at all times? Do you suspect that money is being wasted through ineffective strategies? An Asset Strategy Management process could be just what your organization needs.

In short, Asset Strategy Management means that:

  1. The best strategies, developed by your best subject matter experts, are in place; and
  2. They are deployed to all your assets all the time; and
  3. They continually evolve based on real data and an effective review process

It unlocks value currently being left on the table through ineffective strategies and the inability to deploy the best tactics to all assets in moments.

What is Asset Strategy Management?

Most organizations have attempted, at least in part, to standardize Master Data and even strategies for common equipment. It makes logical sense to consolidate and deploy common data wherever relevant.

Yet there are two common problems holding organizations back:

  1. Creating and deploying generic content cannot be done effectively within a CMMS or ERP system. These systems are designed to support the execution of work; not the management of strategy decisions. By their very nature, they cannot truly utilize generic content in a continuously deployable and connected way.
  2. While there may be a sound, defined work management process in place to drive consistent execution of work, there is limited or no process in place to manage the review and evolution of strategies and content. Quite simply, parameters associated with the strategy can be changed on a whim with no requirement for subject matter involvement or approval.

Essentially, most organizations have not separated work management and strategy management – yet they are entirely different processes with completely different objectives.

Work Management = managing execution of work

Strategy Management = managing the strategy that will be executed

So what does strategy management cover?

  • Tactical: The maintenance tactics that will be executed, including the tasks to be performed, when they are done, how they are done, who does them, and the materials required.
  • Asset/Fleet: The decisions made at an asset level such as major component or asset replacement ages, major shutdown or system outage schedules.
  • Portfolio: Optimization of budget allocation for a portfolio to maximize value given the financial and resource constraints.

In many cases, there is an iteration whereby constraints at a portfolio level drive the need to change tactical level strategy to deliver the required performance with the available funds.

The ideal situation

Where it is implemented, this environment (in which strategy management is separated from both work management and performance management) allows for management of generic content, rapid deployment, and intelligent strategies that continually learn from your best decisions no matter where they are made.

Your subject matter experts can develop a strategy for an equipment type, and then rapidly deploy the strategy to all relevant assets. When a change is made to one instance of that particular equipment type, you can see exactly where else it is implemented, so that maintenance plans can be updated in the CMMS, across the whole asset base if needed.

It is critical to note that Asset Strategy Management is not:

  1. Just an FMEA library
  2. Just a maintenance tactic library
  3. Just a project to review or develop maintenance tactics

Rather, it is a process that continually manages asset strategies over time. It delivers the required performance and allows you to effectively manage and deploy generic maintenance plans at a speed that matches the decision making.

Of course, for the process to work, Asset Strategy Management allows for local variations of content to account for different operating contexts or duties, environments, local workforces or regulations – while maintaining the link to generic content for rapid deployment of the latest thinking in the future.
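The idea of generic content with linked local variations can be sketched as a small data model. This is an illustrative structure only: the class names, task names and day-based intervals are hypothetical, and a real ASM solution will model strategies in far more detail.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class GenericStrategy:
    """Strategy developed once for an equipment type."""
    equipment_type: str
    tasks: Dict[str, int]  # task name -> interval in days


@dataclass
class AssetStrategy:
    """One asset's strategy: a link to the generic content plus local variations."""
    asset_id: str
    generic: GenericStrategy
    local_overrides: Dict[str, int] = field(default_factory=dict)

    def effective_tasks(self) -> Dict[str, int]:
        # Start from the generic content, then apply site-specific changes.
        merged = dict(self.generic.tasks)
        merged.update(self.local_overrides)
        return merged


def assets_using(strategy: GenericStrategy,
                 assets: List[AssetStrategy]) -> List[str]:
    """List every asset linked to a given generic strategy."""
    return [a.asset_id for a in assets if a.generic is strategy]


pump = GenericStrategy("centrifugal pump", {"vibration check": 30, "oil change": 180})
fleet = [
    AssetStrategy("SITE-A-P01", pump),
    AssetStrategy("SITE-B-P07", pump, {"oil change": 90}),  # harsher duty at site B
]

pump.tasks["vibration check"] = 14  # one update to the generic content
print(assets_using(pump, fleet))
print(fleet[0].effective_tasks())
print(fleet[1].effective_tasks())
```

Because each asset keeps a reference to the generic strategy rather than a copy, a single update reaches every linked asset while the local override at the second site is preserved.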

What’s required for Asset Strategy Management?

Like all effective workflows, Asset Strategy Management needs the right infrastructure in place. You need:

  • A clearly defined process, with roles and accountabilities outlined
  • The right technology to identify underperforming assets and implement appropriate solutions using data-driven insights
  • A strategy for educating all people involved in every step of the process
  • Support mechanisms
  • Effective triggers
  • An Asset Strategy Management solution

But get it right and the results speak for themselves. With Asset Strategy Management, you will realise significant cost savings by deploying your best strategies to your entire asset base, all the time.

This is a guest post written by Copperleaf.  ARMS Reliability is an authorised distributor of Copperleaf’s C55 Asset Investment Planning & Management solution. 

Author: Barry Quart – Copperleaf, VP of Marketing

In any discussion about asset management these days, the ISO 55000 standard is bound to come up. ISO published the standard in 2014 to provide guidance on best-in-class asset management practices and help organisations “realise the maximum value from their assets.”

In a nutshell, it’s about choosing the ‘right’ things to invest in—the projects that will deliver the highest value, and are most aligned with your company’s strategy.

It’s also about creating a plan (a roadmap for success) that lays out what will be done, when, by whom and how it will be evaluated. The plan must address how to keep assets operating at their optimal level of performance, while managing risk, and respecting the available budgets and resources.

Sounds simple but this is no easy task, especially in organisations with tens of thousands, or even millions of diverse assets.

Asset Investment Planning & Management (AIPM) is an evolving discipline that helps organisations focus their available resources on doing the right things at the right time. AIPM can help you:

  • PREDICT the long-term needs of your asset base
  • OPTIMISE portfolios of investments to realise the greatest value from your assets
  • MANAGE your portfolios to achieve the highest execution performance

When these three principles of AIPM are put in place, organisations can start to make these complex investment decisions with confidence.

PREDICT: Asset managers must focus on predicting the needs of their corporation’s assets, and on developing a realisable investment strategy to meet those needs. The key word here is realisable. It’s not just about identifying the ideal thing to do for every asset, because you invariably won’t be able to afford to do every “ideal” thing you are asked to. You need to propose a strategy that you can afford, and have adequate resources to carry out. This is where the second part of the strategy comes in.

OPTIMISE:  If your investment requests exceed your available budget and/or resources, you need to develop a plan that delivers the most value for the money and resources you do have. When you can’t do it all, you need to consider deferring some investments and/or evaluate alternative ways to address the needs identified above. Value-based decision making can help you make the difficult trade-offs between risk, cost, and performance, and ensure that for your available funding and resources, you are always executing a plan that delivers the maximum value from your assets.

MANAGE:  Even the best plans never execute as expected. Emergent work, delays, and cost overruns all affect your organisation’s ability to deliver on the original set of objectives. Actual spend and accomplishments should be compared to the original plan, variances explored, and the plan re-optimised to ensure that looking forward, the organisation is always focused on those activities that deliver the highest value. This process of continuous planning is an integral part of a best-in-class asset investment strategy.
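The OPTIMISE step above can be illustrated with a small budget-constrained selection. The sketch below is not how any AIPM product works; it simply tries every combination of a handful of hypothetical investments and keeps the affordable set with the highest total value. The investment names, costs and value scores are invented for the example, and exhaustive search is only practical for small lists.

```python
from itertools import combinations
from typing import List, Tuple

Investment = Tuple[str, float, float]  # (name, cost, value score)


def best_portfolio(investments: List[Investment],
                   budget: float) -> Tuple[List[str], float]:
    """Exhaustively pick the subset with the highest total value within budget."""
    best_value = 0.0
    best_set: Tuple[Investment, ...] = ()
    for size in range(len(investments) + 1):
        for combo in combinations(investments, size):
            cost = sum(item[1] for item in combo)
            value = sum(item[2] for item in combo)
            # Keep the combination only if it is affordable and strictly better.
            if cost <= budget and value > best_value:
                best_value, best_set = value, combo
    return [item[0] for item in best_set], best_value


# Hypothetical requests: total cost is 130 against a budget of 80.
requests = [
    ("Replace pump", 40, 60),
    ("Upgrade relay", 30, 50),
    ("Refurbish transformer", 50, 70),
    ("Paint fence", 10, 5),
]
selected, total_value = best_portfolio(requests, budget=80)
print(selected, total_value)
```

Anything left out of the selected set is a candidate for deferral or for an alternative, cheaper way of addressing the same need, which is the trade-off the OPTIMISE paragraph describes.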

AIPM can help you make higher value investment decisions, and justify those decisions to stakeholders. Learn more about how AIPM supports the ISO 55000 standard.