Monthly Archives: December 2016


Without a real understanding of the key elements of a thorough, effective investigation (and the skills to conduct one), people risk missing key causal factors of an incident when they carry out the actual analysis. As a result, they may fail to identify all possible solutions, including those that are more cost-effective, easier to implement, or better at preventing recurrence.

Here we outline the 5 key steps of an incident investigation which precede the actual analysis.

1. Secure the incident scene

  • Identify and preserve potential evidence
  • Control access to the scene
  • Document the scene using your ‘Incident Response Template’ (Do you have one?)

2. Select investigation team

The functions that must be filled are:

  • Incident Investigation Lead
  • Evidence Gatherer
  • Evidence Preservation Coordinator
  • Communications Coordinator
  • Interview Coordinator

Other important considerations for the selection of team members include:

  • Ensure team members have the desirable traits (What are they?)
  • The nature of the incident (How does this impact team selection?)
  • Choose the right people from inside and outside the organization (How do you decide?)
  • Appropriate size of the team (What is the optimum team size?)

*Our Incident Investigator training course examines each of these considerations and more, giving you the knowledge to select investigation team members wisely.

3. Plan the investigation

Upon receiving the initial call:

  • Get the preliminary What, When, Where, and Significance
  • Determine the status of the incident
  • Understand any sensitivities
  • If necessary and appropriate, issue a request to isolate the incident area
  • Escalate notifications as appropriate

The preliminary briefing:

  • Investigation Lead to present a preliminary briefing to the investigating team
  • Prepare a team investigation plan

4. Collect the facts supported by evidence


  • Be prepared and ready to lead or participate in an investigation at all times to ensure timeliness and thoroughness.
  • Have your “Go Bag” ready with useful items to help you secure the scene, take photographs, document the details of the scene and collect physical evidence.
  • Collect as much information as possible…analyze later
  • Inspect the incident scene
  • Gather facts and evidence
  • Conduct interviews

*While every step in the Incident Prevention Process is crucial, step 4 requires a particularly distinct set of skills. A lot of time in our Incident Investigator training course is dedicated to learning the techniques and skills required to get this step done right.

5. Establish a timeline

A timeline can be the quickest way to group information from many sources.


  • Stickers can be used on poster paper to start rearranging information on a timeline. Use different colors for precise data versus imprecise, and list the source of the information on each note.
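The same sticker exercise can be mirrored digitally. The following is a minimal sketch, not a prescribed tool: the `TimelineNote` fields, the `=`/`~` markers that stand in for sticker colors, and the sample events are all assumptions made up for illustration.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class TimelineNote:
    when: datetime   # best-known time of the event
    what: str        # short description, as written on the sticker
    source: str      # where the information came from
    precise: bool    # True for precise data, False for an estimate


def build_timeline(notes):
    """Return the notes in chronological order."""
    return sorted(notes, key=lambda note: note.when)


def format_note(note):
    # "=" marks a precise time and "~" an imprecise one (stand-ins for sticker colors).
    marker = "=" if note.precise else "~"
    return f"{marker} {note.when:%H:%M} {note.what} [{note.source}]"


# Hypothetical notes, invented for illustration only.
notes = [
    TimelineNote(datetime(2016, 12, 1, 9, 40), "Alarm acknowledged", "control room log", True),
    TimelineNote(datetime(2016, 12, 1, 9, 15), "Unusual noise heard", "operator interview", False),
    TimelineNote(datetime(2016, 12, 1, 9, 32), "Pump tripped", "historian data", True),
]

for note in build_timeline(notes):
    print(format_note(note))
```

Keeping the source on every note, as the sticker method suggests, makes it easy to go back and verify an entry when two accounts disagree.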

After steps 1-5 comes the Root Cause Analysis of the incident, solution implementation and tracking, and reporting back to the organization:

6. Determine the root causes of the incident

7. Identify and recommend solutions to prevent recurrence of similar incidents

8. Implement the solutions

9. Track effectiveness of solutions

10. Communicate findings throughout the organization

*Steps 6-10 are taught in detail at our Root Cause Analysis Facilitator training course.

To learn more about the difference between our Incident Investigator and RCA Facilitator training courses, check out our previous blog article. And of course, if you would like to discuss how to implement or improve your organization’s incident prevention process, please contact us.

Author: Bruce Ballinger


To have a successful implementation and adoption of your new RCA program, it’s crucial to have all the elements of an effective and efficient program clearly identified and agreed upon in advance.

Here’s a high-level look at the elements that will need to be defined:

  • RCA Goals and Objective Alignment
    • Define the goals and objectives of the program and ensure they are in alignment with corporate/facility/department goals and objectives
  • Status of Current RCA Effort
    • Perform a maturity assessment of the existing RCA program to be used as a baseline to measure future improvements
  • Key Performance Indicators
    • Identify KPIs with baselines and future targets to be used for measuring progress towards meeting program goals and objectives
  • Formal RCA Threshold Criteria
    • Determine which incidents will trigger a formal RCA and estimate how many triggered events may occur in the upcoming year
  • RCA and Solution Tracking Systems
    • Identify which internal tracking systems will be used to track the status/progress of open RCAs and implemented solutions
  • Roles and Responsibilities
    • Identify specifically who will have a role in the RCA effort, including the program sponsor, champion, and RCA facilitators
  • Training Strategy
    • Determine who will be trained in the chosen RCA methodology, to what level, and in what time frame
  • RCA Effort Oversight and Management
    • Identify who (or what committees or groups) will be responsible for managing tracking systems, decisions on solution implementation, program modifications over time, and general program performance
  • Process Mapping
    • Conduct a process mapping exercise to document RCA management from the beginning of a triggered incident to completion of implemented solutions, including their impact on the organization’s goals and objectives
  • Human Change Management Plan
    • Develop a Change Management plan, including a detailed communication plan, that specifically targets those whose job duties will be affected by the RCA effort
  • Implementation Tracking
    • Create a checklist to monitor RCA effort implementation, including action items, responsible parties, and due dates

We recommend conducting a workshop in order to define each of these crucial elements of your RCA program.

The workshop should be conducted for what we call a “functional unit,” which ideally is no larger than a plant or facility. However, it can be modified to accommodate multiple facilities.

Common elements of a functional unit include:

  • A common trigger diagram
  • Common KPIs
  • The same Program Champion
  • Members who depend on one another and share responsibility for functional unit performance

By structuring programs to fit within the goals and objectives of the business, or “functional unit,” rather than applying a ‘one size fits all’ solution, effective and long-lasting results can be realized.

Are you implementing a new RCA program, or do you need to reinvigorate your current one? ARMS can help you create a customized plan for its successful adoption. Contact Us for more information.

Author: Scott Gloyna

For any given asset there are typically dozens of different predictive or preventive maintenance tasks that could be performed. However, selecting the right maintenance tasks that contribute effectively to your overall strategy can be tricky. The benefit of getting it right is the difference between meeting production targets and the alternative of lost revenue, late-night callouts, and added stress from unplanned downtime events.

Step 1: Build out your FMEA (Failure Mode Effects Analysis) for the asset under consideration. 

Make sure you get down to appropriate failure modes in enough detail so that the causes are understood and you can identify the proper maintenance to address each specific failure mode.

Once you’ve made a list of failure modes, then it’s detailed analysis time. If you want to be truly rigorous, perform the following analysis for every potential failure mode. Depending on the criticality of the asset you can simplify by paring down your list to include only the failure modes that are most frequent or result in significant downtime.
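As a rough illustration of paring down the list, the sketch below stores each failure mode as a record and keeps only those that are frequent or cause significant downtime. The field names, the threshold values, and the sample pump data are hypothetical; they are not part of any FMEA standard or of the method described here.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    component: str
    mode: str
    cause: str
    failures_per_year: float  # estimated frequency
    downtime_hours: float     # downtime per failure event


def shortlist(modes, min_failures_per_year=1.0, min_downtime_hours=8.0):
    """Keep modes that are frequent OR that cause significant downtime.

    Both thresholds are illustrative assumptions; set them from the
    criticality of the asset under consideration.
    """
    return [
        m for m in modes
        if m.failures_per_year >= min_failures_per_year
        or m.downtime_hours >= min_downtime_hours
    ]


# Hypothetical failure modes, invented for illustration only.
modes = [
    FailureMode("Pump", "Bearing seizure", "Loss of lubrication", 0.5, 24.0),
    FailureMode("Pump", "Seal leak", "Seal face wear", 2.0, 4.0),
    FailureMode("Pump", "Paint flaking", "Weathering", 0.2, 0.0),
]

for m in shortlist(modes):
    print(f"{m.component}: {m.mode} (cause: {m.cause})")
```

For a highly critical asset you would skip the filter and analyze every mode, as described above.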

Step 2: Identify the consequences of each failure mode on your list.

Failure modes can result in multiple types of negative impact. Typically, these failure effects include production costs, safety risks, and environmental impacts. It is your job to identify the effects of each failure mode and quantify them in a manner that allows them to be reviewed against your business’s goals. Often when I am facilitating a maintenance optimization study, people will say things like “There is no effect when that piece of equipment fails.” If that’s the case, why is that equipment there? All failures have effects. They may just be small or hard to quantify, perhaps because workarounds are available or because some time passes after the failure before an effect is realized.

Step 3: Understand the failure rate for each particular mode.

Gather information on the failure rates from any available industry data and personnel with experience on the asset or a similar asset and installation, as well as any records of past failure events at your facility. This data can be used to evaluate the frequency of failure through a variety of methods — ranging from a simple Mean Time To Failure (MTTF) to a more in-depth review utilizing Weibull distributions.

(Note: The Weibull module of Isograph’s Availability Workbench™ can help you to quickly and easily understand the likelihood of different failure modes occurring.)
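To make the two ends of that range concrete, here is a small sketch using only the Python standard library. It assumes complete (uncensored) failure records and takes the Weibull shape `beta` and scale `eta` as given; fitting those parameters from data is not shown, and the sample hours are hypothetical. It is not how Availability Workbench performs its analysis.

```python
import math
from statistics import mean


def mttf(times_to_failure):
    """Simple Mean Time To Failure: the average of observed times to failure."""
    return mean(times_to_failure)


def weibull_reliability(t, beta, eta):
    """Probability of surviving to time t under a two-parameter Weibull model."""
    return math.exp(-((t / eta) ** beta))


def weibull_mttf(beta, eta):
    """Mean life of a two-parameter Weibull distribution."""
    return eta * math.gamma(1.0 + 1.0 / beta)


# Hypothetical operating hours at failure, invented for illustration only.
hours_to_failure = [4000, 5200, 6100, 4700]
print("Simple MTTF (hours):", mttf(hours_to_failure))

# Assumed Weibull parameters; beta > 1 indicates wear-out behavior.
beta, eta = 1.5, 5000.0
print("Reliability at 2000 hours:", round(weibull_reliability(2000.0, beta, eta), 3))
print("Weibull mean life (hours):", round(weibull_mttf(beta, eta), 1))
```

With `beta` equal to 1 the Weibull model reduces to a constant failure rate and its mean life equals `eta`, which is why a simple MTTF can be adequate for random failures but misleading for wear-out modes.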

Step 4: Make a list of possible reactive, planned or inspection tasks to address each failure mode.

Usually, you start by listing the actions you take when that failure mode occurs (reactive maintenance). Then broaden your list to any potential preventive maintenance and/or inspection tasks that could help prevent the failure mode from happening, or reduce the frequency at which it occurs.

  • Reactive tasks
    • Replacement
    • Repair
  • Preventive tasks
    • Daily routines (clean, adjust, lubricate)
    • Periodic overhauls, refurbishments, etc.
    • Planned replacement
  • Inspection tasks
    • Manual (sight, sound, touch)
    • Condition monitoring (vibration, thermography, ultrasonics, x-ray and gamma ray)

Step 5: Gather details about each potential task.

In order to compare and contrast different tasks, you have to understand the requirements of each:

  • What exactly does the task entail? (basic description)
  • How long would the work take?
  • How long would it take to start the work after shutdown/failure?
  • Who would do the work?
  • What labor costs are involved? (the hourly rates of the employees or outside contractors who would perform the task)
  • Would any spare parts be required? If so, how much would they cost?
  • Would you need to rent any specialized equipment? If so, how much would it cost?
  • Do you have to take the equipment offline? If so, for how long?
  • How often would you need to perform this task (frequency)?

A key consideration for inspection tasks only: What is the P-F interval for this failure mode? This is the window between the time you can detect a potential failure (P) and when it actually fails (F). It is similar to calculating how long you can drive your car after the fuel light comes on before you actually run out of fuel. Understanding the P-F interval is key in determining the interval for each inspection task.

The P-F interval can vary from hours to years and is specific to the type of inspection, the specific failure mode and even the operating context of the machinery.

It can be hard to determine the P-F interval precisely but it is very important to ensure that the best approximation is made because of the impact it has on task selection and frequency.
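As a hedged illustration of how the P-F interval feeds into task frequency, the sketch below applies a commonly quoted rule of thumb that is not stated in this article: inspect at some fraction of the P-F interval (often half) so that an inspection falls inside the window with time left to react. The function names and the default of two inspections per window are assumptions.

```python
def inspection_interval(pf_interval_days, inspections_in_window=2):
    """Suggest an inspection interval as a fraction of the P-F interval.

    The default of two inspections per window is a rule-of-thumb
    assumption, not a requirement from the article.
    """
    if pf_interval_days <= 0:
        raise ValueError("P-F interval must be positive")
    if inspections_in_window < 1:
        raise ValueError("need at least one inspection per window")
    return pf_interval_days / inspections_in_window


def can_catch_failure(pf_interval_days, interval_days):
    """True if inspections this far apart cannot skip over the whole P-F window."""
    return interval_days <= pf_interval_days


# Hypothetical example: vibration on a bearing becomes detectable about
# 60 days before the bearing fails.
pf_days = 60
interval = inspection_interval(pf_days)
print("Suggested interval (days):", interval)
print("Monthly inspection catches it:", can_catch_failure(pf_days, 30))
print("Quarterly inspection catches it:", can_catch_failure(pf_days, 90))
```

Because the P-F estimate is itself approximate, a more conservative fraction may be appropriate for failure modes with severe consequences.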

Step 6: Evaluate the lifetime costs of different maintenance approaches.

Once you understand the cost and frequency of different failure modes, as well as the cost and frequency of various maintenance tasks to address them, you can model the overall lifetime costs of various options.

For example, say you have a failure mode with a moderate business impact — enough to affect production, but not nosedive your profits for the quarter. If that failure mode has a mean time between failures (MTBF) of six months, you might take a very aggressive maintenance approach. On the other hand, if that failure mode only happens once every ten years, your approach would be very different. “Run to Failure” is often a completely legitimate choice, but you need to understand and be able to justify that choice.

These calculations can be done manually, in spreadsheets, or using specialized modeling software such as the RCMCost™ module of Isograph’s Availability Workbench™.

Ultimately you try to choose the least expensive maintenance task that provides the best overall business outcome.
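A spreadsheet-style version of this comparison can be sketched in a few lines. The model below is a deliberate simplification and not how RCMCost™ works: it assumes constant yearly failure and task rates, ignores discounting and safety or environmental effects, and every figure in it is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Strategy:
    name: str
    failures_per_year: float  # expected failures under this strategy
    cost_per_failure: float   # repair cost plus lost production per failure
    tasks_per_year: float     # planned or inspection tasks per year
    cost_per_task: float      # labor, parts, equipment, and downtime per task


def lifetime_cost(strategy, years):
    """Undiscounted cost over the period: failure costs plus task costs."""
    failure_cost = strategy.failures_per_year * strategy.cost_per_failure
    task_cost = strategy.tasks_per_year * strategy.cost_per_task
    return years * (failure_cost + task_cost)


def cheapest(strategies, years):
    """Return the strategy with the lowest lifetime cost."""
    return min(strategies, key=lambda s: lifetime_cost(s, years))


# Hypothetical figures for a failure mode that occurs about every six months
# if nothing is done. The reduced failure rates are assumptions.
options = [
    Strategy("Run to failure", 2.0, 20000.0, 0.0, 0.0),
    Strategy("Planned replacement", 0.25, 20000.0, 4.0, 3000.0),
    Strategy("Monthly inspection", 0.5, 20000.0, 12.0, 400.0),
]

for option in options:
    print(f"{option.name}: {lifetime_cost(option, 10):,.0f}")
print("Lowest cost over 10 years:", cheapest(options, 10).name)
```

Rerunning the comparison with a much lower failure rate for the first option shows how “Run to Failure” can become the cheapest, and therefore justifiable, choice.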

 Ready to learn more? Gain the skills needed to develop optimized maintenance strategies through our training course: Introduction to Maintenance Strategy Development


Today, many enterprise-level organizations are likely to have similar operations at multiple locations regionally or even worldwide. When an asset fails or a safety incident occurs at one site, the company investigates the problem and identifies solutions or corrective actions. Naturally, the team wants to capture the lessons learned and share them with other sites that have similar equipment, processes, and potential incidents.

Advanced tools such as RealityCharting® software let teams share the results of an Apollo Root Cause Analysis (RCA) with multiple team members. However, a multinational company could have dozens of investigations under way at once. At the most senior levels, decision makers do not necessarily want to see detailed information about specific causes at one plant. They need a more general view of the problems and patterns that affect the whole organization.

At ARMS Reliability, many of our clients have expressed a similar need. Our solution? Use classification tags to create and apply a consistent taxonomy to all root cause analyses performed for a given organization. In a composite report, these tags reveal company-wide trends and problems, allowing management to create action plans to address these systemic issues. For example, classification tags might reveal a large number of problems related to a lack of preventive maintenance on a certain type of pump, or systemic non-compliance with a required safety process.

A classification taxonomy can be shaped and configured to an organization’s goals and processes. Think of the tags as classifications that can be applied at any level of the RCA: for example, to root causes or solutions, to individual contributing causes, or simply to the RCA investigation as a whole.

Keep the following in mind: the Apollo Root Cause Analysis method centers on a free-thinking approach to problem solving. That is what makes the methodology so powerful: it does not lead you down any predetermined generic path by asking leading questions or categorizing causes or effects in any way. At ARMS Reliability, we advocate applying classification tags once the root cause analysis investigation has been completed, so that the free thinking of the causal analysis is preserved and the results are organized afterward, with the aim of gaining deeper systemic insight.

Taxonomies can range from 5 to 20 categories up to hundreds. For example, here we have used a human factors taxonomy to tag causes such as organizational influences and other issues related to the individual.

[Screenshot: causes tagged using a human factors taxonomy]

Reports can provide a summary of how many causes were classified under each tag:

[Screenshot: report summarizing the number of causes under each tag]
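The idea behind such a summary can be sketched independently of any particular tool. The example below is not the RealityCharting® API or data format: the record layout, the tag names, and the investigations are hypothetical, and the tally simply counts how often each tag was applied across completed analyses.

```python
from collections import Counter

# Hypothetical completed investigations, tagged after the analysis was finished.
investigations = [
    {"id": "RCA-001", "site": "Plant A",
     "cause_tags": ["Preventive maintenance", "Procedure not followed"]},
    {"id": "RCA-002", "site": "Plant B",
     "cause_tags": ["Preventive maintenance", "Organizational influence"]},
    {"id": "RCA-003", "site": "Plant A",
     "cause_tags": ["Preventive maintenance"]},
]


def tag_summary(records):
    """Count how many causes were classified under each tag across all records."""
    counts = Counter()
    for record in records:
        counts.update(record["cause_tags"])
    return counts


summary = tag_summary(investigations)
for tag, count in summary.most_common():
    print(f"{tag}: {count}")
```

Because the tags are applied only after each investigation is complete, a tally like this organizes the findings without steering the causal analysis itself.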

In another example, an organization bases its taxonomy of reliability problems on ISO 14224 (Collection and exchange of reliability and maintenance data for equipment).

The taxonomy options are endless. Most of the organizations we work with have their own classification systems. It really comes down to first codifying the types of information your organization needs to capture.

If you think adding classifications to your root cause analysis would be useful for your organization, contact ARMS Reliability. We would like to show you more about what we are doing with other clients and help you develop a taxonomy that works for your needs.