Author: Jason Ballentine

Developing a maintenance strategy requires careful consideration and due process. Yet from what I’ve seen, many organizations are making obvious errors right from the start, missteps that can torpedo the success of the strategies they’re trying so hard to put in place.

Without further ado, here are five common maintenance strategy mistakes:

  1. Relying solely on original equipment manufacturer (OEM) or vendor recommendations.

It seems like a good idea — you’d think the people who made or sold the equipment would know best. It’s what they don’t know that can hurt you.

Outside parties don’t know how a piece of equipment functions at your facility. They don’t understand how much this equipment is needed, the cost of failure, whether there’s any redundancy within the system… OEM and vendor maintenance guidelines are geared to maximize the availability and reliability of the machine, but their strategies might not be appropriate for your unique circumstances or needs. As a result, your team could end up over-maintaining the equipment, which can actually create more problems than it solves. The more you mess with a piece of equipment, the more you introduce the possibility of error or failure. Some things, in some situations, are better left alone.

What’s more, OEMs and vendors have a vested interest in selling more spare parts (so they can make more money). That means that their replacement windows might not be accurate or appropriate to your business needs. Rather than relying on calendar-driven replacement, your maintenance strategy might focus more on inspecting the equipment to proactively identify any issues or deterioration, then repairing or replacing only as needed.

It’s fine to use OEM/vendor maintenance guidelines as a starting point. Just make sure you thoroughly review their recommendations to see if they align with your unique needs for the given piece of equipment. Don’t just blindly accept them — make sure they fit first.

  2. Relying heavily on generic task libraries for your maintenance strategy.

This is surprisingly common. Some organizations purchase a very generic set of activities for a piece of equipment or equipment category, and attempt to use them to drive maintenance strategy. But generic libraries are even worse than OEM/vendor recommendations because they are just that — generic. They aren’t written for the specific equipment make and model you have. They might even include tasks that simply don’t apply, such as “inspect the belt” on a pump that uses an entirely different drive mechanism. Once a mechanic attempts to perform one of these generic, ill-suited tasks, he or she stops trusting your overall maintenance strategy. Without credibility and compliance, you might as well not have a strategy at all.

Like OEM and vendor recommendations, generic task libraries can help you get started on a robust maintenance strategy, if (and only if) you carefully examine them first and only use the tasks that make sense for your particular equipment and operational needs.

  3. Failing to include a criticality assessment in your strategy decisions.

If you choose and define tasks without factoring in criticality, you run the risk of wasted effort and faulty maintenance. Think about it: If a piece of equipment is low on the criticality scale, you might be okay to accept a generic strategy and be done with it. But for equipment that’s highly critical to the success of your operations, you need to capture as much detail as possible when selecting and defining tasks. How can you know which is which without fully assessing the relative importance of each piece of equipment (or group of equipment) to the overall performance of your site?

  4. Developing maintenance strategies in a vacuum.

Sometimes, organizations will hire an outside consultant to develop maintenance strategies and send them off to do it, with no input from or connection with the maintenance team (or the broader parts of the organization). Perhaps they figure, “you’re the expert, you figure it out.” Here’s the problem: For a maintenance strategy to be successful, it must be developed within the big picture. You’ve got to talk to the mechanic who’ll be doing the work, the planner for that work, and the reliability engineer who’ll be responsible for the performance of that equipment, production, or operation. Their input is extremely valuable, and their buy-in is absolutely critical. Without it, even the best maintenance strategy can be met with resistance and non-compliance.

  5. Thinking of maintenance strategy development as a “one-and-done” effort.

For some organizations, the process of developing a maintenance strategy from the ground up seems like something you do once and just move on. But things change — your business needs change, the equipment you have on site changes, personnel changes, and much more. That’s why it’s vitally important to keep your maintenance strategies aligned with the current state of your operations.

In fact, a good maintenance strategy is built with the idea of future revisions in mind. That means the strategy includes clear-cut plans for revisiting and optimizing the strategy periodically. A good strategy is also designed to make those revisions as easy as possible by capturing all of the knowledge that went into your strategy decisions. Don’t just use Microsoft Word or put tasks directly into the system without documenting the basis for the decisions you made. What were your considerations? How did you evaluate them? What ultimately swayed your decision? In the future, if the key factors or circumstances change, you’ll be able to evaluate those decisions more clearly, without having to guess or rely on shaky recall.

If you’ve found yourself making any of these mistakes, don’t despair. Most errors and missteps can be addressed with an optimization project. In fact, ARMS Reliability specializes in helping organizations make the most of their maintenance strategies. Contact us to learn more.


As outlined in our previous blog article, “RCA Program Development: The Key Steps of Designing Your Program”, there are 11 key steps to a successful RCA program. Last month we introduced the first two steps – Defining Goals and Current Status. In this article we’ll break down steps 3 and 4 – Setting KPIs for your RCA program and establishing trigger thresholds to initiate an RCA.

  3. Key Performance Indicators

Key Performance Indicators, or KPIs, are the benchmarks used to measure the success of a program or effort. They can generally be divided into two categories: leading indicators and lagging indicators. Both measure the degree to which progress is being made toward a specific goal. Leading indicators tend to be objectives that move you toward the ultimate goal. They can be measured over a short period and act as mileposts to gauge how you’re tracking toward your goal. Lagging indicators are often the goals themselves. If the relationship between the two is correctly defined, then achieving the short-term (leading) indicators virtually guarantees achieving the long-term goals.

To provide perspective in measuring progress using KPIs, a baseline must first be established. Baselines for the selected KPIs should cover at least 3 years of historical performance. Once these are established, goals or targets for improvement should be set for a period of time, say 3 years, going forward. This process should be reviewed at least annually, with baselines and targets adjusted accordingly.
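To make the baseline-and-target idea concrete, here is a minimal Python sketch. It assumes a KPI where lower is better (annual unplanned downtime hours), a baseline taken as the simple average of the last three years, and a 20% improvement target. The figures, the function name `baseline_and_target`, and the averaging rule are all illustrative assumptions, not a prescribed method.

```python
from statistics import mean


def baseline_and_target(history, improvement_pct, years=3):
    """Return (baseline, target) for a KPI where a lower value is better.

    history         -- yearly KPI values, oldest first
    improvement_pct -- desired reduction relative to the baseline, in percent
    years           -- how many of the most recent years form the baseline
    """
    if len(history) < years:
        raise ValueError(f"need at least {years} years of history")
    baseline = mean(history[-years:])
    target = baseline * (1 - improvement_pct / 100)
    return baseline, target


# Hypothetical data: unplanned downtime hours for the last three years.
downtime_hours = [120.0, 100.0, 110.0]
baseline, target = baseline_and_target(downtime_hours, improvement_pct=20)
print(f"baseline: {baseline:.1f} h/year, target: {target:.1f} h/year")
```

Re-running the same calculation at each annual review, with the newest year appended to the history, is one simple way to keep baselines and targets adjusted.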

  4. Formal RCA Threshold Criteria

An effective incident prevention program will have RCAs being performed at two levels: 1) On an informal or ad hoc basis for smaller, nuisance-level problems that may be specific to individuals or departments; and 2) on a formal level where challenges to the organization’s goals exist.

Leaders must communicate the organizational trigger criteria, but they should also encourage and support teams and individuals to set their own trigger criteria. When your employees can solve smaller day-to-day problems more effectively, your organization will realize the benefits of proactive problem solving, because many smaller problems will be rectified before they can grow into larger, organization-level problems.

For RCA to be a core competency at all levels of the organization, and for people to be proactively preventing organizational problems, it is important to have clear guidance for formal RCAs. This is the function of the Trigger Criteria diagram. High-level challenges should be formally identified and assigned a threshold that when exceeded will automatically trigger a formal RCA.  Triggers should generally be leading indicators of some form or another and derived from specific organizational goals, or KPIs.  They are the trip wires to engage the RCA process for finding solutions to problems that are inhibiting organization goal achievement.

Organizations at higher levels of maturity will most often have triggers for multiple categories including safety, environmental compliance, revenue loss, unbudgeted costs, production loss, and sometimes repeat incidents.
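As a rough illustration of how trigger criteria might be written down and checked, here is a minimal Python sketch. The categories echo the ones listed above, but the threshold values, units, and names such as `Trigger` and `exceeded_triggers` are hypothetical; your own Trigger Criteria diagram would supply the real figures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trigger:
    category: str
    threshold: float  # a formal RCA is triggered when this value is exceeded
    unit: str


# Hypothetical thresholds, for illustration only.
TRIGGERS = [
    Trigger("safety", 0, "recordable injuries"),
    Trigger("production loss", 8, "hours of downtime"),
    Trigger("unbudgeted costs", 50_000, "USD"),
]


def exceeded_triggers(incident, triggers=TRIGGERS):
    """Return the categories whose threshold the incident exceeds.

    incident -- dict mapping category name to the measured value;
                categories missing from the dict count as zero.
    """
    return [t.category for t in triggers if incident.get(t.category, 0) > t.threshold]


# Hypothetical incident: 12 hours of downtime and 20,000 USD of unbudgeted cost.
incident = {"production loss": 12, "unbudgeted costs": 20_000}
hits = exceeded_triggers(incident)
print("Formal RCA required:", bool(hits), hits)
```

Keeping the thresholds in one small table like this makes it easy to review them alongside the KPIs they are derived from.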

For a deeper dive into the topic of trigger thresholds and scaling your RCA investigation, check out our whitepaper “Matching the Scale of Your RCA Investigation to the Significance of the Incident.”

In this blog series, we’ve now covered:

  • Defining Goals and Current Status
  • Setting KPIs and Establishing Trigger Thresholds

But of course, there is more to setting up your RCA program for success. ARMS Reliability’s RCA experts can assist you with designing your complete RCA program or reinvigorating your current one. This of course includes assisting with determining the status of your current RCA effort, walking you through the process of establishing and aligning goals, helping you set KPIs for your program, and establishing trigger thresholds that make sense for your organization. Learn more about our recommended facilitated workshop that covers all 11 of the key steps, and contact us for more information.

 

Stay tuned for the next article in this series.

 

Author: Bruce Ballinger

To succeed in implementing and adopting your new RCA program, it is crucial to have all the elements of an effective and efficient program clearly identified and agreed upon in advance.

Here is a high-level look at the elements that will need to be defined:

1. RCA Goals and Objective Alignment

  • Define the goals and objectives of the program and ensure they are aligned with the goals and objectives of the company/facility/department.

2. Status of Current RCA Effort

  • Perform a maturity assessment of the existing RCA program to be used as a baseline for measuring future improvements.

3. Key Performance Indicators

  • Identify the key performance indicators, with baselines and future targets, that will be used to measure progress toward meeting the program’s goals and objectives.

4. Formal RCA Threshold Criteria

  • Determine which incidents will trigger a formal RCA and estimate how many triggering events may occur in the coming year.

5. RCA and Solution Tracking Systems

  • Identify which internal tracking systems will be used to monitor the status/progress of open RCAs and implemented solutions.

6. Roles and Responsibilities

  • Identify specifically who will have a role in the RCA effort, including the program sponsor, champion, and RCA facilitators.

7. Training Strategy

  • Determine who will be trained in the chosen RCA methodology, to what level, and in what time frame.

8. RCA Effort Oversight and Management

  • Identify who (or what committees or groups) will be responsible for managing the tracking systems, decisions on solution implementation, program modifications over time, and overall program performance.

9. Process Mapping

  • Carry out a process mapping exercise to document RCA management from the start of a triggered incident to the completion of implemented solutions, including their impact on the organization’s goals and objectives.

10. Human Change Management Plan

  • Develop a Change Management plan, including a detailed communication plan, that specifically targets those whose job duties will be affected by the RCA effort.

11. Implementation Tracking

  • Create a checklist to monitor the implementation of the RCA effort, including action items, responsible parties, and due dates.

We recommend holding a workshop to define each of these crucial elements of your RCA program.

The workshop should be conducted for what we call a “functional unit”, which ideally is no larger than a plant or facility; however, it can be modified to accommodate multiple facilities.

Common elements of a functional unit include:

  • A common trigger diagram
  • Common key performance indicators
  • The same Program Champion
  • Members who are interdependent and share responsibility for the performance of the functional unit

By structuring programs to fit the goals and objectives of the business, or “functional unit”, rather than applying a one-size-fits-all solution, you can achieve effective and lasting results.

Implementing a new RCA program, or need to revitalize your current one? ARMS can help you create a customized plan for successful adoption. Contact us to learn more.


Author: Jason Ballentine

Many organizations believe that making sound maintenance decisions requires a whole lot of data. It’s a logical assumption: you do need to know things like the number of times an event has occurred, its duration, the number of spare parts needed, and the number of people engaged in addressing the event, plus the impact on the business and the reason why it happened.

A lot of this information is captured in your Computerized Maintenance Management System (CMMS). The more detail you have, the more accurate results you can get from maintenance scenario simulation tools like Isograph’s Availability Workbench™. Unfortunately, your CMMS data may be lacking enough detail to yield optimal results.

It’s enough to make anybody want to throw his or her hands up and put off the decision indefinitely. If you do, you could be making a big mistake.

No matter what, you’re still going to have to make a decision. You have to.

The truth is, you can still do a lot with limited or poor quality data, supported by additional sources of knowledge. Extract any and all information you have available, not just what is in the CMMS. Document what you’ve got, then use it to make a timely decision that’s as informed as possible.

Don’t get caught up in the fact that it’s not perfect data — circumstances in the real world are hardly ever ideal. In fact, as reliability engineers, most of the data we get is related to failure, which is exactly what we’re trying to avoid. Actually, if we are tracking failures, having less data means we are likely doing our jobs well because that means we are experiencing a low number of failures.

The bottom line is: we can’t afford to sit and wait for more data to make decisions, and neither can you.

Gather as much information as you can from all available sources:

CMMS

In an ideal world, this is the master data record of all activities performed.  As discussed previously, that is almost never the case; however, this is an important starting point to reveal where data gaps exist.

Personal experience and expertise

There’s a wealth of information stored within the experience of people who are familiar with any given piece of equipment. Consider holding a facilitated workshop to gather insight on the equipment’s likely performance. Even a series of informal conversations can yield useful opinions and real-world experiences.

The Original Equipment Manufacturer (OEM)

Most OEMs will have documentation you can access, possibly also a user forum you can mine for additional information.

Industry databases, e.g., the Offshore and Onshore Reliability Data Handbook (OREDA) and the Process Equipment Reliability Database (PERD) by the Center for Chemical Process Safety (CCPS)

Some information is available in these databases, but it’s generic — not specific to your unique site or operating context. For example, you can find out how often a certain type of pump fails, but you can’t discover whether that pump is being used on an oil platform, refinery, power station or mine site. Industry data does, however, provide useful estimates on which you can base your calculations and test your assumptions.

Capture all these insights in an easily accessible way, then use what you’ve learned to make the best decision currently possible. And be sure to record the basis for your decision for future reference. If you get better data down the road, you can always go back and revise your decisions — after all, most maintenance strategies should remain dynamic by design.

Don’t let a lack of data paralyze you into inaction. Gather what you can, make a decision, see how it works, and repeat. It’s a process of continuous improvement, which given the right framework is simple and efficient.

Availability Workbench™, Reliability Workbench™, FaultTree+™, and Hazop+™ are trademarks of Isograph Limited the author and owner of products bearing these marks. ARMS Reliability is an authorized distributor of those products, and a trainer in respect of their use.


Author: Jason Ballentine

As with any budget, you’ve only got a certain amount of money to spend on maintenance in the coming year. How do you make better decisions so you can spend that budget wisely and get maximum performance out of your facility?

It is possible to be strategic about allocating funds if you understand the relative risk and value of different approaches. As a result, you can get more bang for the same bucks.

How can you make better budget decisions?

It can be tempting to just “go with your gut” on these things. However, by taking a systematic approach to budget allocation, you’ll make smarter decisions — and more importantly you’ll have concrete rationales for why you made those decisions —  which can be improved over time. Work to identify the specific pieces of equipment (or types of equipment) that are most critical to your business, then compare the costs and risks of letting that equipment run to failure against the costs and risks of performing proactive maintenance on that equipment. Let’s take a closer look at how you can do that.

4 steps to maximize your maintenance budget

1.  Assign a criticality level for each piece of equipment. Generally, this is going to result in a list of equipment that would cause the most pain — be it financial, production loss, safety, or environmental pain — in the event of failure. Perform a Pareto analysis for maximum detail. 

2.  For your most critical equipment, calculate the ramifications of a reactive/run-to-failure approach.

  • Quantify the relative risk of failure. (You can use the RCMCost™ module of Isograph’s Availability Workbench™ to better understand the risk of different failure modes.)
  • Quantify the costs of failure. Keep in mind that equipment failures can affect multiple aspects of your business in different ways — not just direct hard costs. In every case, consider all possible negative effects, including potential risks.
    • Maintenance: Staff utilization, spare parts logistics, equipment damage, etc.
    • Production Impact: Downtime, shipment delays, stock depletion or out-of-stock, rejected/reworked product, etc.
    • Environmental Health & Safety (EHS) Impact: Injuries, actual/potential releases to the environment, EPA visits/fines, etc.
    • Business Impact: Lost revenue, brand damage, regulatory issues, etc.

For a more detailed explanation of the various potential costs of failure, consult our eBook, Building a Business Case for Maintenance Strategy Optimization.

3.  Next, calculate the impact of a proactive maintenance approach for this equipment.

  • Outline the tasks that would best mitigate existing and potential failure modes.
  • Evaluate the cost of performing those tasks, based on the staff time and resources required to complete them.
  • Specify any risks associated with the proactive maintenance tasks. These risks could include the possibility of equipment damage during the maintenance task, induced failures, and/or infant mortality for newly replaced or reinstalled parts.

4. Compare the relative risk costs between these approaches for each maintenance activity. This will show you where to focus your maintenance budget for maximum return.
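The comparison in steps 2 through 4 can be sketched in a few lines of Python. This is a deliberately simplified model, not the RCMCost™ calculation: it treats annual risk cost as expected failures per year multiplied by cost per failure, and every number and name below (`annual_risk_cost`, `compare_strategies`, the pump figures) is a hypothetical assumption for illustration.

```python
def annual_risk_cost(failures_per_year, cost_per_failure):
    """Expected yearly cost of failures: frequency times consequence."""
    return failures_per_year * cost_per_failure


def compare_strategies(rtf_failures_per_year, cost_per_failure,
                       pm_tasks_per_year, cost_per_task,
                       residual_failures_per_year):
    """Compare run-to-failure against proactive maintenance for one item.

    residual_failures_per_year should include failures that still occur
    despite the tasks, plus any failures induced by the tasks themselves.
    """
    run_to_failure = annual_risk_cost(rtf_failures_per_year, cost_per_failure)
    proactive = (pm_tasks_per_year * cost_per_task
                 + annual_risk_cost(residual_failures_per_year, cost_per_failure))
    preferred = "proactive" if proactive < run_to_failure else "run_to_failure"
    return {"run_to_failure": run_to_failure,
            "proactive": proactive,
            "preferred": preferred}


# Hypothetical critical pump: fails once every two years if left alone,
# at 80,000 per failure (maintenance, production, EHS, and business impact).
result = compare_strategies(
    rtf_failures_per_year=0.5,
    cost_per_failure=80_000,
    pm_tasks_per_year=4,
    cost_per_task=1_500,
    residual_failures_per_year=0.25,
)
print(result)
```

Running the same comparison across your most critical equipment gives you a ranked list of where proactive spending buys the largest reduction in risk cost.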

When is proactive maintenance not the best plan?

For the most part, you’ll want to allocate more of your budget towards proactive maintenance for equipment that has the highest risk and the greatest potential negative impact in the event of failure. Proactive work is more efficient so your team can get more done for the same dollar value. Letting an item run to failure can create an “all hands on deck” scenario under which nothing else gets done, whereas many proactive tasks can be performed quickly and possibly even concurrently.

That said, it’s absolutely true that sometimes run-to-failure is the most appropriate approach for even a critical piece of equipment. For example, a maintenance team might have a scheduled task to replace a component after five years, but the problem is that component doesn’t really age; the only known failure mode is getting struck by lightning. No matter how old that component is, the risk is the same. Performing replacement maintenance on this type of component might actually cost more than simply letting it run until it fails. (In these cases, a proactive strategy would focus on minimizing the impact of a failure event by adding redundancy or stocking spares.) But you can’t know that without quantifying the probability and cost of failure.
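The lightning example can be checked numerically. The sketch below assumes the component's random failures follow a constant failure rate (an exponential lifetime), which is a common way to model failures that do not depend on age; the rate of 0.02 per year and the function name are hypothetical. Under that assumption, the chance of failing in the next year is the same for a new component as for a 20-year-old one, so swapping the old one for a new one buys no reduction in risk.

```python
import math


def prob_fail_next_interval(age_years, interval_years, rate_per_year):
    """P(failure within the next interval | survived to the given age),
    assuming a constant failure rate (exponential lifetime)."""
    survive_to_age = math.exp(-rate_per_year * age_years)
    survive_to_end = math.exp(-rate_per_year * (age_years + interval_years))
    return (survive_to_age - survive_to_end) / survive_to_age


RATE = 0.02  # hypothetical: one lightning-induced failure per 50 years on average

for age in (0, 5, 20):
    p = prob_fail_next_interval(age, 1, RATE)
    print(f"age {age:>2} years: P(fail in next year) = {p:.4f}")
```

For a component that does wear out, the same calculation with an age-dependent failure rate would show the probability rising with age, which is where scheduled replacement starts to pay off.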

Side note: Performing this analysis can help you see where your maintenance budget could be reduced without a dramatic negative effect on performance or availability. Alternatively, this analysis can help you demonstrate the likely impact of a forced budget reduction. This can be very helpful in the event of budget pressure coming down from above.   

At ARMS Reliability, we help organizations understand how to forecast, justify and prioritize their maintenance budgets for the best possible chances of success. Contact us to learn more.


In our previous blog article, “RCA Program Development: The Key Steps of Designing Your Program”, we provided a high-level outline of the eleven key elements that need to be defined in order to have an effective root cause analysis program in your organization. Now, in a series of articles, we’ll break down each of those eleven key elements in further detail, expanding on the important considerations that need to be taken into account, starting here with your goals and your current status.

  1. RCA Goals and Objective Alignment

So, the first question is, “What are we trying to achieve with the RCA effort?” The answer from an overarching perspective can be found in the organization’s goals and objectives. Every organization, and individual for that matter, has a set of goals and objectives that are used as yardsticks to measure both short and long term performance. It is critically important that the RCA effort be in complete alignment with the organization’s and individuals’ goals and objectives. We do this by using the goals and objectives to guide us in identifying the Key Performance Indicators (KPIs) of the RCA effort and setting the Threshold Criteria (Trigger Diagram) for determining when a formal RCA must be performed. (Learn more about setting Threshold Criteria.) If the alignment is true, then there will be tangible, measurable improvement in goals and objectives achievement over time.

  2. Status of Current RCA Effort

Every organization will have some form of RCA in practice whether it is formalized, or ad hoc. It is worthwhile spending some time assessing the status, or maturity level, of the existing RCA process. Maturity level can be categorized in one of four general categories.

  • Level 1: Learning and Development
  • Level 2: Efficient
  • Level 3: Self-Actualizing
  • Level 4: Pro-Active

Level 1, Learning and Development, is where most organizations without a formalized RCA program find themselves.  Management recognizes a need for a formal problem solving method but the focus is primarily on training. There is little or no structure in place to support the trained facilitators and no well-defined KPIs or threshold criteria guidance. At this stage the organization will usually gain some organizational improvements from the elimination of problems, but in an inefficient “learn as you go” manner.

At Level 2, Efficient, formal RCA triggers and KPIs are in place and are aligned with business goals and objectives in advance of RCA training. This would include clear definitions of RCA roles and responsibilities as well as identification of supporting infrastructure such as RCA status tracking, effectiveness of implemented corrective actions and the like.

In the Self-Actualizing level, the effectiveness of the trained problem-solvers has matured through experience. Thus, their ability to solve organizational problems has resulted in a documented achievement of the program KPIs and resulting improvements to the organization’s goals and objectives. The organization is now in the continuous process of tightening the bandwidth of the KPIs to yield greater return to the bottom line. The RCA facilitators are now highly confident, efficient, and effective in eliminating impediments to achieving goals and objectives.

In the Pro-Active level, your organization has now integrated the RCA process into its core culture. Effective problem elimination is the norm and expected at all levels of the organization. People no longer look to place blame for problems but instead are focused on prevention and elimination. Return on investment for both monetary as well as health/safety/environmental issues is extremely high acting both as gratification and motivation. RCA has become a core competency within your culture whereby people are intolerant of ineffectively solving problems the first time and are finding pro-active ways to use RCA to prevent problems from occurring in the first place.

There are existing methods or surveys that can be used to determine an organization’s maturity level. Why is this important? Determining your current maturity level draws a line in the sand showing where you started, or took a renewed focus, in this journey of developing your RCA program. You can set goals around where you want to be over a period of time and look back to see how your program has actually evolved.

This article has given you a glimpse into the first two key elements but of course, there is more to setting up your RCA program for success. ARMS Reliability’s RCA experts can assist you with designing your complete RCA program or reinvigorating your current one. This of course includes assisting with determining the status of your current RCA effort and walking you through the process of establishing and aligning goals. Learn more about our recommended facilitated workshop and contact us for more information.

Stay tuned for the next article in this series.


Without truly understanding the key elements (and possessing the necessary skills) to conduct a thorough, effective investigation, people run the risk of missing key causal factors of an incident while conducting the actual analysis. This could potentially result in not identifying all possible solutions, including those that may be more cost effective, easier to implement, or more effective at preventing recurrence.

Here we outline the 5 key steps of an incident investigation which precede the actual analysis.

1. Secure the incident scene

  • Identify and preserve potential evidence
  • Control access to the scene
  • Document the scene using your ‘Incident Response Template’ (Do you have one?)

2. Select investigation team

The functions that must be filled are:

  • Incident Investigation Lead
  • Evidence Gatherer
  • Evidence Preservation Coordinator
  • Communications Coordinator
  • Interview Coordinator

Other important considerations for the selection of team members include:

  • Ensure team members have the desirable traits (What are they?)
  • The nature of the incident (How does this impact team selection?)
  • Choose the right people from inside and outside the organization (How do you decide?)
  • Appropriate size of the team (What is the optimum team size?)

*Our Incident Investigator training course examines each of these considerations and more, giving you the knowledge to select investigation team members wisely.

3. Plan the investigation

Upon receiving the initial call:

  • Get the preliminary What, When, Where, and Significance
  • Determine the status of the incident
  • Understand any sensitivities
  • If necessary and appropriate, issue a request to isolate the incident area
  • Escalate notifications as appropriate

The preliminary briefing:

  • Investigation Lead to present a preliminary briefing to the investigating team
  • Prepare a team investigation plan

4. Collect the facts supported by evidence

Tips:

  • Be prepared and ready to lead or participate in an investigation at all times to ensure timeliness and thoroughness.
  • Have your “Go Bag” ready with useful items to help you secure the scene, take photographs, document the details of the scene and collect physical evidence.
  • Collect as much information as possible…analyze later
  • Inspect the incident scene
  • Gather facts and evidence
  • Conduct interviews

*While every step in the Incident Prevention Process is crucial, step 4 requires a particularly distinct set of skills. A lot of time in our Incident Investigator training course is dedicated to learning the techniques and skills required to get this step done right.

5. Establish a timeline

This can be the quickest way to group information from many sources

Tip:

  • Stickers can be used on poster paper to start rearranging information on a timeline. Use different colors for precise data versus imprecise, and list the source of the information on each note.
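If you prefer to start the timeline electronically, the same idea carries over directly. The following Python sketch is only an illustration: the `Note` fields mirror what the tip suggests recording (the information, its source, and whether the time is precise), and the sample entries are invented.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Note:
    when: datetime   # best-known time of the event
    what: str        # the fact recorded on the note
    source: str      # where the information came from
    precise: bool    # True for logged times, False for estimates


def build_timeline(notes):
    """Return the notes in chronological order."""
    return sorted(notes, key=lambda n: n.when)


def format_note(n):
    """One line per note; the tag plays the role of the sticker color."""
    tag = "precise" if n.precise else "approx"
    return f"{n.when:%H:%M} [{tag}] {n.what} (source: {n.source})"


# Invented example entries gathered from different sources.
notes = [
    Note(datetime(2024, 1, 15, 10, 42), "Pump trips on high vibration", "control system log", True),
    Note(datetime(2024, 1, 15, 10, 30), "Operator hears unusual noise", "interview", False),
    Note(datetime(2024, 1, 15, 11, 5), "Area isolated", "shift supervisor report", True),
]

timeline = build_timeline(notes)
for n in timeline:
    print(format_note(n))
```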

After steps 1-5 comes the Root Cause Analysis of the incident, solution implementation and tracking, and reporting back to the organization:

6. Determine the root causes of the incident

7. Identify and recommend solutions to prevent recurrence of similar incidents

8. Implement the solutions

9. Track effectiveness of solutions

10. Communicate findings throughout the organization

*Steps 6-10 are taught in detail at our Root Cause Analysis Facilitator training course.

To learn more on the difference between our Incident Investigator versus RCA Facilitator training courses, check out our previous blog article and of course, if you would like to discuss how to implement or improve your organization’s incident prevention process, please contact us.

Author: Bruce Ballinger

To have a successful implementation and adoption of your new RCA program, it’s crucial to have all the elements of an effective and efficient program clearly identified and agreed upon in advance.

Here’s a high-level look at the elements that will need to be defined:

  • RCA Goals and Objective Alignment
    • Define the goals and objectives of the program and ensure they are in alignment with corporate/facility/department goals and objectives
  • Status of Current RCA Effort
    • Perform a maturity assessment of the existing RCA program to be used as a baseline to measure future improvements
  • Key Performance Indicators
    • Identify KPIs with baselines and future targets to be used for measuring progress towards meeting program goals and objectives
  • Formal RCA Threshold Criteria
    • Determine which incidents will trigger a formal RCA and estimate how many triggered events may occur in the upcoming year
  • RCA and Solution Tracking Systems
    • Identify which internal tracking systems will be used to track status/progress of open RCAs and implemented solutions
  • Roles and Responsibilities
    • Identify specifically who will have a role in the RCA effort, including the program sponsor, champion, and RCA facilitators
  • Training Strategy
    • Determine who will be trained in the chosen RCA methodology and to what level and in what time frame
  • RCA Effort Oversight and Management
    • Identify who (or what committees or groups) will be responsible for managing tracking systems, decisions on solution implementation, program modifications over time, and general program performance
  • Process Mapping
    • Conduct a process mapping exercise to document RCA management from the start of a triggered incident through completion of implemented solutions, including their impact on the organization’s goals and objectives
  • Human Change Management Plan
    • Develop a Change Management plan, including a detailed communication plan, that specifically targets those whose job duties will be affected by the RCA effort.
  • Implementation Tracking
    • Create a checklist to monitor RCA effort implementation including action items, responsible parties and due dates

We recommend conducting a workshop in order to define each of these crucial elements of your RCA program.

The workshop should be conducted for what we call a “functional unit,” which ideally is no larger than a plant or facility. However, it can be modified to accommodate multiple facilities.

Common elements of a functional unit include:

  • A common trigger diagram
  • Common KPIs
  • The same Program Champion
  • Members who are interdependent and share responsibility for functional unit performance

Structuring programs to fit within the goals and objectives of the business, or “functional unit,” rather than applying a “one size fits all” solution, produces effective and long-lasting results.

Implementing a new RCA program, or need to reinvigorate your current one? ARMS can help you create a customized plan for its successful adoption. Contact us for more information.

Author: Scott Gloyna

For any given asset there are typically dozens of different predictive or preventive maintenance tasks that could be performed. However, selecting the right maintenance tasks that contribute effectively to your overall strategy can be tricky. The benefit is the difference between meeting production targets and the alternative of lost revenue, late night callouts, and added stress from unplanned downtime events.

Step 1: Build out your FMEA (Failure Mode and Effects Analysis) for the asset under consideration.

Make sure you get down to appropriate failure modes in enough detail so that the causes are understood and you can identify the proper maintenance to address each specific failure mode.

Once you’ve made a list of failure modes, then it’s detailed analysis time. If you want to be truly rigorous, perform the following analysis for every potential failure mode. Depending on the criticality of the asset you can simplify by paring down your list to include only the failure modes that are most frequent or result in significant downtime.

Step 2: Identify the consequences of each failure mode on your list.

Failure modes can result in multiple types of negative impact. Typically, these failure effects include production costs, safety risks, and environmental impacts. It is your job to identify the effects of each failure mode and quantify them in a manner that allows them to be reviewed against your business’s goals. Often when I am facilitating a maintenance optimization study, people will say things like “There is no effect when that piece of equipment fails.” If that’s the case, why is that equipment there? All failures have effects. They may just be small or hard to quantify, perhaps because workarounds are available or because a certain amount of time passes after the failure before an effect is realized.

Step 3: Understand the failure rate for each particular mode.

Gather information on the failure rates from any available industry data and personnel with experience on the asset or a similar asset and installation, as well as any records of past failure events at your facility. This data can be used to evaluate the frequency of failure through a variety of methods — ranging from a simple Mean Time To Failure (MTTF) to a more in-depth review utilizing Weibull distributions.

(Note: The Weibull module of Isograph’s Availability Workbench™ can help you to quickly and easily understand the likelihood of different failure modes occurring.)
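
As a rough illustration of the two ends of that range, the sketch below computes a simple MTTF from a handful of failure records and then evaluates a two-parameter Weibull model. The failure times and the Weibull shape and scale values are made-up numbers, and the function names are hypothetical. In practice the parameters would be fitted to your own failure data.

```python
import math


def mean_time_to_failure(times_to_failure):
    """Simple MTTF: the average of observed times to failure (one unit throughout)."""
    if not times_to_failure:
        raise ValueError("need at least one observed failure")
    return sum(times_to_failure) / len(times_to_failure)


def weibull_reliability(t, shape, scale):
    """Probability that the item survives beyond time t (two-parameter Weibull)."""
    return math.exp(-((t / scale) ** shape))


def weibull_mttf(shape, scale):
    """Mean life implied by a two-parameter Weibull distribution."""
    return scale * math.gamma(1.0 + 1.0 / shape)


# Hypothetical operating hours at failure for one failure mode.
observed_hours = [4200, 5100, 3900, 6100, 4700]
simple_mttf = mean_time_to_failure(observed_hours)

# Assumed (not fitted) Weibull parameters, for illustration only.
assumed_shape = 2.0
assumed_scale = 5400.0
survival_at_2000h = weibull_reliability(2000.0, assumed_shape, assumed_scale)

print(f"Simple MTTF: {simple_mttf:.0f} h")
print(f"Weibull mean life: {weibull_mttf(assumed_shape, assumed_scale):.0f} h")
print(f"Chance of surviving 2000 h: {survival_at_2000h:.1%}")
```

In a Weibull model, a shape value below 1 points to early-life failures, a value near 1 to random failures, and a value above 1 to wear-out, which is why it tells you more than a single MTTF figure.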

Step 4: Make a list of possible reactive, planned or inspection tasks to address each failure mode.

Usually, you start by listing the actions you take when that failure mode occurs (reactive maintenance). Then broaden your list to any potential preventive maintenance and/or inspection tasks that could help prevent the failure mode from happening, or reduce the frequency at which it occurs.

  • Reactive tasks
    • Replacement
    • Repair
  • Preventive tasks
    • Daily routines (clean, adjust, lubricate)
    • Periodic overhauls, refurbishments, etc.
    • Planned replacement
  • Inspection tasks
    • Manual (sight, sound, touch)
    • Condition monitoring (vibration, thermography, ultrasonics, x-ray and gamma ray)

Step 5: Gather details about each potential task.

In order to compare and contrast different tasks, you have to understand the requirements of each:

  • What exactly does the task entail? (basic description)
  • How long would the work take?
  • How long would it take to start the work after shutdown/failure?
  • Who would do the work?
  • What labor costs are involved? (the hourly rates of the employees or outside contractors who would perform the task)
  • Would any spare parts be required? If so, how much would they cost?
  • Would you need to rent any specialized equipment? If so, how much would it cost?
  • Do you have to take the equipment offline? If so, for how long?
  • How often would you need to perform this task (frequency)?
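
One way to keep these answers comparable across tasks is to record them in a single structure and derive a cost per execution and per year. The sketch below is only an illustration: the `MaintenanceTask` class, its field names, and all of the figures are hypothetical, and the cost model (labor plus parts plus rental plus downtime) is a simplifying assumption.

```python
from dataclasses import dataclass


@dataclass
class MaintenanceTask:
    description: str
    duration_hours: float            # how long the work takes
    labor_rate_per_hour: float       # hourly rate of employee or contractor
    crew_size: int = 1               # who does the work, as a headcount
    spare_parts_cost: float = 0.0
    equipment_rental_cost: float = 0.0
    downtime_hours: float = 0.0      # time the equipment is offline
    downtime_cost_per_hour: float = 0.0
    tasks_per_year: float = 1.0      # task frequency

    def cost_per_execution(self) -> float:
        labor = self.duration_hours * self.labor_rate_per_hour * self.crew_size
        downtime = self.downtime_hours * self.downtime_cost_per_hour
        return labor + self.spare_parts_cost + self.equipment_rental_cost + downtime

    def annual_cost(self) -> float:
        return self.cost_per_execution() * self.tasks_per_year


# Hypothetical planned replacement carried out once every two years.
bearing_replacement = MaintenanceTask(
    description="Planned bearing replacement",
    duration_hours=4,
    labor_rate_per_hour=80,
    crew_size=2,
    spare_parts_cost=1200,
    downtime_hours=6,
    downtime_cost_per_hour=1000,
    tasks_per_year=0.5,
)

print(f"Cost per execution: {bearing_replacement.cost_per_execution():.2f}")
print(f"Annual cost: {bearing_replacement.annual_cost():.2f}")
```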

A key consideration for inspection tasks only: What is the P-F interval for this failure mode? This is the window between the time you can detect a potential failure (P) and when it actually fails (F). It is similar to calculating how long you can drive your car after the fuel light comes on, before you actually run out of fuel. Understanding the P-F interval is key in determining the interval for each inspection task.

The P-F interval can vary from hours to years and is specific to the type of inspection, the specific failure mode and even the operating context of the machinery.

The P-F interval can be hard to determine precisely, but it is worth making the best approximation you can because of the impact it has on task selection and frequency.
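
To show how a P-F estimate can feed into an inspection frequency, here is a minimal sketch. It assumes a common rule of thumb that is not prescribed by this article: subtract the time you need to plan and carry out the repair, then schedule at least two inspections inside the remaining window. The function name and the numbers are hypothetical.

```python
def max_inspection_interval(pf_interval, response_time=0.0, inspections_in_window=2):
    """Longest inspection interval that still leaves time to act.

    pf_interval: estimated time from detectable potential failure (P) to failure (F)
    response_time: time needed to plan and complete the repair once P is detected
    inspections_in_window: how many inspections should fall inside the usable window
    All times must use the same unit.
    """
    if inspections_in_window < 1:
        raise ValueError("need at least one inspection in the window")
    usable_window = pf_interval - response_time
    if usable_window <= 0:
        raise ValueError("P-F interval is too short to respond; inspection will not help")
    return usable_window / inspections_in_window


# Hypothetical case: vibration analysis gives about 90 days of warning,
# and it takes about 30 days to get parts and schedule the repair.
interval_days = max_inspection_interval(pf_interval=90, response_time=30)
print(f"Inspect at least every {interval_days:.0f} days")
```

If the function raises an error because the usable window is zero or negative, that is a signal to consider a different inspection technique or a different type of task for that failure mode.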

Step 6: Evaluate the lifetime costs of different maintenance approaches.

Once you understand the cost and frequency of different failure modes, as well as the cost and frequency of various maintenance tasks to address them, you can model the overall lifetime costs of various options.

For example, say you have a failure mode with a moderate business impact — enough to affect production, but not nosedive your profits for the quarter. If that failure mode has a mean time between failures (MTBF) of six months, you might take a very aggressive maintenance approach. On the other hand, if that failure mode only happens once every ten years, your approach would be very different. “Run to Failure” is often a completely legitimate choice, but you need to understand and be able to justify that choice.

These calculations can be done manually, in spreadsheets, or using specialized modeling software such as the RCMCost™ module of Isograph’s Availability Workbench™.
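
A spreadsheet-style version of that comparison might look like the sketch below. It is a deliberately simplified model under stated assumptions: failures are spread evenly over time, costs are not discounted, and the effect of the inspection task on failure frequency (the "residual" MTBF) is an assumed input. All names and figures are hypothetical.

```python
def annual_cost_run_to_failure(mtbf_years, failure_cost):
    """Expected yearly cost if the failure mode is simply allowed to occur."""
    return failure_cost / mtbf_years


def annual_cost_with_task(task_interval_years, task_cost, residual_mtbf_years, failure_cost):
    """Expected yearly cost of doing a task plus the failures that still get through."""
    return task_cost / task_interval_years + failure_cost / residual_mtbf_years


failure_cost = 20_000   # assumed cost of one unplanned failure
inspection_cost = 500   # assumed cost of one inspection
inspection_interval = 0.25  # quarterly, in years

# Case A: the failure mode occurs every six months if left alone.
rtf_frequent = annual_cost_run_to_failure(0.5, failure_cost)
task_frequent = annual_cost_with_task(inspection_interval, inspection_cost, 5.0, failure_cost)

# Case B: the failure mode occurs once every ten years if left alone.
rtf_rare = annual_cost_run_to_failure(10.0, failure_cost)
task_rare = annual_cost_with_task(inspection_interval, inspection_cost, 50.0, failure_cost)

for label, rtf, task in [("Frequent", rtf_frequent, task_frequent),
                         ("Rare", rtf_rare, task_rare)]:
    choice = "run to failure" if rtf <= task else "inspect"
    print(f"{label}: run-to-failure {rtf:.0f}/yr, inspection {task:.0f}/yr -> {choice}")
```

With these assumed numbers, inspection wins easily for the frequent failure mode, while run to failure is the cheaper option for the rare one.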

Ultimately you try to choose the least expensive maintenance task that provides the best overall business outcome.

Ready to learn more? Gain the skills needed to develop optimized maintenance strategies through our training course: Introduction to Maintenance Strategy Development.
