An often-overlooked element of implementing any new initiative is the process mapping exercise. The more intricate the initiative, the more valuable the process map becomes. Although launching a root cause analysis implementation plan is usually fairly straightforward, it is still worth spending some time mapping out how the work flows from a triggered event all the way through to tracking the effectiveness of implemented solutions. It’s an effective way of ensuring that everyone has a clear understanding of their role and where it fits into the process.

Process mapping is usually done in two steps, commonly known as the brown paper/white paper exercise. The brown paper mapping step creates a diagram representing how RCAs are currently managed throughout the system. Once the existing workflows are clearly understood and charted, the white paper mapping is performed. This diagram documents the desired workflow and systematically identifies gaps between the current state and the desired future state. Identifying the nature and magnitude of these gaps allows the Facility Leadership Team and/or RCA Steering Committee to dedicate the resources needed to make the changeover. It also clearly defines the roles and responsibilities of the affected departments and positions in the RCA program.

As seen in the example below, different symbols and colors are used to represent various steps and components. The steps are laid out from left to right and top to bottom. A start or stop point is usually designated with an oval or rounded rectangle, a regular step is a rectangle, and a decision point is a diamond. All steps are connected by lines and arrows.

Example RCA Process Map (RCA workflow)

When done properly, process mapping leaves no room for misunderstanding about what needs to be done and when, or for gaps in roles and responsibilities, resulting in efficient and effective implementation of the RCA initiative and, ultimately, operational improvements.

So far, this blog series has covered:

The Key Steps of Designing Your Program

Defining Goals and Current Status

Setting KPIs and Establishing Trigger Thresholds

RCA and Solution Tracking and Roles and Responsibilities

Recommended RCA Team Structure

Responsibilities of the Six Roles

Training Strategy

Oversight and Management

And, Process Mapping.

Stay tuned for our next installments on Change Management and Implementation Tracking.

Corporate and site reliability teams face challenges and pressures to continuously improve and demonstrate the value and business impact they have on their organizations. The old adage of RCM being a “Resource Consuming Monster” has plagued many a Reliability Department – some organizations have even banned the use of the acronym.

Instead, RCM needs to be viewed as an engineering framework that enables the definition of a complete maintenance regime for maintenance task optimization.

Both public and private sector organizations around the world rely on reliability centered maintenance as a means to significantly increase asset performance by delivering value to all stakeholders. Successful implementation of RCM leads to increased cost effectiveness, reliability, and machine uptime, and a greater understanding of the level of risk the organization is managing. It can also deliver safer operations, provide a documented basis for planned maintenance, and predict resource requirements, spares usage, and the maintenance budget.

 So, how do you equip your organization for best-practice reliability centered maintenance?

An RCM study determines the optimal maintenance strategy for assets by modelling different scenarios and comparing risks and improvements over the asset lifetime, enabling better long-term management of those assets.

At a high level, an RCM Study involves:

Step 1: Developing an FMEA using failure data collected from a variety of sources, such as work order history, spares usage rates, and interviews with the personnel responsible for maintaining the equipment

Step 2: Combining data with OEM maintenance manuals and spares catalog information to develop a preliminary RCM model

Step 3: Making changes to the preliminary model during facilitation with the staff

Step 4: Validating and optimizing the RCM model by assessing each failure mode’s cost, safety, environmental, and operational contributions in order to reduce cost and risk

Step 5: Building a maintenance plan that can be uploaded directly into your CMMS, integrating the RCM output with your maintenance system

This process will reveal any gaps in the existing maintenance strategy, or conversely deliver peace of mind that existing strategies are working.
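As a rough illustration of the kind of trade-off weighed in Step 4, the sketch below compares the long-run cost rate of a planned replacement interval against running to failure for a single failure mode. It assumes an illustrative Weibull failure model and cost figures; none of the numbers come from RCMCost or from any specific study.

```python
# A minimal sketch (not the RCMCost implementation) of the trade-off weighed in
# Step 4: the long-run cost rate of an age-based planned replacement versus
# running to failure, for one failure mode with an assumed Weibull model.
# All figures are illustrative.
import numpy as np

BETA, ETA = 2.5, 8000.0      # assumed Weibull shape and scale (hours)
C_PLANNED = 2_000.0          # assumed cost of a planned replacement ($)
C_UNPLANNED = 25_000.0       # assumed cost of an in-service failure, incl. downtime ($)

def reliability(t):
    """Weibull survival function R(t)."""
    return np.exp(-(t / ETA) ** BETA)

def cost_rate(interval):
    """Long-run cost per operating hour for replacement at age 'interval'."""
    t = np.linspace(0.0, interval, 2000)
    expected_cycle_hours = np.trapz(reliability(t), t)   # E[min(time to failure, interval)]
    prob_fail_first = 1.0 - reliability(interval)        # chance the item fails before the PM
    expected_cycle_cost = C_PLANNED * reliability(interval) + C_UNPLANNED * prob_fail_first
    return expected_cycle_cost / expected_cycle_hours

intervals = np.linspace(500, 15000, 60)
rates = [cost_rate(T) for T in intervals]
best = intervals[int(np.argmin(rates))]
print(f"Run-to-failure:      ~{cost_rate(1e6):.2f} $/h")
print(f"Best planned change: ~{best:.0f} h at ~{min(rates):.2f} $/h")
```

Repeating this comparison for every significant failure mode, with real data, is essentially what the validation and optimization step does at scale.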

Join us at the Reliability Summit, May 8-11, in Austin, Texas to learn how to manage reliability centered maintenance for your organization.

Attendees will learn: 

  • RCM Skill Building
  • Why Traditional Maintenance cannot meet the needs of business today
  • Weibull Data Analysis
  • What is RCM and how does RCMCost deliver this methodology plus more
  • How to identify failure modes that can impact your plant
  • How to calculate failure data relevant to your equipment using Weibull feature in RCMCost
  • Assessing the total cost impact of failure on a business
  • How Preventive Maintenance and Predictive Maintenance improve business, safety, environment and operational risks
  • Simulating the maintenance strategy in RCMCost
  • RCM Skill Building Continued
  • How to select the optimum maintenance task and frequency
  • Exercises in RCMCost
  • Maintenance Decision making elements and sensitivities

This is one of many workshops attendees can select to attend at the Reliability Summit. For a full list of workshops, please visit our Reliability Summit 2018 website.  


Your company is going through an asset management initiative and they need ‘reliability engineers’ to support this new focus. One day your title begins with ‘Maintenance _____’ and the next day you come into the office and the title on your door now reads ‘Reliability _____’. Undertaking new asset management initiatives as a newly titled “reliability engineer” can be daunting.

Reliability Engineering isn’t typically something one would go to school for or get a certificate in, so what does an R.E. need to know?

Your “toolkit” as an R.E. should consist of various methods that you can employ with the goal of optimizing maintenance strategies to achieve operational success, including:

  • root cause analysis
  • reliability centered maintenance
  • failure modes and effects analysis
  • failure data analysis
  • reliability block diagrams
  • lifecycle cost calculation

To be successful at increasing the reliability of your plant, you should draw on the ‘tools’ that deliver the best results, applying each one based on the type of problem you’re facing.

Approaching Maintenance Strategy Optimization with Your Toolkit

It’s essential for a newly appointed reliability professional to be aware of common maintenance issues. The more time maintenance personnel spend fighting fires, the more their morale, productivity, and budget erode. The less effective the routine work that is performed, the more equipment uptime and business profitability suffer.

Here’s the good news: An optimized maintenance strategy is simpler and easier to sustain than a non-optimized strategy, resulting in fewer issues and downtime. It’s easy for organizations and new reliability engineers to be intimidated by the idea of maintenance strategy optimization. An important tip to remember is that small changes can make a huge difference. Maintenance optimization doesn’t have to be time-consuming or difficult, nor does it have to be a huge undertaking. By creating a framework for continuous improvement and understanding the methods to employ, you can ultimately drive towards higher reliability, availability and more efficient use of your production equipment.

Join us at the Reliability Summit, May 8-11, in Austin, Texas to learn the essential tools in a Reliability Engineer’s toolkit and how to apply them to achieve operational success.

Attendees will learn: 

  • History of Reliability
  • Introduction to Reliability Concepts
  • Benefits of a Reliability Based Maintenance System
  • Performance Measures
  • Definitions of Terms and Measures in Reliability
  • Introduction to Reliability Engineering Methods
  • Failure Mode and Effects Analysis
  • Failure Data Analysis
  • Reliability Centered Maintenance
  • Maintenance Optimization
  • System Availability Analysis
  • Lifecycle Cost Calculation
  • Problem Reporting
  • What Tool When
  • Key Factors for Success
  • Key Steps in a Reliability Program
  • Summarizing the Business Case for Reliability

This is one of many workshops attendees can select to attend at the Reliability Summit. For a full list of workshops, please visit our Reliability Summit 2018 website.  


World-class maintenance performance requires a strong maintenance strategy. But all too often, leadership within the organization isn’t fully on board with undertaking an optimization project because they don’t yet see the full value in doing so. And perhaps you’re not sure exactly how to convince them that the initiative is worthwhile.

So, how do you raise awareness within the organization and get support for what you need to do? 

You must build a business case that can overcome the primary objections, illuminate the need and demonstrate the real, tangible value that your project will provide to the organization. 

If you are thinking your organization should invest in a maintenance review and optimization, then you’re probably experiencing some of these signs:

  • High production downtime 
  • Maintenance staff in fire-fighting mode 
  • Some spare parts collecting dust, yet key spares are not available when needed 
  • Maintenance instructions consist of little more than a title or some generic text, e.g., “check and lube as necessary”
  • Very little, if any, information captured on maintenance work orders 
  • Scheduled maintenance tasks generally only created after equipment has failed 
  • Costly equipment failures creating budget overruns 
  • Higher risk of catastrophic failure, equipment damage and major events due to potential (or actual) equipment failures
  • Maintenance KPIs are not in place or are trending towards lower performance
  • Maintenance group isn’t highly valued by the rest of the organization 

To get the buy-in you need, consider the points of resistance you might encounter from Maintenance, Production, and Site Management teams. Show them how the signs listed above are causing real problems for your organization, build your business case backed by data, and demonstrate how your initiatives will benefit each stakeholder group.

Join us at the Reliability Summit, May 8-11, in Austin, Texas to learn in-depth how to build a compelling business case to gain support for your reliability initiatives.  

Attendees will learn: 

  • Potential resistance, fear of change and how the two impact reliability initiatives  
  • Benefits to stakeholders at all levels and how to sell them
  • Necessary steps to build the business case  
  • What analyses and data are required to assess the current state of maintenance and how to use your CMMS to assist   
  • How to explain how each problem affects the business from a maintenance, production, EHS, and business-impact perspective  
  • How to develop a project proposal and the key items to be included  
  • What tools to use and how to quantify the additional cost savings to be realized   
  • How to conduct a pilot project and benefits of doing so  

This is one of many workshops attendees can select to attend at the Reliability Summit. For a full list of workshops, please visit our Reliability Summit 2018 website.  


How do you know if your plant is designed to deliver the target level of productivity?

For many organizations, the answer is “we don’t know.” New capital projects are typically designed with the main goals of achieving the lowest possible capital outlay for plant and equipment while maintaining the plant’s ability to meet productivity targets. Too often, however, minimizing costs garners most of the focus in the design phase, and as a result, plants that simply aren’t designed for reliability are handed over to operations teams. Then the consequences start to appear: a high number of failures and breakdowns, no way to achieve better performance from the equipment short of a redesign, and maintenance costs too high for the plant to be sustainable over the long term.

To combat this, world-leading organizations are starting to require that a RAMS Analysis (Reliability, Availability, Maintainability, and Safety Analysis) be completed at each project stage. These studies serve as checkpoints, with scenario modeling that gives the project team various options for meeting the business goals of the project at the lowest possible cost. Sophisticated organizations are also incorporating peer reviews to challenge the plant designs, and Lifecycle Cost Analysis to evaluate the project over a longer period and predict costs, so they can plan and budget accordingly.

Case in point… 

Here’s a timeline of how a global mining company built reliability into their design throughout various project stages:

  • 2008 – Developed a Reliability Block Diagram (RBD) model to validate the design capacity and allow potential bottlenecks to be understood. Identified a baghouse in the design that could not be isolated and required a complete plant shutdown to perform any maintenance. Also predicted the maintenance budget and labor requirements to understand the maintenance-intensive items in the design. This exposed a multi-million-dollar-per-day single point of failure, which was addressed in a revised design that allowed maintenance on the baghouse to be completed without a plant shutdown.
  • 2010 – Revised RBD to accommodate some design changes and to validate that the capacity could still be achieved.   
  • 2014 – Revised the RBD to accommodate further design changes as the project team was challenged to reduce capital cost and increase construction speed. The RAM/RBD analysis proved capacity could still be met. The team was challenged to reduce equipment capital by $30M yet keep capacity; the result was a marginal capacity increase along with the $30M reduction.
  • 2016 – Revised the RBD as more detailed information became available and as further changes were made. Headquarters approved the project to proceed.
  • 2017 – The EPC’s estimate for the maintenance build was 120,000+ man-hours. Using ARMS’ libraries, past company models/FMEAs, and an equipment-class strategy approach, we estimate it will take around half of that time and investment to produce maintenance strategies that help ensure the predicted availability is realized.

All through the process the mining giant found that the RBD was an essential tool for them when undergoing peer reviews at each project gate.  It was used to confidently assure the board that the capacity targets could be met and that they had a solid foundation on which the budget and resource forecasts were made. 
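For readers unfamiliar with the arithmetic an RBD rests on, the sketch below shows the series/parallel availability calculation at its core, using made-up availabilities rather than the mining company’s figures. It also illustrates why a component whose maintenance forces a full plant shutdown, like the original baghouse, dominates the result.

```python
# A minimal sketch of the series/parallel arithmetic behind a reliability
# block diagram (RBD) capacity check. Availabilities are illustrative, not
# the mining company's figures.
def series(*avail):
    """System availability when every block is required (series)."""
    a = 1.0
    for x in avail:
        a *= x
    return a

def parallel(*avail):
    """System availability when any one of the blocks is sufficient."""
    u = 1.0
    for x in avail:
        u *= (1.0 - x)
    return 1.0 - u

crusher, mill, baghouse, conveyor = 0.97, 0.96, 0.93, 0.99

# Original design: baghouse maintenance forces a full plant shutdown,
# so it sits in series with everything else.
original = series(crusher, mill, baghouse, conveyor)

# Revised design: the baghouse can be isolated (modelled here, crudely,
# as a redundant pair so its outages no longer stop the plant).
revised = series(crusher, mill, parallel(baghouse, baghouse), conveyor)

print(f"Original design availability: {original:.3f}")
print(f"Revised design availability:  {revised:.3f}")
```

Real RBD tools simulate failure and repair distributions, buffers, and partial capacities rather than multiplying point availabilities, but the structural logic is the same.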

Join us at the Reliability Summit, May 8-11, in Austin, Texas to learn in-depth best-practices for designing for reliability. This workshop will cover the benefits of designing for reliability and what that process should look like to ensure a sustainable, successful plant is handed over to Operations.  

Attendees will learn: 

  • How to conduct scenario modelling of the plant design and configuration to ensure the plant meets its availability and production requirements at the lowest cost 
  • How to prevent hidden failures and bottlenecks caused by poor plant design 
  • How to develop budget predictions around availability, capacity, labor needs, spares needs, and maintenance costs 
  • How to build maintenance strategies for projects that help ensure the predicted availability is realized 

This is one of many workshops attendees can select to attend at the Reliability Summit. For a full list of workshops, please visit our Reliability Summit 2018 website.


Essential Energy Q&A

Essential Energy operates one of Australia’s largest electricity networks, delivering essential network services to more than 800,000 homes and businesses across 95 percent of New South Wales and parts of southern Queensland. The Government-owned corporation is committed to continuously improving safety performance, operating at best industry practice, and minimising network charges to customers.

In line with these objectives, Essential Energy is always on the look-out for ways to improve business efficiency and productivity, and Copperleaf C55 has provided an answer. With Copperleaf C55 and ARMS Reliability, the corporation is enhancing its ability to optimise capital projects, manage risk, and meet regulatory and customer-driven performance targets.

Here, we talk to Adam about the implementation.

Why were you looking for a solution like Copperleaf C55?

The energy industry in Australia is undergoing momentous change with an increasing need to improve efficiency.

Given the scale of Essential Energy’s geographical footprint, it is imperative that we manage our assets as effectively and efficiently as possible. We sought a decision analytics solution to enable our investment planning and decision-making processes and support delivery of long-term strategic business performance goals. Copperleaf C55 enables us to do this.

Why did you choose Copperleaf C55?

Copperleaf C55 lets us identify the optimal set of investments and timing that deliver the greatest value to the organization and its customers across multiple portfolios. It helps us proactively identify, quantify and manage risk.

The solution – delivered by ARMS Reliability – supports our business as it is designed specifically for the complexity found in asset management. It is particularly useful because it can handle the high volume of assets we manage. Other stand-out features include its ability to provide a long-term strategic view of investment needs, to optimise investment portfolios against multiple constraints, and to model multiple alternatives for each investment and the risks in our asset base.

How was the implementation?

The implementation was well executed and we were readily able to incorporate organization-specific requirements by leveraging the implementation team’s experience. We asked for an atypical roll-out, and they were very flexible in accommodating our need to run multiple implementation stages in parallel.

Is there anything unique Essential Energy is doing with C55?

We are continually discovering the many ways in which C55 can accept data to help us in our decision-making.

The first unique application was using CSIRO geographical shape files to generate additional asset data points. Our assets are geographically diverse and these points, such as corrosion regions, termite regions, sulphate soil levels, bushfire zones, wind regions, etc., allow us to better understand the behavior of our assets. In many cases, it is where the asset is located rather than the asset type that may define the likelihood and consequence of failure, which determines the appropriate spend levels to achieve sustainable risk management.

The second unique application was to develop flexible models that can be deployed at the individual or rolled up asset levels. The benefit of rolled up assets is that it significantly simplifies investment creation, while individual asset analysis allows us to dive into the detail where required.

What benefits have arisen since the solution was implemented?

It has provided us with a robust methodology for quantifying investment value. We get consistent valuation over a large geographical region. We can articulate the risks on our assets in more detail and clearly to stakeholders.

How easy has it been to train staff and get them familiar with the system?

The system is easy to use and powerful. We have more than 80 staff required to use the system at varying levels of detail and that number will only increase. As with any new system, there is a period of training and acceptance but the initial engagement from the staff is great.

The use of workflow supports how we want to operate and facilitates increased ownership in the process.

For more information about this Asset Investment Planning and Management solution download a Brief Introduction to Copperleaf C55

Author: Dane Boers, Senior Reliability Engineer

The risk matrix has served its purpose but falls well short of the data-driven business requirements of today. Enter the Value Framework.

Background

For more than a decade, the risk matrix has been the go-to decision-making tool for assessing risk, and for good reason. The risk matrix is practical, easy to use and flexible enough to apply to various risk types and situations, including:

  • Assessing risks of a particular asset
  • Deciding which investments or projects have the highest importance
  • Choosing which risk controls to implement

Figure 1 Example of a Risk Matrix

The purpose of the risk matrix is to simplify the assessment process while still providing meaningful results. Technology and data processing tools now allow for complex assessments using simple interfaces – this plays a major role in supporting the increasing need for improved risk based decision making.

Shortfalls of the risk matrix

Granularity and/or resolution

Risk is not discrete but continuous, and many risks can be similar. Thus, the first shortfall of the risk matrix is that similar risks cannot be separated even though there are known differences. This reduced granularity can result in sub-optimal decisions and missed opportunities for improvement, because subtle differences in likelihood or consequence will likely result in the same ‘risk box’ selection.

If you were to prioritize two similar risks with all else being equal, it would make sense to address the slightly higher risk before the lower risk, even though the assigned risk level is the same according to the matrix.

Consider a reduction in the likelihood of an event by 50%, from once in 50 years to once in 100 years (i.e., doubling the life of an asset). This is a huge improvement and may mitigate a significant amount of risk, especially when applied at scale. Yet according to some risk matrixes, the likelihood both before and after would be ‘rare’, showing no improvement in risk exposure.
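To make that concrete, here is a tiny, hedged example, assuming a $2M consequence, showing how a quantified annualized risk separates two events that both land in the same ‘rare’ band of a matrix.

```python
# A minimal sketch of why quantifying risk exposes improvements a matrix hides.
# Both events below land in the same 'rare' likelihood band, yet the
# annualized risk halves. Figures are illustrative.
consequence_cost = 2_000_000                 # assumed cost if the event occurs ($)

risk_before = (1 / 50) * consequence_cost    # once in 50 years
risk_after = (1 / 100) * consequence_cost    # once in 100 years

print(f"Annualized risk before: ${risk_before:,.0f}")            # $40,000
print(f"Annualized risk after:  ${risk_after:,.0f}")             # $20,000
print(f"Risk mitigated per year: ${risk_before - risk_after:,.0f}")
```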

Businesses with large volumes of risk data need to be able to resolve very similar risks to make the best decisions possible, especially when constrained on expenditure or resources. This is even more evident when dealing with large fleets of similar assets and risks.

Multiple risks transparency

The next shortfall of the typical risk matrix is its limited ability to handle and interpret events that cause multiple similar consequences.

Consider an equipment failure that causes a large amount of smoke in a building. The result may be that 50 people require medical treatment for smoke inhalation. If a single medical treatment injury is assessed as a ‘moderate’ safety consequence, at what point does the sum of these injuries constitute the equivalent of a ‘critical’ or ‘catastrophic’ consequence? E.g. 10, 50, 100 persons?

Without the ability to sum the consequences and determine a total risk for an event, low-impact but high-volume consequences could leave your organization exposed.

Cost benefit

Assessing risk based purely on outcome risk levels is only one-half of the equation for making effective decisions. The usual risk matrix methodology prioritizes the highest risk levels first, with little regard for the cost to achieve the mitigation. Because organizations have limited resources, determining the best way to utilize these resources is key to remaining competitive in the marketplace. The missing component is cost (monetary or otherwise), and without a cost component, we are unable to answer the following question.

If I can mitigate one of two ‘moderate’ safety risks with the same likelihood, which one should I mitigate?

Once you identify that mitigation of the first risk costs 50% less in dollars, time and resources, the decision becomes clearer. The answer to this critical question is missing from most risk matrixes and risk frameworks.


Alignment

To make effective investment decisions around risk mitigation and exposure, an organization must be able to compare and trade-off the value from different risk types (e.g. stakeholder risk vs. environmental risk). In a budgetary or resource constrained environment, this is especially important. An organization must understand which consequences are more important relative to others. A risk matrix partially does this by grouping the consequences into ‘negligible’ or ‘moderate’ groupings, however, this does not answer the question of:

If I can spend $1000 and mitigate either a ‘moderate’ stakeholder risk or a ‘moderate’ environmental risk with the same likelihood, which one should I do?

The matrix type framework is not flexible enough for most organizations to achieve exact alignment of risk types.

X by Y grid and descriptions

When thinking about consequences, the risk ‘levels’ must be meaningful to be consistently applied. This is why safety risks are often thought about in terms of ‘first aid’, ‘medical treatment’, ‘disabling’ or ‘fatal’ injuries. These can be measured and conceptually linked to an event as the most likely outcome. Descriptions like ‘negligible’ and ‘moderate’ aren’t meaningful enough.

In the safety risk example above, there are four consequence levels. What if an environmental risk type is introduced into the matrix, and it only has three consequence levels (e.g. ‘<100L spill’, ‘100L-500L spill’ and ‘>500L spill’)? Because the number of meaningful levels can be different between risk types, they cannot fit into an X by Y matrix without distortion.

The solution: ‘The Value Framework’

Identify what is important to your organization (value measures)

The first step in creating a value framework is to identify the things that your organization values or considers to be important. An existing risk management framework or risk matrix is a good place to start. Risk types (e.g. safety, environment, stakeholder, legal and compliance etc.) are common values that can be measured and are found in most value frameworks.

Benefits such as financial returns, increases in employee efficiency and so on are also important and should be included. Another common inclusion in a value framework is strategic targets, KPIs or other measures. Everything identified in this step is known as a ‘value measure’.

Identify the common levels and calculations

Each ‘value measure’ obviously needs to be measured! The next step is to determine the discrete levels for each measure (e.g. for safety, they could be ‘first aid’, ‘medical treatment’, ‘disabling’ or ‘fatal’ injuries). Then add calculations for KPIs or values like ‘employee efficiency’ where an exact value can be obtained. For example:

Employee efficiency = Number of employees affected x hourly rate x hours saved per employee

Alignment

Once the value measures and their calculations have been identified, they need to be aligned to a common scale. This is to allow a non-biased tradeoff between any of the measures in the framework. Typically, this common scale is dollars or a dollar equivalent unit. Every level and calculation of every value measure needs to be quantified. For most risk types, this is calculated as the direct cost or benefit to the organisation.

For example, the cost to the organisation for a safety medical treatment injury (MTI) would be:

$10,000 penalty cost + $1,000 legal cost + $1,500 compensation cost = TOTAL $12,500
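A minimal sketch of how these aligned figures might be represented and used follows. The medical treatment value mirrors the worked example above and the efficiency calculation mirrors the formula given earlier; the other level values and the comparison figures are illustrative assumptions, not a prescription for any particular tool.

```python
# A minimal sketch of a value framework aligned to dollars: each value
# measure's levels (or calculation) is converted to a common dollar
# equivalent so different risk and benefit types can be traded off.
# Only the medical treatment figure comes from the example above;
# everything else is an illustrative assumption.
SAFETY_LEVELS = {
    "first aid": 500,
    "medical treatment": 10_000 + 1_000 + 1_500,   # penalty + legal + compensation = $12,500
    "disabling": 250_000,
    "fatal": 5_000_000,
}

def employee_efficiency_value(num_employees, hourly_rate, hours_saved_per_employee):
    """Dollar value of an efficiency gain, per the formula above."""
    return num_employees * hourly_rate * hours_saved_per_employee

def annualized_risk(likelihood_per_year, consequence_dollars):
    """Likelihood x consequence, expressed in dollars per year."""
    return likelihood_per_year * consequence_dollars

# Compare two candidate investments on the same scale:
option_a = annualized_risk(0.1, SAFETY_LEVELS["medical treatment"])  # safety risk mitigated
option_b = employee_efficiency_value(20, 60, 2)                      # efficiency benefit gained
print(f"Option A value: ${option_a:,.0f}/yr, Option B value: ${option_b:,.0f}/yr")
```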

Application

Now that we have a rational and consistent way to assign a value to every risk, benefit, cost and other measure that an organisation values, the value framework can be used to assess every investment the same way.

Figure 2 What a Value Framework could look like

Summary

The risk matrix is a great tool for rapid risk qualification, but it cannot be used effectively to make risk and value based decisions. More information is required.

Organisations today need to:

  • Differentiate large volumes of risk, and risks with extremely small likelihoods
  • Evaluate and aggregate multiple risks
  • Incorporate costs into risk-based decision-making processes
  • Trade off one risk type against another to achieve a better overall economic outcome
  • Have a framework that accomplishes all the above with consistent application and transparency

Creating a value framework meets these requirements and allows organizations to make effective value based and risk informed decisions.

To learn more about creating a value framework download the executive whitepaper ‘Value Based Decision Making’

 

The previous article in our blog series described the recommended training strategy for your RCA program development. The next step in achieving a successful RCA program is to ensure leadership understands their role and has the tools in place to ensure the longevity of the program and its effectiveness.

To ensure the success of your root cause analysis program, leadership must have a vested interest and take responsibility not only for developing and overseeing the functions of the RCA effort, but also for monitoring the status of the individual analyses and associated solutions. This monitoring is typically done by the Steering Committee in conjunction with its other strategic responsibilities.

The critical elements to track in relation to conducting the root cause analysis include:

  • Incident date
  • RCA assignment date and lead
  • Estimated RCA completion date
  • Days past due
  • Escalation activity
  • Actual completion date

The critical elements to track in relation to the solutions that are to be implemented include the following (a simple tracking sketch follows this list):

  • RCA completion date
  • Solution assignment date and lead
  • Estimated solution completion date
  • Days past due
  • Escalation activity
  • Actual completion date
  • Frequency of incident recurrence
  • Annual savings/HSE incident reduction
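A minimal sketch of how the fields in the two lists above might be captured, with days past due derived rather than entered by hand; the record layout and names are illustrative assumptions, not a description of any particular tracking tool.

```python
# A minimal sketch (not RCPro) of a tracking record for an open RCA or
# solution, with days past due computed from the estimated completion date.
# Field names are illustrative assumptions.
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class TrackingRecord:
    incident_date: date
    assigned_date: date
    lead: str
    estimated_completion: date
    actual_completion: Optional[date] = None
    escalation_activity: str = ""

    def days_past_due(self, today: Optional[date] = None) -> int:
        """Days overdue; zero if complete or not yet due."""
        if self.actual_completion is not None:
            return 0
        today = today or date.today()
        return max(0, (today - self.estimated_completion).days)

rca = TrackingRecord(date(2018, 3, 1), date(2018, 3, 5), "J. Smith", date(2018, 4, 5))
print(rca.days_past_due(date(2018, 4, 20)))   # 15
```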

Once a root cause analysis has been completed, a list of potential solutions will be developed by the RCA team and submitted to the Steering Committee, via the Program Champion or his/her designee, for approval. The Steering Committee then assigns these solutions to individuals for completion and puts them into an action plan format with assigned due dates. These actions should be completed in the shortest time possible, otherwise the process will quickly fade away. The Steering Committee must track the status of open RCAs, the progress of implementing the solutions to ensure timely completion, and the effectiveness of previously implemented solutions (as measured by recurrences of the original incidents). New analyses should not be started if a large number of solutions remain to be implemented.

An appropriate person needs to be assigned the responsibility of tracking progress and recurrence.  The right person for this responsibility may be different for different organizations. Progress is tracked by showing the number or percentage of completed solutions. Recurrence will be tracked by measuring repetition of the incident.

Some organizations will already have software and methods for tracking tasks, such as a CMMS. If this is the case, it can be considered for RCA and solution tracking as well. However, if a system does not currently exist or does not fulfill all of the organization’s RCA tracking needs, then we would recommend considering RCPro™ enterprise RCA software. It allows an action list, due dates, and comments for each analysis to be generated and shared with team members. It also provides detailed reports on current investigation status, action tracking, and outstanding items, and gives a view of systemic issues across the organization.

This is where the Steering Committee’s review and support really come into play. The leadership team should review RCA status, solution implementation, and final results as a regular part of Steering Committee business. The Steering Committee’s main role is to ensure that RCAs are completed in a timely fashion and that resulting solutions are implemented and tracked for effectiveness.

So far, this blog series has covered:

The Key Steps of Designing Your Program

Defining Goals and Current Status

Setting KPIs and Establishing Trigger Thresholds

RCA and Solution Tracking and Roles and Responsibilities

Recommended RCA Team Structure

Responsibilities of the Six Roles

RCA Program Development Training Strategy

And, Oversight and Management.

Stay tuned for our next installment on RCA Process Mapping.

Author: Jason Ballentine, VP of Engineering for ARMS Reliability

High Season Means Higher Stakes

This summer, the heat is shattering records around the United States—in Arizona, 119°F (48°C) days mean dozens of plane flights have been grounded and air conditioners are demanding an unprecedented number of megawatts from utilities. With average temperatures rising every summer and energy demand following suit, utilities have recognized the need to be more proactive about reducing their risk of outages.

Recently, the Chief Operating Officer of one energy generation company sought our help to fend off any issues that could result in a summer outage. Not only would an outage mean unhappy customers, but it would also mean financial losses if the utility couldn’t run at maximum capacity during its most lucrative season.

Throughout the winter, this utility saw a few small issues here and there. While nothing too dramatic happened, the COO recognized that he wouldn’t be able to afford something bigger going wrong during the busy season. He approached us to conduct a Vulnerability Assessment and Analysis (VAA) that would help identify his company’s most critical issues and reduce the likelihood of a service interruption.

*A VAA can be conducted on any type of operation in any industry.

Shedding Light on Potential Vulnerabilities

The analysis began with one power plant. This utility was like many other operations—they had several vulnerabilities on their radar in some form but no central repository for tracking them all. There might be a machine operator who knew about one issue, there might be an email chain about another issue, a few deferred work orders hanging around, but no way of making all issues known to all parties.

We began collecting information about the plant’s vulnerabilities—conducting individual interviews and brainstorming sessions with small groups of engineering and operating staff. We also reviewed event logs and work order histories to determine whether past events were likely to reoccur. We wanted to know: what issues had they been living with for a while? Where were they deferring maintenance? What spare parts were they missing? What workarounds were in place? Over the course of about a week, we reviewed all the vulnerabilities that could slow down or stop production on 40,000 pieces of equipment.

Concentrating on the Critical

Out of this process, about 200 vulnerabilities were identified. Next, we scored each vulnerability in terms of likelihood and consequence and then ranked them “low,” “medium,” or “high” according to the corporate risk matrix. While there were about 25 vulnerabilities that we identified as being in the “high” category, we determined that 16 of them comprised approximately 80 percent of the risk to production.

If the utility focused on resolving these 16 issues first, they would see the greatest results in the shortest amount of time. We were also able to show the utility which type of vulnerability was most prevalent (wear and tear) and which systems were most in need of attention.
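A minimal sketch of that prioritization step, using made-up vulnerabilities rather than the utility’s data: score each item as likelihood times consequence, sort, and take the smallest set that covers roughly 80 percent of the total risk.

```python
# A minimal sketch of the prioritization step: score each vulnerability as
# likelihood x consequence, sort, and find the smallest set covering ~80%
# of total risk. The entries are made-up placeholders, not the utility's data.
vulnerabilities = [
    ("Boiler feed pump wear", 0.4, 500_000),
    ("Deferred valve overhaul", 0.2, 300_000),
    ("Missing critical spare", 0.1, 900_000),
    ("Cooling tower fan vibration", 0.5, 40_000),
    ("Control system workaround", 0.05, 1_200_000),
]

scored = sorted(
    ((name, likelihood * consequence) for name, likelihood, consequence in vulnerabilities),
    key=lambda item: item[1],
    reverse=True,
)
total_risk = sum(score for _, score in scored)

running, top_set = 0.0, []
for name, score in scored:
    top_set.append(name)
    running += score
    if running >= 0.8 * total_risk:
        break

print(f"{len(top_set)} of {len(scored)} vulnerabilities cover "
      f"{running / total_risk:.0%} of the total risk: {top_set}")
```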

The final step was to assign a high-level action to each of the most critical vulnerabilities (examples might be “order spare parts” or “seek approval for design change from fire marshal”). Now the utility had a clear plan for which vulnerabilities to address first, where to begin resolving each vulnerability, who was responsible for each action item, and a recommended time frame for taking action.

Conclusion

Like most organizations, this utility wasn’t surprised by the vulnerabilities we identified. Chances are, these issues had been looming in the background, making everyone somewhat uneasy due to the lack of clear prioritization or a path to resolution.

Over the course of just three weeks, our Vulnerability Assessment and Analysis captured all the potential vulnerabilities, prioritized them according to criticality, and provided a clear path of action. By following this plan, the utility could dramatically reduce the chances of a catastrophic slow down or stoppage, eliminating much of the stress that usually accompanies the high season.

The utility’s COO was so pleased with the results at the first plant that he immediately scheduled a Vulnerability Assessment and Analysis for the next power plant, with plans to eventually cover them all.

It’s important to conduct a Vulnerability Assessment and Analysis before a period of high production, but it’s also a useful process in advance of a scheduled work stoppage. This way any fixes that are identified can be completed without incurring additional downtime.

Find out more about our Vulnerability Assessment and Analysis process.

Author: Jason Ballentine, VP of Engineering for ARMS Reliability

Starting From Scratch With Spares

As anyone with a hand in running a household knows, it’s important to keep a stockpile of key items. You certainly don’t want to find out the hard way that you’re on your last square of toilet paper. But in the case of a facility like a power plant, a missing spare part could be more than just a nuisance—it could be downright expensive.

Determining the appropriate spare parts to have on hand in a large facility, however, can be tricky. This is especially true after building a facility from the ground up, when you don’t have a frame of reference for which spare parts you’re most likely to need first.

Most organizations deal with this in one of two ways: 1) they guess or 2) they purchase according to a spares list provided by an equipment vendor.

A Reliability-Focused Purchase List

There are obvious limitations when it comes to guesswork—making the wrong guess can result in huge expenses either in unnecessary spare parts or in costly downtime. A vendor-suggested list is probably somewhat more accurate, but such suggestions are unlikely to take into account the specific needs of your organization. We approach spare part holding recommendations through the lens of reliability as it applies to each specific operation. As factors change, it’s important to re-evaluate, making sure to take into account everything that could influence purchase priorities.

Recently, a utility company approached us to review the list of spare parts their equipment vendor had recommended. According to the vendor, this utility needed to purchase $4.9 million worth of spare parts up front. The utility wanted a second opinion before making such a sizable investment.

Our Approach

We started the spare parts analysis by looking at the list provided by the equipment vendor, but then we dug much deeper. We explored a series of questions, including: How often is this part likely to fail? What is the cost of the downtime if the part is attached to a critical piece of equipment? What is the unit cost of the spare part? What is the lead time to obtain a spare? Is this part likely to fail at any time throughout its lifecycle, or is it only likely to fail at the end of its life? There is no point in purchasing a spare today if you are unlikely to need it for another 20 years.
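A minimal sketch of the underlying comparison for a single part, with every input an illustrative assumption: the annual cost of holding the spare is weighed against the expected downtime cost of waiting out the procurement lead time when a failure occurs.

```python
# A minimal sketch of the stock-or-not comparison behind a spares review.
# It weighs the annual cost of holding a spare against the expected downtime
# cost of waiting out the lead time when a failure occurs. All inputs are
# illustrative assumptions, not figures from this engagement.
def annual_cost_with_spare(unit_cost, holding_rate):
    """Capital tied up plus storage and deterioration, per year."""
    return unit_cost * holding_rate

def annual_cost_without_spare(failures_per_year, lead_time_hours, downtime_cost_per_hour):
    """Expected extra downtime cost per year while a replacement is sourced."""
    return failures_per_year * lead_time_hours * downtime_cost_per_hour

unit_cost = 80_000               # spare purchase price ($)
holding_rate = 0.15              # carrying cost as a fraction of price per year
failures_per_year = 0.05         # one failure every 20 years on average
lead_time_hours = 720            # 30-day procurement lead time
downtime_cost_per_hour = 2_300   # assumed lost-production cost while waiting ($/h)

with_spare = annual_cost_with_spare(unit_cost, holding_rate)
without_spare = annual_cost_without_spare(failures_per_year, lead_time_hours, downtime_cost_per_hour)
print(f"Hold the spare: ${with_spare:,.0f}/yr vs. go without: ${without_spare:,.0f}/yr")
```

A full analysis also accounts for criticality, end-of-life versus random failure behaviour, and shared spares across equipment classes, but the core question for each part is this same cost trade-off.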

In all, about 1,500 pieces of equipment were reviewed over 40 days before providing a recommended list of spares. The final list included some of what the vendor had recommended, left off many of the vendor’s recommended parts, and suggested a few additional parts that weren’t in the original list.

The final critical spares list that was recommended included a total of $2.2 million in spare parts—a savings of $2.7 million over what the vendor had originally recommended.

Built to Adapt

Our recommended spares list is intended to be responsive to changing needs and new information. When the utility took a second look at its downtime cost and calculated that it was actually $10/megawatt and not the $23/megawatt they had initially determined, we re-evaluated the spares list, reducing the utility’s recommended purchases by another $200,000.

Conclusion

If your organization is like most, you probably run into trouble when it comes to having the right spares on hand. Either you’re missing the right parts when something breaks down, or you have expensive spares gathering dust and potentially going bad in storage. ARMS Reliability takes the guesswork out of developing a critical spares list, taking into account item costs, the likelihood of failure, lead times, downtime costs, and all other relevant factors.

The investment this utility made to conduct the analysis with our help ultimately reduced their bottom-line equipment costs by $2.7 million—a return of roughly 50 to 1 on the cost of the analysis. Beyond the monetary benefit, the utility’s Reliability Engineer felt much more confident in the approach taken. He was also relieved to avoid grossly overspending on spares.

Find out more about ARMS Reliability’s Spare Part Holding Analysis