Monthly Archives: August 2017

The previous article in our blog series described the recommended training strategy for your RCA program development. The next step in achieving a successful RCA program is to make sure leadership understands its role and has the tools in place to ensure the longevity and effectiveness of the program.

To ensure the success of your root cause analysis program, leadership must have a vested interest and take responsibility not only for developing and overseeing the functions of the RCA effort, but also for monitoring the status of the individual analyses and associated solutions. This monitoring is typically done by the Steering Committee in conjunction with its other strategic responsibilities.

The critical elements to track in relation to conducting the root cause analysis include:

  • Incident date
  • RCA assignment date and lead
  • Estimated RCA completion date
  • Days past due
  • Escalation activity
  • Actual completion date

The critical elements to track in relation to the solutions that are to be implemented include:

  • RCA completion date
  • Solution assignment date and lead
  • Estimated solution completion date
  • Days past due
  • Escalation activity
  • Actual completion date
  • Frequency of incident recurrence
  • Annual savings/HSE incident reduction
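As a rough illustration of how a few of these elements fit together, the sketch below models one tracked item (an RCA or a solution) and derives "days past due" from the estimated and actual completion dates. The class name, field names, and the 14-day escalation threshold are assumptions made for this example; they are not part of any particular tracking tool.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class TrackedItem:
    """One RCA or one solution being tracked (hypothetical structure)."""
    name: str
    lead: str
    assigned: date
    estimated_completion: date
    actual_completion: Optional[date] = None

    def days_past_due(self, today: date) -> int:
        """Days beyond the estimated completion date; 0 if on time."""
        end = self.actual_completion or today
        return max((end - self.estimated_completion).days, 0)

    def needs_escalation(self, today: date, threshold_days: int = 14) -> bool:
        """Flag open items that are overdue by more than the threshold.

        The 14-day default is an assumed value; each Steering Committee
        would set its own escalation rule.
        """
        return (self.actual_completion is None
                and self.days_past_due(today) > threshold_days)
```

An open item that was due on July 31 and is reviewed on August 20 would show 20 days past due and be flagged for escalation under the assumed threshold. Once an actual completion date is recorded, the overdue count freezes and the flag clears.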

Once a root cause analysis has been completed, a list of potential solutions will be developed by the RCA team and submitted to the Steering Committee via the Program Champion or his/her designee for approval. The Steering Committee then assigns these solutions to individuals for completion and puts them into an action plan format with assigned due dates. These actions should be completed in the shortest time possible; otherwise, the process will quickly fade away. The Steering Committee must track the status of open RCAs, the progress of implementing the solutions to ensure timely completion, and the effectiveness of previously implemented solutions (as measured by recurrences of the original incidents). New analyses should not be started if a large number of solutions remain to be implemented.

An appropriate person needs to be assigned the responsibility of tracking progress and recurrence. The right person for this responsibility may differ from one organization to another. Progress is tracked by showing the number or percentage of completed solutions. Recurrence is tracked by measuring repetition of the incident.
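The two measures described above can be sketched in a few lines. This is a minimal example under assumed inputs: a list of true/false flags for whether each solution is complete, and a list of dates on which the same incident occurred. The function names are hypothetical.

```python
from datetime import date


def completion_summary(solution_done_flags):
    """Return (number completed, percent completed) for a list of booleans."""
    total = len(solution_done_flags)
    completed = sum(1 for done in solution_done_flags if done)
    percent = 100.0 * completed / total if total else 0.0
    return completed, percent


def recurrences_since(incident_dates, solution_completed_on):
    """Count repeat incidents dated after the solution was implemented."""
    return sum(1 for d in incident_dates if d > solution_completed_on)
```

With three of four solutions complete, the summary reports 3 and 75 percent. Counting only incidents that occur after the solution's completion date gives a simple view of whether the fix is holding.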

Some organizations will already have software and methods for tracking tasks, such as a CMMS. If this is the case, it can be considered for RCA and solution tracking as well. However, if a system does not currently exist or does not fulfill all the organization’s RCA tracking needs, then we would recommend considering RCPro™ enterprise RCA software. It allows an action list, due dates, and comments for each analysis to be generated and shared with team members. It also provides detailed reports on current investigation status, action tracking, and outstanding items, along with a view of systemic issues across the organization.

This is where the Steering Committee’s review and support really come into play. The leadership team should review RCA status, solution implementation, and final results as a regular part of Steering Committee business. The Steering Committee’s main role is to ensure that RCAs are completed in a timely fashion and that resulting solutions are implemented and tracked for effectiveness.

So far, this blog series has covered:

  • The Key Steps of Designing Your Program
  • Defining Goals and Current Status
  • Setting KPIs and Establishing Trigger Thresholds
  • RCA and Solution Tracking and Roles and Responsibilities
  • Recommended RCA Team Structure
  • Responsibilities of the Six Roles
  • RCA Program Development Training Strategy
  • Oversight and Management

Stay tuned for our next installment on RCA Process Mapping.

Author: Jason Ballentine, VP of Engineering for ARMS Reliability

High Season Means Higher Stakes

This summer, the heat is shattering records around the United States. In Arizona, 119°F (48°C) days have grounded dozens of flights, and air conditioners are demanding an unprecedented number of megawatts from utilities. With average temperatures rising every summer and energy demand following suit, utilities have recognized the need to be more proactive about reducing their risk of outages.

Recently, the Chief Operating Officer of one energy generation company sought our help to fend off any issues that could result in a summer outage. Not only would an outage mean unhappy customers, but it would also mean financial losses if the utility couldn’t run at maximum capacity during its most lucrative season.

Throughout the winter, this utility saw a few small issues here and there. While nothing too dramatic happened, the COO recognized that he wouldn’t be able to afford something bigger going wrong during the busy season. He approached us to conduct a Vulnerability Assessment and Analysis (VAA) that would help identify his company’s most critical issues and reduce the likelihood of a service interruption.

*A VAA can be conducted on any type of operation in any industry.

Shedding Light on Potential Vulnerabilities

The analysis began with one power plant. This utility was like many other operations—they had several vulnerabilities on their radar in some form but no central repository for tracking them all. There might be a machine operator who knew about one issue, there might be an email chain about another issue, a few deferred work orders hanging around, but no way of making all issues known to all parties.

We began collecting information about the plant’s vulnerabilities—conducting individual interviews and brainstorming sessions with small groups of engineering and operating staff. We also reviewed event logs and work order histories to determine whether past events were likely to reoccur. We wanted to know: what issues had they been living with for a while? Where were they deferring maintenance? What spare parts were they missing? What workarounds were in place? Over the course of about a week, we reviewed all the vulnerabilities that could slow down or stop production on 40,000 pieces of equipment.

Concentrating on the Critical

Out of this process, about 200 vulnerabilities were identified. Next, we scored each vulnerability in terms of likelihood and consequence and then ranked them “low,” “medium,” or “high” according to the corporate risk matrix. While there were about 25 vulnerabilities that we identified as being in the “high” category, we determined that 16 of them accounted for approximately 80 percent of the risk to production.

If the utility focused on resolving these 16 issues first, they would see the greatest results in the shortest amount of time. We were also able to show the utility which type of vulnerability was most prevalent (wear and tear) and which systems were most in need of attention.
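The scoring and prioritization steps can be sketched as follows. This is only an illustration: it assumes a simple likelihood times consequence score on 1-to-5 scales and made-up band thresholds, whereas the actual assessment used the utility’s corporate risk matrix. All function names and numbers here are hypothetical.

```python
def risk_score(likelihood, consequence):
    """Assumed scoring rule: product of two 1-to-5 ratings."""
    return likelihood * consequence


def risk_band(score, medium_at=6, high_at=15):
    """Map a score to a band; the thresholds are illustrative assumptions."""
    if score >= high_at:
        return "high"
    if score >= medium_at:
        return "medium"
    return "low"


def top_risk_contributors(vulnerabilities, share=0.80):
    """Return the fewest top-scoring items that cover `share` of total risk.

    `vulnerabilities` is a list of (name, likelihood, consequence) tuples.
    """
    ranked = sorted(vulnerabilities,
                    key=lambda v: risk_score(v[1], v[2]),
                    reverse=True)
    total = sum(risk_score(l, c) for _, l, c in ranked)
    selected, running = [], 0
    for name, likelihood, consequence in ranked:
        if running >= share * total:
            break
        selected.append(name)
        running += risk_score(likelihood, consequence)
    return selected
```

Sorting by score and accumulating until the chosen share of total risk is covered is one straightforward way to arrive at a short list like the 16 items described above.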

The final step was to assign a high-level action to each of the most critical vulnerabilities (examples might be “order spare parts” or “seek approval for design change from fire marshal”). Now the utility had a clear plan for which vulnerabilities to address first, where to begin resolving each vulnerability, who was responsible for each action item, and a recommended time frame for taking action.

Conclusion

Like most organizations, this utility wasn’t surprised by the vulnerabilities we identified. Chances are, these issues had been looming in the background making everyone somewhat uneasy due to the lack of clear prioritization or path to resolution.

Over the course of just three weeks, our Vulnerability Assessment and Analysis captured all the potential vulnerabilities, prioritized them according to criticality, and provided a clear path of action. By following this plan, the utility could dramatically reduce the chances of a catastrophic slowdown or stoppage, eliminating much of the stress that usually accompanies the high season.

The utility’s COO was so pleased with the results at the first plant that he immediately scheduled a Vulnerability Assessment and Analysis for the next power plant, with plans to eventually cover them all.

It’s important to conduct a Vulnerability Assessment and Analysis before a period of high production, but it’s also a useful process in advance of a scheduled work stoppage. This way any fixes that are identified can be completed without incurring additional downtime.

Find out more about our Vulnerability Assessment and Analysis process.

Author: Jason Ballentine, VP of Engineering for ARMS Reliability

Starting From Scratch With Spares

As anyone with a hand in running a household knows, it’s important to keep a stockpile of key items. You certainly don’t want to find out the hard way that you’re on your last square of toilet paper. But in the case of a facility like a power plant, a missing spare part could be more than just a nuisance; it could be downright expensive.

Determining the appropriate spare parts to have on hand in a large facility, however, can be tricky. This is especially true after building a facility from the ground up, when you don’t have a frame of reference for which spare parts you’re most likely to need first.

Most organizations deal with this in one of two ways: 1) they guess or 2) they purchase according to a spares list provided by an equipment vendor.

A Reliability-Focused Purchase List

There are obvious limitations when it comes to guesswork—making the wrong guess can result in huge expenses either in unnecessary spare parts or in costly downtime. A vendor-suggested list is probably somewhat more accurate, but such suggestions are unlikely to take into account the specific needs of your organization. We approach spare part holding recommendations through the lens of reliability as it applies to each specific operation. As factors change, it’s important to re-evaluate, making sure to take into account everything that could influence purchase priorities.

Recently, a utility company approached us to review the list of spare parts their equipment vendor had recommended. According to the vendor, this utility needed to purchase $4.9 million worth of spare parts up front. The utility wanted a second opinion before making such a sizable investment.

Our Approach

We started the spare parts analysis by looking at the list provided by the equipment vendor, but then we dug much deeper. We explored a series of questions, including: How often is this part likely to fail? What is the cost of the downtime if the part is attached to a critical piece of equipment? What is the unit cost of the spare part? What is the lead time to obtain a spare? Is this part likely to fail at any time throughout its lifecycle, or is it only likely to fail at the end of its life? There is no point in purchasing a spare today if you are unlikely to need it for another 20 years.
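To show how these questions interact, here is a deliberately simplified sketch, not the method used in the analysis. It compares the expected yearly downtime cost of waiting for a part against the yearly cost of keeping one on the shelf. The function names, the 20 percent carrying rate, and the assumption that downtime lasts for the full lead time are all hypothetical.

```python
def expected_annual_stockout_cost(failures_per_year, lead_time_days,
                                  downtime_cost_per_day):
    """Expected yearly downtime cost if no spare is held.

    Assumes the equipment is down for the whole lead time on each failure.
    """
    return failures_per_year * lead_time_days * downtime_cost_per_day


def annual_holding_cost(unit_cost, carrying_rate=0.20):
    """Yearly cost of holding one spare; the carrying rate is an assumption."""
    return unit_cost * carrying_rate


def should_hold_spare(failures_per_year, lead_time_days,
                      downtime_cost_per_day, unit_cost, carrying_rate=0.20):
    """Hold the spare when expected downtime cost exceeds holding cost."""
    stockout = expected_annual_stockout_cost(
        failures_per_year, lead_time_days, downtime_cost_per_day)
    return stockout > annual_holding_cost(unit_cost, carrying_rate)
```

Under these assumptions, a $50,000 part that fails about once every two years, takes 30 days to obtain, and sits on equipment costing $10,000 per day of downtime is worth holding. The same part failing roughly once in 20 years with a two-day lead time is not. Lowering the downtime cost input shifts borderline parts off the list.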

In all, we reviewed about 1,500 pieces of equipment over 40 days before providing a recommended list of spares. The final list included some of what the vendor had recommended, left off many of the vendor’s recommended parts, and suggested a few additional parts that weren’t in the original list.

The final critical spares list that was recommended included a total of $2.2 million in spare parts—a savings of $2.7 million over what the vendor had originally recommended.

Built to Adapt

Our recommended spares list is intended to be responsive to changing needs and new information. When the utility took a second look at its downtime cost and calculated that it was actually $10/megawatt and not the $23/megawatt it had initially determined, we re-evaluated the spares list, reducing the utility’s recommended purchases by another $200,000.

Conclusion

If your organization is like most, you probably run into trouble when it comes to having the right spares on hand. Either you’re missing the right parts when something breaks down, or you have expensive spares gathering dust and potentially going bad in storage. ARMS Reliability takes the guesswork out of developing a critical spares list, taking into account item costs, the likelihood of failure, lead times, downtime costs, and all other relevant factors.

The investment this utility made to conduct the analysis with our help ultimately reduced its spare parts costs by $2.7 million, a return of 50 to 1 on that investment. Beyond the monetary benefit, the utility’s Reliability Engineer felt much more confident in the approach taken. He was also relieved to avoid grossly overspending on spares.

Find out more about ARMS Reliability’s Spare Part Holding Analysis