ARMS Reliability are getting ready for the annual AMPEAK conference this June. ARMS will be presenting and exhibiting at this event.

Michael Moulton, Software Sales & Training

The AMPEAK conference is organised by the Asset Management Council Australia. Over 4 days, 300+ asset management and maintenance professionals from Australia and around the globe will receive access to the most up-to-date asset management information across a broad range of industries via presentations, tutorials and workshops.

WA Lead Engineer, Weylon Malek, will be presenting ‘Production Reliability Analysis to Improve Asset Management’, and Apollo Trainer and Facilitator, Jack Jager, will be presenting ‘6 Critical Steps for Facilitating a Successful Root Cause Analysis’.

You can see Weylon’s presentation on Tuesday, 3rd June at 1.30pm, and Jack’s presentation on Thursday 5th June at 11.30am.

Don’t forget to stop by the Exhibit area where you can see Michael Moulton at the ARMS booth.

Conference Details:
Perth, WA | 2-5 June | Crown Convention Centre

AMPEAK Conference Website

By Kevin Stewart

A facilitator conducting a Root Cause Analysis using the Apollo method performs a crucial role throughout an investigation. Here are some tips and steps to keep in mind when facilitating:

Over many years, I have repeatedly heard that the ‘Apollo Root Cause Analysis methodology is only used for big, serious investigations.’ This statement always makes me smile – because it is completely untrue.

An RCA using the Apollo Root Cause Analysis methodology can be performed on any problem, large or small, as long as the right facilitator is on board. This article, part 1 of 2, explores the strategies and processes a facilitator should keep in mind when an investigation proceeds.

ANYONE CAN FACILITATE

In my Apollo Root Cause Analysis methodology training classes, I always ask whether anyone is a certified facilitator. I’ve only received one ‘yes’ from the 2,000 or so students that have attended my courses. This sole person will have been trained in how to manage a group of different personalities; how to progress a group towards its goal; how to be firm and fair; and so on.

Yes, these are valuable skills to learn. And, in an ideal world, every facilitator would have the time and resources to complete the training. But you can facilitate a Root Cause Analysis using the Apollo Root Cause Analysis methodology without this certification.

Facilitating RCAs requires flexibility – yet it also requires that you follow a standard outline. While every RCA has its own path, it will generally adhere to these main steps:

  1. Gather information
  2. Define the problem
  3. Create a RealityChart™
    a. Phase one: Create the draft RealityChart™
    b. Phase two: Finish and formalise the RealityChart™
  4. Identify solutions
  5. Finalise the report

The process – as laid out above in its basic format – may look a little daunting to someone who has never facilitated an RCA before, particularly if you are contending with other feelings, like being anxious in front of a crowd or feeling responsible for the outcome. You will need to deal with these latter issues in your own way.

What you can take charge of is finding a way to shape a group of disparate people into a highly functioning team, who share the common goal of reaching a solution. By following the steps below, you can prepare for a smooth facilitation process.

PREPARING FOR A FACILITATION

Step 1. Familiarise yourself with the Apollo Root Cause Analysis methodology.

First, ensure you are familiar with the Apollo Root Cause Analysis methodology – after all, it’s what you’re trying to facilitate. If you need a review, the RealityCharting™ learning centre is a great place to recap the basics. Here, you can complete a simulated scenario to fine-tune your understanding of the process. It is also a good idea to review the facilitation guidelines in the manual that you received with your original training. They give an excellent overview of the entire process.

Step 2. Gather your supplies.

Stock up on post-it notes – and get the good, super-sticky ones that will stay on the wall.

We suggest that you use post-it notes instead of a computer to perform the analysis, as these help to enhance the common reality.  With post-it notes, all participants can see what’s happening.

If you think the analysis will take a few days, get multiple colours of post-it notes so you can easily distinguish between the changes to the chart created on different days or at different times.

Ensure the room you’re working in has plenty of wall space. And, if the walls are unsuitable for post-it notes, tape poster paper to the wall first and then adhere your post-its. Using paper can provide the extra advantage of making the chart easy to remove and take with you.  If it’s sensitive subject matter, you can roll it up and take it with you at the end of the day.

Step 3. Prepare the participants.

Ensure that all participants know what to expect before beginning an RCA. An RCA can require a significant time commitment, so make it clear from the outset how much time is needed from them.

Step 4. Gather information.

The more information you have at the outset, the smoother the journey.

You may already have information at hand in the form of pictures, emails, reports, write-ups, witness statements, and so on. There may be some useful physical evidence. Request evidence from the right people, collect it and store it in one file.

You may also choose to take the entire team to see the area under investigation, so that everyone has a clear picture in mind about what you’re discussing.

Be aware that, no matter how hard you try, there will always be some missing information.  This is not a problem. You can call someone, look it up at the time, or make an action item for someone to gather the evidence later.

By Kevin Stewart

One definition of Root Cause Analysis  is:
Root Cause Analysis is any structured process used to understand the causes of past events for the purpose of preventing recurrence.

This basic premise is the reason that the RCA is done.

On the surface, it always appears to be a simple matter; however, there are always pitfalls and nuances.

One such pitfall that RCA investigators or facilitators face is something I call the “problem is fixed” syndrome. In my work at plants I would run across situations where a problem occurred and a solution was implemented. The particular solution used may or may not have been arrived at by using RCA. In either case the solution is implemented and the “problem is fixed.”

How is this statement validated as being true? Those involved will justify the solution by the simple fact that the problem hasn’t recurred, at least not in the immediate future, which unfortunately is sometimes the focus of plant management due to pressures, career goals or other reasons. On the surface this may seem to be difficult to argue – after all the problem is fixed – or is it?

In the cases I have been involved with, what has really happened is that the MTBF (Mean Time Between Failures) of the problem is actually long, say 5 years or greater. I was involved in two investigations where the incident hadn’t happened in the previous 5 years and most likely wouldn’t happen for another 5 years. Investigations had been performed and solutions were offered and implemented.

When asked about the effectiveness of the solutions, the evidence given was that the incident hadn’t recurred, so the solution must have been effective. On the surface this may appear difficult to argue against, since it is true that the problem hasn’t recurred. However, by looking at the MTBF of the incident, you can point out that because the MTBF is long, the effectiveness of the solution cannot be judged until well into the future, when the problem would otherwise have been due to recur. So at this particular time, doing nothing, or implementing any other proffered solution, would appear just as effective, since the problem wouldn’t have recurred yet anyway. You can easily see that, if a facility is not careful, it could be “fixing problems” with long MTBFs, claiming success, and in reality not providing effective solutions. This argument supports a thorough and complete RCA that is based on the cause and effect principle and supported by evidence, to ensure an effective solution is implemented.
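To put a rough number on this argument, here is a minimal Python sketch. It assumes a constant failure rate (an exponential model), which is a simplifying assumption of this example and not something established in the investigations described. The function name `prob_no_recurrence` is made up for illustration.

```python
import math


def prob_no_recurrence(years_observed: float, mtbf_years: float) -> float:
    """Chance of seeing zero failures during the observation window,
    assuming a constant failure rate (exponential model) and no fix at all."""
    if years_observed < 0 or mtbf_years <= 0:
        raise ValueError("years_observed must be >= 0 and mtbf_years must be > 0")
    return math.exp(-years_observed / mtbf_years)


# How likely is a quiet period if nothing had been changed and the MTBF is 5 years?
for years in (1, 2, 5):
    p = prob_no_recurrence(years, mtbf_years=5.0)
    print(f"{years} year(s) without recurrence, with no fix in place: {p:.0%}")
```

Under these assumptions, a problem with a 5-year MTBF has roughly an 82% chance of staying quiet for a year even if nothing was done, so a year without recurrence says very little about whether the solution worked.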

In one of the cases above, the solution was to do more frequent maintenance to ensure the problem was identified. While this would have worked for anything with an MTBF longer than the maintenance interval chosen, it would not have worked for something with an MTBF shorter than that interval. As well as not working in all cases, the solution would have increased the cost of maintenance significantly. In this particular case, a little more investigation and some additional causes on the chart identified that some external damage had been done and not reported, which caused the issue. If they could fix the unreported damage issue, they would have an effective solution that covered the situation that brought this incident on. It would also most likely prevent other incidents that hadn’t even happened yet.

In this case you can see that the offered solution would have appeared to work just fine, and since they did “something”, everyone feels good about the work and the “effective” solution.

The other incident was caused by someone who had recently returned to work after an extended leave. During an operating situation this employee correctly followed the incorrect procedure that was posted at the unit. The solution was to replace the incorrect procedure posted at the operating unit. While replacing the procedure was necessary, they would not know whether it was effective for quite a while. Again, a little more investigation and a few more causes identified that there was no process to replace modified procedures around the plant. If this was fixed, then an effective solution would be identified. You can see that here also the plant management would be thrilled because an investigation was done, something was put in place and the problem hasn’t happened again. I’m sure you can see that this situation could very well happen again, either at this unit or at other similar pieces of equipment.

Both of these examples also point out that a good RCA must be done using valid principles and evidence for the causes and you must not stop too soon! Stopping too soon is another common mistake in RCA – but that is another tip.

In the meantime, be aware of incidents with a long MTBF and of offered solutions that are not based on good analysis or that rest on inappropriate causes.

 

RCA DISCUSSION

What are your thoughts on conducting an RCA facilitation / Investigation and how much time have you spent preparing the analysis and implementing solutions?  Do you have a successful tip worth sharing or discussing? We look forward to reading your feedback and perspective via comments below or let’s connect on our LinkedIn Group – ARMS Reliability – Reliability & RCA for further discussion.

 

ARMS Reliability’s CEO, Mick Drew, recently attended the COO Leaders Resources Summit held on the Gold Coast. The summit brought together a number of the resource sector’s most influential executives and operators for two days of corporate and management level discussions concerning some of the most important issues currently facing the Australian resources sector.

Mick Drew said, “It was a great opportunity to engage in frank discussions with some of the industry’s leaders. They told us more about their unique challenges, and what issues are impacting their companies.” Mick continued, “They wanted to understand more about how we can use our expertise and experience to make a positive change to their maintenance practices, asset management and bottom line.”

From the various panel discussions, workshops, and one-on-one meetings, it was clear there is high demand for experts in Reliability & Maintenance who are able to offer guidance and clarity around the key challenges and issues facing the mining industry.

Become part of a vibrant community that will share knowledge, experience and innovation.

Mainstream 2014 kicks off in Perth on Monday, 12th May. It’s an exciting event where Asset Management leaders and teams come together to share knowledge, experience and innovation.

Mainstream is different to other asset management conferences; it’s an interactive experience rather than a sit-and-listen event. With 40+ sessions, workshops, roundtables, panel discussions and live interviews, you’re sure to find a session that will be of interest to you.

The manufacturing industry is under immense pressure. Globalisation and increased competition, coupled with a more demanding consumer base, force manufacturers to seek new ways to boost the bottom line.

To improve ROI and respond to customer demands for faster supply at a lower cost, many companies are required to run their manufacturing plants 24 hours a day, 7 days a week. They are squeezing every last drop of availability and capacity from their assets.

By Jack Jager

How often have you looked at corrective actions and thought that they would have little, if any impact in preventing the problem from reoccurring? It wasn’t just once…. and it continues to happen.

The Question is Why?

Yet the answer is not a simple or straightforward one. Do we believe that the person(s) creating these corrective actions aren’t trying to do their best? No, I don’t think so. I firmly believe that almost all people are trying to do their best. So where does that leave us?

I think that we are caught up in a system where the reactive, quick fixes are the goal, the way of dealing with incidents on a day to day basis. If you have a downtime incident and you bring the power back on quickly after an outage, or the machine is back in operation after a short space of time, then the reaction from the management group and from all of your peers is typically “Well done! Great job!” A pat on the back for those who have performed the job well. In other words, we give respect and accolades to those who can fix it quickly. Conversely, there is often little reward or acknowledgement for hours of diligent work in the pursuit of actions that will resolve the issue once and for all. We reinforce the quick fixes.

Now don’t get me wrong here, because the ability to do the quick fix is and always will be a valuable skill, but the real challenge is to understand whether we have prevented the problem from reoccurring.

What happens after the initial fix is put into place? Where do you go from there? In the completely reactive model, the fire-fighting model, where breakdown maintenance often takes precedence over planned maintenance (which then sets you up for the next round of failures), there is always a fire that needs tending, so we typically jump to that fire, to the next problem on the list. “I have dealt with that one, what’s next?”

The Blame Game

From my conversations with people who attend the courses I present on the Apollo Root Cause Analysis methodology, something else becomes blatantly clear. We still seem, on many different levels, to be playing the “blame game”. The question of “who” still seems to be of paramount importance to some, perhaps many, people. The question I would put to these people is “Will knowing who did it stop it from happening again?” To my way of thinking, by far the most common answer to this question will be “No” (although there are exceptions). So why do we feel that we need to focus on the “who”? If the goal of doing Root Cause Analysis is to prevent recurrence of the problem, the challenge lies not so much in who was involved but in focusing on what you can do to stop it from happening again. This focus will lead to gathering more factual information, which is the essence of understanding the problem first and foremost.

The “who” side of the question is pretty easy to determine, but if that is what we focus on then it is likely to limit thorough questioning and lead quickly and easily down a blame path. Sanctions are given or jobs lost, all based on the knowledge of “who” was at fault. But where does this lead? Wouldn’t it lead to mistakes and faults going unreported, because reporting them brings unwanted consequences? Doesn’t it elevate risk, as there would now be a culture of hiding or covering up mistakes? When you ask questions, what are you likely to get? The truth?

Something else to consider is whether people intend to cause damage, create failures, injure themselves or hurt others? Again the overwhelming answer is still “NO”.  That people are often involved in many incidents, and make mistakes, is seemingly the constant part of the equation. But that is the nature of the beast. People are fallible, they do make mistakes and no matter how hard we try to control this aspect, the “human error” side of causes, it is forever doomed to failure. If we rely on trying to control people then our solutions will have no certainty in their outcome. Going down this path is simply not reliable.

Hierarchy of Control

This is echoed in the concept of the “Hierarchy of control”, where corrective actions are placed within the Hierarchy as being a form of Elimination, Substitution, Engineering, Administrative or PPE control.

The first three of these are perceived to be very strong controls, or hard controls, with almost guaranteed, reliable, consistent results. They are however more time consuming and typically involve spending money to achieve your desired outcome. Administrative controls or the use of PPE as a form of control are perceived to be soft controls. They are relatively quick to implement and don’t cost too much and yet if you were to ask the question “will they prevent recurrence”, almost universally the response will be “NO”!

They may, however, satisfy the need to report. I have “ticked the box” and created a perception of having done something about the incident. To take this a step further, these “soft options” now get signed off by management who are fully cognisant of the “Hierarchy of control”. If we keep taking the soft options, however, is it any wonder that we are still “fire fighting”? If we don’t fundamentally change or control the causes that create the problem, then the problem can still happen again, regardless of the “who”, the person involved. This could be anyone.
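One way to make the hard and soft distinction concrete is to tag each corrective action with its level in the Hierarchy of control and flag any action list that contains nothing but soft controls. The Python sketch below is illustrative only: the five levels and the hard/soft split come from the discussion above, while the `only_soft_controls` helper and the example actions are hypothetical.

```python
from enum import IntEnum
from typing import Mapping


class Control(IntEnum):
    """Levels of the Hierarchy of control, strongest first."""
    ELIMINATION = 1
    SUBSTITUTION = 2
    ENGINEERING = 3
    ADMINISTRATIVE = 4
    PPE = 5


# The first three levels are the "hard" controls; the rest are "soft".
HARD_CONTROLS = frozenset(
    {Control.ELIMINATION, Control.SUBSTITUTION, Control.ENGINEERING}
)


def only_soft_controls(actions: Mapping[str, Control]) -> bool:
    """Return True when a non-empty action list contains no hard control."""
    return bool(actions) and all(
        level not in HARD_CONTROLS for level in actions.values()
    )


# Hypothetical corrective actions from an imagined investigation.
actions = {
    "Create another procedure": Control.ADMINISTRATIVE,
    "Issue gloves and face shields": Control.PPE,
}

if only_soft_controls(actions):
    print("Warning: every corrective action is a soft control.")
```

A check like this does not judge whether an individual action is worthwhile. It only highlights a report where every box has been ticked with soft options.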

Creating another Procedure

How often have you heard or seen, as a response to a problem, “create another procedure”? Would you be certain that this will prevent recurrence of the problem? It could be said that you have tried to control the problem. You can certainly show that you have done something. Would it, however, be defensible in a court of law if someone were to subsequently get hurt? Is it feasible to expect someone to remember every single procedure for every single task of the many they need to perform every single day? And we all know it is a soft control, an administrative one. So do the courts.

The Argument about Sanctions…

Who learns the most from the mistakes that are made? Isn’t it the person or the people involved? This was put into perspective for me by another Apollo instructor at a conference in Indianapolis. He said to me, “If someone makes a mistake, for instance, and the cost of that mistake might be, say, $500,000, and you are so angered by this that you then sack the person who made the mistake (quite possible, even probable), it is like sending someone on a $500,000 training course and then sacking them the next day.”

Does this make any sense?


RCA DISCUSSION

What have you learned from conducting an RCA? Do you have any successful tips or feedback worth sharing or discussing? We look forward to reading your feedback via comments below or let’s connect on our LinkedIn Group – ARMS Reliability – Apollo Root Cause Analysis for further discussion.

 

By Amir Datoo, Senior Reliability Engineer

The power industry is investing heavily in new technologies to harvest power from renewable sources like wind, solar and hydro. Yet with these new technologies come massive maintenance costs – if strategies are not put in place from the outset.

As we march further into the 21st century, the power sector is undergoing a massive shift. With climate change high on the global agenda, the industry as a whole is committed to finding alternatives to conventional fossil fuel power generation. Low-carbon power sources like wind, solar and hydro are being pursued by even the most traditional power companies.

By Joel Smeby, Senior Reliability Engineer, and Michael Drew, Managing Director, ARMS Reliability.

It’s a common phrase and one that is thrown around often.  But what does it really mean to have an optimised strategy?  If someone asks if your strategies have been optimised, can you answer with a resounding ‘Yes!’ and explain exactly what that means?

An optimised maintenance strategy means that your equipment is being maintained and operated at the lowest possible cost with respect to labour, spare parts, equipment, and failure effects. Failure effects may consider cost of downtime, safety and environmental considerations, or operational impact. In these cases it means that your facility is being maintained and operated in a way that is within your corporate risk thresholds, meets operational goals and has the lowest overall costs.

Landmark product release puts users of Isograph’s Availability Workbench™ just one click away from saving even more time, energy and cost in their asset management and maintenance programs.

Global Reliability and Asset Management consulting firm, ARMS Reliability has released the Reliability Integration Tool – a powerful new software tool with global application in the resource, utilities, power and transport industries.

The Reliability Integration Tool™ equips reliability engineers and asset managers with the power to seamlessly upload and download data between Isograph’s Availability Workbench and their CMMS system.