Some successful implementations of Continuous Improvement (“CI”) use the approach known as Kaizen (#1). One of the core principles of Kaizen is self-reflection on processes, also known as “Feedback”. The purpose of a Continuous Improvement Process (“CIP”) is to identify, reduce, and eliminate suboptimal processes; in other words, to become efficient. If you follow Kaizen, efficiency is achieved through incremental steps, or evolutionary change (#2).

The purpose of this article is to introduce how Availability Workbench™ (“AWB”) can be used to achieve each of the three Kaizen aspects of Continuous Improvement, namely Feedback, Efficiency and Evolutionary change. We begin with Feedback.

Read More →


When an incident or accident occurs at your workplace, what do you do to fix the problem?

In many cases, the “5 Whys process” is a proven and accepted means to get to the root cause of the incident. But what do you do if this technique doesn’t dive deep enough – and only presents further symptoms rather than the real cause or, indeed, causes?

This eBook reveals the benefits and limitations of the 5 Whys process, and then presents a useful method for taking the analysis further.


By Ned Callahan

Everybody agrees, don’t they, that the whole point of investigating safety incidents, whether injuries have actually been suffered or the potential for them was high, is to prevent their recurrence? Regrettably, the tendency to blame is more apparent in these cases than in mechanical failures or supply chain deviations, for example, presumably because of the deeper emotional responses from the affected parties.

The significance of the particular event can then be intensified because the variety and depth of the participants’ emotional responses are undeniably “real” and can, if not appropriately accommodated in the total incident management process, cloud the judgement of the investigator/s and even complicate the task for the team of analysts assembled for the RCA. Minimising the risk of friction, avoiding undue “heat” being generated by the harm (nearly) caused, can be achieved by the prompt application of an investigation process which both encourages and relies upon the frank sharing of information in order to achieve the agreed objective.

A mature business will have a risk matrix which pre-determines the level at which the investigation is undertaken and therefore, which “tool” or methodology may be prescribed for the particular event. The previous deliberations about which method to use for what level/type of event will have been influenced by the organisation’s previous analysis history, incorporating the relative success or otherwise of previous investigations. These results will have been generated by multiple factors such as the quality of evidence, determined by the care taken in its collection and preservation, the rigour of the facilitation process, the relative “influence” of stakeholders and significantly, the co-operation of the incident actors, being the victim/s and witness/es.
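As a rough illustration of how such a matrix can pre-determine the level of investigation, here is a minimal Python sketch. The severity and likelihood scales, the score thresholds and the three investigation tiers are all hypothetical choices made for this example; your organisation's matrix will define its own.

```python
# Hypothetical 3x3 risk matrix. Every scale, threshold and tier name below
# is an illustrative assumption, not taken from any standard.
SEVERITY = ("minor", "serious", "major")      # actual or potential harm
LIKELIHOOD = ("rare", "possible", "likely")   # chance of recurrence


def investigation_level(severity: str, likelihood: str) -> str:
    """Map an event's risk score to a pre-agreed investigation method."""
    # Rank each rating from 1 to 3 and multiply to get a score from 1 to 9.
    score = (SEVERITY.index(severity) + 1) * (LIKELIHOOD.index(likelihood) + 1)
    if score <= 2:
        return "troubleshooting"
    if score <= 4:
        return "5 Whys"
    return "facilitated RCA"


print(investigation_level("minor", "rare"))
print(investigation_level("serious", "possible"))
print(investigation_level("major", "likely"))
```

The value of fixing the mapping in advance is that the choice of method is made before the event, not in the heat of the moment.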

An event, being the first of its type in the organisation, with a very minor injury and no time lost may only require a “trouble-shooting” type approach. The expectations of regulatory authorities in hazardous industries can be another influence on the choice.
But then all that experience, whether positive, negative or mixed, can be neutralised by the arrival of a new principal with responsibility for the RCA process who has experience of, or specific training and expertise in, another method and has the clout to sway the choice. That choice may well be based simply on a personal preference arising from familiarity rather than on an objective assessment of the alternatives.

Regardless of the methodology selected, the purpose must be to prevent recurrence and not to blame. If the investigation focuses primarily on “who” did or did not do something or other, the tenor of the subsequent analysis may become negative and the opportunity to really learn from the experience will be subordinate to the search for a culprit. By the way, this “no blame” attitude does not exempt personnel who are repeatedly and wilfully negligent in the performance of their duties or associated activities in the workplace. The owners have a duty of care to provide a safe workplace for all, and if misbehaviours increase the risk of harm they are obliged to respond. Reprimand is a reasonable sanction. In the most severe but rare cases, dismissal might be reasonably justified. The justification would be the thorough, objective analysis. Otherwise the organisation could find itself liable to unfair dismissal or similar charges.

The need for objectivity cannot be over-stated and explains why best practice for significant events is to engage a third party facilitator who has no “skin in the game”. If the broad business context for deep analysis is Continuous Improvement, the enhanced safety of the workplace and all processes and equipment operations used by its employees must be the outcome.

Keeping in mind that every event is unique in some respects – the most obvious being that it happened at a different time to every other one (you know of) – the purpose of the RCA is to discover what is different or distinctive about this event. What are the other unique causes which might be effectively controlled or negated in order to significantly reduce the likelihood of a repetition or similar occurrence?

So, after the exhaustive process has been followed, with the facts associated with the incident having been recorded, the consequences measured and documented, the timeline and sequence of events mapped, any cans of worms expertly opened and explored, you have discovered a number of causes. Typically and ideally, you will have discovered causes of which you were ignorant at the beginning of the analysis. And these will only be discovered if the event is sliced thinly, if every phase is considered very carefully. These ought to be documented in some graphical form so that the team’s understanding of the event can be shared and agreed as complete. The cause and effect chart or tree is the most common display form employed and there needs to be provision for the display of the pertinent evidence for each cause.

It is imperative that all of the causes are revealed before you can be confident that prevention is assured. Being persistent in the quest for causes is a very desirable trait. Don’t stop too soon. Then, the existence of clearly defined relationships between the causes and their effects will provide the clarity necessary to instil confidence that the consequent solutions will be effective. It is the solutions, targeting specific causes, which combine to assure prevention, or at least, serious mitigation of the consequences.

But the job is incomplete. The solutions need to be implemented in a timely fashion to have an effect on the probability of recurrence. If, for example, one of the causes is the failure of some mechanism then identifying a solution for that may also entail deeper investigation to determine other failure modes which could have similar, potentially harmful effects. Note however that the investigation is not per se a solution even though it may provide data which leads one to alternative or complementary solutions.

Establishing the priorities for that implementation, and assigning ownership and due dates for completion, provide the closure everybody needs. It will be a learning experience for all intimately concerned but can and should be shared more widely in a large organisation. Nobody disagrees with a safe workplace; that attitude will reflect well on the organisation, and community regard may well be heightened. A safe workplace also reduces the likelihood of interruptions to business, and this increased reliability will strengthen long-term relationships with customers and suppliers alike.



Michael Drew, ARMS Reliability CEO, has put together 11 steps to help you with your next Process loss review. Read More →

By Jack Jager

What is “defect elimination” and a “Defect Elimination program”?

“Defect elimination” analyses the defect, and then implements corrective actions to prevent future similar defects.

A “Defect Elimination program” is a structured process companies adopt to become more consistent and reliable in eliminating defects. It forms part of a broader Quality Improvement program.  It’s a systematic approach to apply defect elimination consistently across the operations of a company, for any opportunities that present themselves as worthy of the effort. Read More →

Philip Sage – CMRP
Principal Reliability Engineer

If your production processes aren’t firing on all cylinders – and are costing your business much more than they should – here is a very fast, very focused solution: the Vulnerability Assessment and Analysis (VAA).

Let’s look at a hypothetical situation. You are the new Director of Reliability for a global company, and you’ve inherited a floating oil production rig in the North Sea. When you start working with the platform team, it quickly becomes obvious that a number of issues are hampering the rig’s performance. Some of these issues are known to the team, others aren’t. Read More →


Incident Investigation is an improvement process. It’s about continually working on your weaknesses to realize marginal gains – a number of small improvements that result in a better program overall. 


This eBook breaks down the 5 critical components you should consider when establishing your RCA program – or just as important, when striving to improve your RCA program. You’ll also get practical tips and tactics to get the most value out of each element of your program.



By Jack Jager and Michael Drew

The RealityChart™ (cause-and-effect chart) that you generate during a Root Cause Analysis investigation is important as it creates a common understanding of why the problem has occurred.

Creating your RealityChart™ starts with finding the causes that contributed, or played a part, in the event or problem that occurred. During this phase of the analysis, the chart serves as the interactive platform where all of the information is captured, recorded, and organized. The chart should be highly visible so that all group members can see and comment on it.

(Tip: If you build your initial chart using “Post-It®” notes, attaching them to a vertical surface is best. Use a dark-coloured, thick marker pen for writing; this simply makes the information more readable. If you want to move your chart, post the notes on a roll of brown paper which can be rolled up and moved. Using RealityCharting™ allows the chart to be shared electronically.)

The second challenge in the creation of the RealityChart™ is to arrange the causes in a meaningful, logical way that other people can follow and understand. The crucial point here is whether other people can understand the chart, not just you. This is the real litmus test for the chart and can be a challenge. Whilst you may believe that your chart is sound, if other people can’t follow it then it may be subjected to scrutiny, dissected at every turn, and perhaps even dismissed if it is believed to be an inaccurate representation of the problem. As others view your chart, be prepared to listen to what they think. You may discover alternative paths or additional causes that you or the team could not see.

So, to ensure your chart is a good representation of the problem analysis, challenge your charts and be open to other views.

How do you do that?
I’m going to tell you about two ways: testing your logic and applying the “rules check”.

1) Test your logic

Remember there are three important things about charts – Logic, Logic, and Logic! If the logic is sound then the connection should be logical in both directions. What I mean is, if A is caused by B and C, then the converse of this must also be true – B and C cause A.

If you use this test and the statement doesn’t ring true then the connection needs to be changed so that it becomes logical.

Here’s an example.

How often have you heard that you have a “failed bearing” and that this is caused by a “lack of lubrication”? Now whilst this may be true, and it does have the semblance of a logical connection, there is much that happens in between these two causes.

How does it sound when you state the connection the opposite way: Whenever you have a lack of lubrication, you will have a failed bearing. Now this just doesn’t sound right. It is not always true. This understanding indicates that there are other causes that have yet to be found.

What happened to the causes of “metal to metal contact”, “generation of heat”, “expansion of metal”, “narrow tolerances”, “bearing in use”, “lack of monitoring”, “no tripping mechanism”, “extreme heat”, “severe duty” and so on? There is a lot more information here than meets the eye.

A lack of lubrication itself does not cause the bearing to fail – not instantaneously. A lot of things happen before you have catastrophic failure of the bearing. So the initial statement that you have a “failed bearing” being caused by a “lack of lubrication” is far too simplistic. It is a generalisation that requires a lot of assumptions to be made.

Your job is to present the facts in a logical arrangement rather than allowing or forcing people to make guesses based on insufficient information. The adding of more specific details (even what some people consider to be superfluous detail) can be very beneficial in facilitating this. It is the detail that allows comprehensive understanding of your chart.

2) Apply the “Rules Check”

When using the Apollo Root Cause Analysis methodology, your RealityChart™ must have:

Evidence to support each of the causes.
This validates the information which gives the chart credibility.

Stop points indicated and a reason for stopping also provided.
This indicates to everyone that you have stopped asking questions on that causal path and have provided a valid reason for doing so. When all cause paths have been completed in this manner, then the chart is finished.

Causes should be labelled as either actions or conditions.
This helps you to see what type of causes you have found and therefore what may have been missed. It drives the questioning process to another level.

Each connection should have at least one action and at least one condition.
Though typically we see more conditions than actions, we should never see a straight line of causes within a chart. This too should generate the asking of more questions.

Any anomalies or violations to these “rules” should demand that another question be asked. The anomaly, or violation, must be challenged.
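The four rules above lend themselves to an automated check. The following is only a sketch in Python, not how RealityCharting™ works internally: the `Cause` record, its field names and the `rules_check` function are hypothetical, and the sample evidence is made up for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Cause:
    """One box on the chart (hypothetical structure for illustration)."""
    name: str
    kind: str = ""          # "action" or "condition"
    evidence: str = ""      # supporting evidence; empty if none recorded
    caused_by: list = field(default_factory=list)  # names of direct causes
    stop_reason: str = ""   # why questioning stopped on this path


def rules_check(chart: dict) -> list:
    """Return rule violations; each one should prompt another question."""
    problems = []
    for cause in chart.values():
        if not cause.evidence:
            problems.append(f"{cause.name}: no evidence")
        if cause.kind not in ("action", "condition"):
            problems.append(f"{cause.name}: not labelled action or condition")
        if not cause.caused_by:
            # End of a causal path: a reason for stopping is required.
            if not cause.stop_reason:
                problems.append(f"{cause.name}: path ends without a stop reason")
            continue
        kinds = {chart[n].kind for n in cause.caused_by if n in chart}
        if not {"action", "condition"} <= kinds:
            problems.append(
                f"{cause.name}: needs at least one action and one condition")
    return problems


# The over-simplified "failed bearing" chart from the logic test.
chart = {
    "failed bearing": Cause("failed bearing", "condition",
                            "inspection photo", ["lack of lubrication"]),
    "lack of lubrication": Cause("lack of lubrication", "condition"),
}
problems = rules_check(chart)
for problem in problems:
    print(problem)
```

Run against the two-box “failed bearing” chart, the check flags the straight-line connection, the missing evidence and the unexplained stop, which are exactly the gaps the logic test exposed.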

It is the challenge that is important. Challenging the cause and effect charts consistently will improve the quality of the charts. It is about dotting the “I”s and crossing the “T”s. That said, there is no such thing as a correct chart – they are always a work in progress. They are rarely if ever “perfect”.

The initial chart should be considered a draft and is a direct reflection of the information you have available and the amount of time that you have to organize and challenge it. As the chart continues to develop, challenge it constantly using the logic test and the rules check.

Significantly, a quality chart will enable you to demonstrate the effect that your corrective actions will have on the problem or event. If you eliminate or control a cause that forms part of a causal relationship, then whatever happens after that point is effectively prevented from occurring and you can demonstrate this very effectively by referring to a detailed, logical chart.
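To make that point concrete, here is a small Python sketch. It assumes that an effect needs all of its listed causes to be present (the “B and C cause A” reading used in the logic test) and that the chart contains no loops. The `occurs` function is hypothetical, and the arrangement of the bearing causes is only an illustrative guess at how they might connect.

```python
def occurs(effect, causes_of, eliminated):
    """True if `effect` still occurs once the `eliminated` causes are removed.

    Assumes every effect needs ALL of its listed causes; a cause with no
    listed causes of its own is taken as present unless eliminated.
    """
    if effect in eliminated:
        return False
    return all(occurs(cause, causes_of, eliminated)
               for cause in causes_of.get(effect, []))


# Illustrative arrangement only: effect -> the causes it depends on.
causes_of = {
    "failed bearing": ["extreme heat", "bearing in use"],
    "extreme heat": ["metal to metal contact", "lack of monitoring"],
    "metal to metal contact": ["lack of lubrication"],
}

print(occurs("failed bearing", causes_of, set()))
print(occurs("failed bearing", causes_of, {"lack of monitoring"}))
```

With nothing eliminated the failure occurs. Controlling “lack of monitoring” breaks the path at “extreme heat”, so everything after that point on the chart is prevented, even though “metal to metal contact” is still present.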

Added benefits:

  • Once a quality chart has been produced for a systemic, recurring failure, that chart could be used as a template and rolled out when similar failures occur. Then, it’s a matter of challenging the chart to see if the information is all correct.

    How much time would this save your organization in investigations? How much time would it save your organization to solve systemic issues that are eliminated?

  • A “quality” chart can be a learning tool. It can be shared amongst colleagues as a resource that shows what to look for when similar problems arise.


A RealityChart™ is a dynamic view of the logical cause and effect relationships that explain why a problem has occurred. Charts can be shared, challenged and changed over time. They lead to effective solutions for one-off and systemic problems.

Demand excellence in your charts. The effort in trying to achieve this will be time well spent.

ARMS Reliability are currently engaged to provide the Asset Management guidance for a Maximo upgrade with a major water utility in Melbourne, Australia. There are many elements of the process that are worth considering if your own organisation is undergoing the same type of project.

The first step was to create the KPIs and calculations that the maintenance department will be measured against. This is important to ensure that these goals align with the organisation’s overall objectives. It also dictates the minimum fields that need to be designed into the new CMMS if they are not available in the out-of-the-box solution. Read More →

By Antonie Jacobs, Senior Reliability Engineer, ARMS Reliability

A Practical guide to getting a “ready for implementation” Maintenance Strategy in Capital Equipment Projects.

Same old story • • •

“This is my third plant expansion in 10 years. Next week we start with staged commissioning, but there is so much still to do. My Maintenance Planner and Team Leaders are breaking down my door, asking for resources to develop their maintenance strategies and populate our CMMS. We have not even finished the previous expansions’ plans yet! The design company is demobilizing, and the engineers will be occupied for months with process optimisation. And I don’t have approval for my Reliability team yet! It will take years to get the strategies done now that we’ve reached the end of our capital resources!” Read More →