By: Gary Tyne CMRP, CRL

Engineering Manager – ARMS Reliability Europe

Working for a global organization has taken me to some weird and wonderful places around the world. Experiencing different cultures, traditions, religions and people certainly opens your eyes to the wonderful and colorful place we all call home.

I would say that in most of these countries I have at some stage taken a taxi, or at least been chauffeured by a driver in a customer’s company vehicle. These experiences have led to some interesting conversations on life, travel, politics, and football with some very knowledgeable and diverse taxi drivers. On the other hand, I have had drivers who have not spoken a word and have simply delivered me to my destination in silence, even after I tried to engage them in conversation.

A recent taxi encounter occurred when I had just left my customer and was going to call for a taxi, when I spotted someone being dropped off at my current location. I asked the driver if he could take me to Dublin airport and he obliged.

This is when I met Mohammed, an immigrant from Kenya who had moved to Ireland 17 years ago. He was smiling and cheerful and had a generally happy persona about him. We discussed weather in Ireland versus Mombasa, we mentioned football briefly, and then we started to discuss cars. This occurred when a brand new Mercedes went past us in the fast lane and I passed comment on what a beautiful car that was.

Mohammed started to discuss the Toyota Corolla in which we were driving and how he loved his car for its level of reliability. I asked how many miles his vehicle had driven, and he pointed out that he had covered over 300,000 miles since he purchased the car brand new in Northern Ireland. He went on to explain how he ensured that it was regularly maintained to a high standard, with the best quality oil and genuine OEM parts being used when any replacements were required. The engine and gearbox were original and, as he put it, ‘you look after your car, it will look after you.’ Mohammed was proud of the length of service he had achieved from his vehicle and that the car had never let him down. However, as the vehicle operator he recognized the importance of regular maintenance and the use of the right quality parts. He also said that he only allowed one mechanic to work on his vehicle, because that mechanic was very skilled and competent at his job and Mohammed did not trust others to work on his taxi.

Mohammed was also proud to be a taxi driver in Ireland, and that pride, combined with his ‘Reliability’ story, certainly made the trip to Dublin airport a memorable one. Mohammed did not know my job role, or that I had spent over 30 years in Maintenance and Reliability, but he gave me a textbook account of what ‘Reliability’ is! I said goodbye to Mohammed after he let me take a picture of his mileage and car. I wished him luck and many more years of happy motoring in his reliable Toyota motor vehicle.

Sitting in the departure lounge, I found that my trip to the airport and conversation with Mohammed certainly made me think:

  • Do we see this level of passion and ownership amongst today’s industrial operators?
  • Should operators take more care of their assets, ensuring high reliability through a program of basic care?
  • How do we ensure the right levels of competence in our technicians?
  • How do we ensure that the correct specification and quality of parts are being purchased?
  • How do we ensure that maintenance is being performed at the right frequency on the right asset?

This ‘Reliability Tale from the Taxi’ may also have generated further questions in your own mind. For me, it provided another great ‘Reliability’ story that I can share during one of our global reliability training courses.

 

As its name suggests, an “asset” is a useful or valuable thing. Indeed, the antonym of “asset” is “liability”. Hence, an organization’s assets should deliver value; not cost money. With the right techniques and strategies in place, asset managers can ensure that their plant and equipment is performing at and being maintained at optimum levels. These many and varied techniques can be applied across the different phases of an asset’s life to ensure that, instead of draining money from the bottom line, it actively contributes to margin increases.

Managed the right way, assets can contribute significantly to profit margins. It takes a strategic approach to maintenance and asset management, in key areas such as:

  1. Increasing availability and plant capacity
  2. Reducing unnecessary maintenance costs
  3. Reducing unnecessary spares holding costs
  4. Planning optimum retirement of plant and equipment

Once you determine a key focus area, it’s important to apply the right technique.

Margin Increase Techniques

System Analysis

The primary objective of System Analysis is to identify and eliminate bottlenecks in a system. It is particularly useful in complex operations where the contribution of different parts of the system is not clear. An analyst performing System Analysis builds a representative model using reliability block diagrams, and runs a simulation to produce a quantitative view of the contribution of all parts of a system. The technique is used to assess the reliability of individual components and their dependencies on other events or assets in order to assess the overall availability of the system. This helps to determine the importance of each element, so that the analyst can play “what if” with different levels of redundancy, size of buffers, maintenance strategies, and spares holding levels, in order to find the optimum.
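
To make the idea concrete, here is a minimal sketch of a reliability block diagram simulation in Python. It is not how any particular software package works; the component names and availability figures are invented for illustration, and each component is simply sampled as up or down, with no buffers, repair times or dependencies modeled.

```python
import random

def unit(name):
    """A single component, looked up by name in the sampled state."""
    return lambda state: state[name]

def series(*blocks):
    """A series group is up only if every block is up."""
    return lambda state: all(b(state) for b in blocks)

def parallel(*blocks):
    """A redundant (parallel) group is up if at least one block is up."""
    return lambda state: any(b(state) for b in blocks)

def simulate_availability(system, availability, trials=100_000, seed=1):
    """Estimate system availability by sampling component up/down states."""
    rng = random.Random(seed)
    up = 0
    for _ in range(trials):
        state = {name: rng.random() < a for name, a in availability.items()}
        up += system(state)
    return up / trials

# Hypothetical component availabilities (illustrative numbers only).
availability = {"feeder": 0.98, "pump_a": 0.90, "pump_b": 0.90, "conveyor": 0.97}

# Base case: one pump in series with the feeder and conveyor.
single_pump = series(unit("feeder"), unit("pump_a"), unit("conveyor"))
# "What if" case: add a redundant second pump.
dual_pump = series(unit("feeder"),
                   parallel(unit("pump_a"), unit("pump_b")),
                   unit("conveyor"))

base = simulate_availability(single_pump, availability)
what_if = simulate_availability(dual_pump, availability)
print(f"single pump: {base:.3f}, redundant pumps: {what_if:.3f}")
```

Comparing the two estimates shows how much the pump was limiting the system, which is the kind of “what if” question the technique is meant to answer.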

Maintenance Benefit Analysis

Unfortunately, there has been a long tradition of organizations fostering a culture of maintenance in which the maintenance crews are lauded as heroes when they step in to fix things that are broken. In such cultures, preventative maintenance is less appreciated, despite it being proven to save money. Maintenance Benefit Analysis – similar to Maintenance Optimization – is used to evaluate a maintenance plan and identify any areas where maintenance is either not needed or is not optimal. It identifies where current practice can be improved by choosing a different type of strategy or frequency.

Spares Optimization

Typically, maintenance crews love spares and want lots of them in their plant or facility. Yet plant managers resent having too many spares in stock as they tie up capital and take up storage space. Spares Optimization is all about finding the optimum level of spares to hold; a level that balances the cost of not having spares available against the cost of holding the spares in stock.
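
The balance described above can be sketched in a few lines of Python. This is only an illustration under simple assumptions: demand for the spare during one replenishment period follows a Poisson distribution, each spare held costs a fixed amount per period, and each demand that cannot be met from stock costs a fixed amount. All of the figures are hypothetical.

```python
import math

def poisson_pmf(k, mean):
    """Probability of exactly k demands when the average demand is `mean`."""
    return math.exp(-mean) * mean**k / math.factorial(k)

def expected_shortage(stock, mean_demand, max_demand=60):
    """Expected number of demands per period that cannot be met from stock."""
    return sum((k - stock) * poisson_pmf(k, mean_demand)
               for k in range(stock + 1, max_demand + 1))

def period_cost(stock, mean_demand, holding_cost, stockout_cost):
    """Cost of holding `stock` spares plus the expected cost of stockouts."""
    return (stock * holding_cost
            + expected_shortage(stock, mean_demand) * stockout_cost)

def optimum_stock(mean_demand, holding_cost, stockout_cost, max_stock=20):
    """Stock level with the lowest combined holding and stockout cost."""
    return min(range(max_stock + 1),
               key=lambda s: period_cost(s, mean_demand, holding_cost, stockout_cost))

# Hypothetical inputs: 2 demands per period on average, 500 to hold one
# spare for a period, 20,000 lost for every demand that finds no spare.
best = optimum_stock(mean_demand=2.0, holding_cost=500, stockout_cost=20_000)
print("optimum spares to hold:", best)
```

Raising the stockout cost pushes the optimum up, and raising the holding cost pushes it down, which mirrors the tension between maintenance crews and plant managers.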

Repair vs Replace Analysis

Knowing when to replace a piece of equipment shouldn’t be guesswork, as the right time to replace can save hundreds of thousands of dollars in repairs. Repair vs Replace Analysis is used to predict or track the costs of repairs against the cost of replacement. As the cost of repairs increases (which incorporates costs like labor and parts), it becomes less viable to maintain the asset. Plus, as the cost of new equipment falls, it becomes more viable to buy it new. Life Cycle Cost analysis can be applied to assess the optimum point to switch from repair-mode to replace-mode.
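
As a rough illustration of the switch from repair-mode to replace-mode, the Python sketch below finds the replacement age with the lowest average annual cost. It assumes repair costs grow by a fixed percentage each year and ignores discounting, resale value and downtime; the purchase price, first-year repair cost and growth rate are hypothetical.

```python
def repair_cost(year, first_year_cost, growth):
    """Repair cost in a given year, growing by `growth` each year."""
    return first_year_cost * (1 + growth) ** (year - 1)

def average_annual_cost(replace_after, purchase_price, first_year_cost, growth):
    """Purchase price plus all repairs, spread over the years of ownership."""
    repairs = sum(repair_cost(y, first_year_cost, growth)
                  for y in range(1, replace_after + 1))
    return (purchase_price + repairs) / replace_after

def economic_life(purchase_price, first_year_cost, growth, max_years=30):
    """Replacement age (in years) with the lowest average annual cost."""
    return min(range(1, max_years + 1),
               key=lambda n: average_annual_cost(n, purchase_price,
                                                 first_year_cost, growth))

# Hypothetical asset: 200,000 to buy, 10,000 of repairs in year one,
# repairs rising 25% a year.
life = economic_life(purchase_price=200_000, first_year_cost=10_000, growth=0.25)
print("replace after", life, "years")
```

Replacing earlier than this age spreads the purchase price over too few years; replacing later lets the rising repair bills dominate.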

ARMS Reliability can show you how to achieve great cost savings and margin increases across the whole organization by using these techniques and their associated software tools; and will train your team to implement and manage these changes proactively.

In most cases, there is much to gain by working through maintenance strategy optimization. To identify where your company’s maintenance strategy sits on the spectrum, you can perform a simple self-assessment that looks for the most common symptoms, which are described in detail in our guide “5 Symptoms Your Maintenance Strategy Needs Optimizing.” If the symptoms are evident, then there is a strong business case to invest in maintenance strategy optimization. The primary question in diagnosing the health of your maintenance strategy is a simple one. Does your maintenance strategy need optimizing? Ideally, your maintenance strategy is already optimized. Perhaps it was, but is in need of a tune-up. Or, as is the case in many companies, maybe you are experiencing endemic symptoms that lead to:

  • Recurring problems with equipment.
  • Budget blow-outs from costly fixes to broken equipment.
  • Unplanned downtime that has a flow-on effect on production.
  • Using equipment that is not performing at 100 percent.
  • Risk of safety and environmental incidents.
  • Risk of catastrophic failure and major events.

The five most common symptoms to look for in that self-assessment are:

  1. Increase in unplanned maintenance – A sure sign that your maintenance strategy is not working is the simple fact that you are performing more unplanned maintenance, which is caused by an increase in the occurrence of breakdowns.
  2. Rising maintenance costs – In companies that apply best practice maintenance strategy optimization, total maintenance costs are flat or slightly decreasing month-on-month. These optimized strategies combine preventative tasks with various inspection and root cause elimination tasks, which in turn produce the lowest cost solution.
  3. Excessive variation in output – A simple definition of the reliability of any process is that it does the same thing every day. In other words, equipment should run at nameplate capacity day in and day out. When it doesn’t, this is an indication that some portion of the maintenance strategy is misaligned and not fully effective.
  4. Strategy sticks to OEM recommendation – Sticking to the maintenance schedule prescribed by Original Equipment Manufacturers (OEMs) may seem like a good starting point for new equipment. But it is only that: a starting point. There are many reasons why you should create your own optimized maintenance strategy soon after implementation.
  5. An inconsistent approach – Consistency implies lack of deviation. And this implies standardization. When it comes to maintenance strategies, standardization is essential.

For an in-depth look at these symptoms, download the complete guide “5 Symptoms Your Maintenance Strategy Needs Optimizing”.

If your maintenance activities have a large proportion of reactive repairs, then the costs of maintaining your assets are larger than they need to be, because the cost of performing unplanned maintenance is typically three times the cost of performing maintenance in a planned manner. Furthermore, if your system is reactive, it is a sign that you are not managing failures. Your biggest costs may be catastrophic failure, systemic failure or equipment defects.

These major meltdowns or one-off events can cost millions of dollars in reactive repairs, lost production and/or major safety/environmental impacts. If you need to lower the cost of maintenance, this is an area where you can make a significant impact on the P&L.

Proactive maintenance – which is aimed at avoiding such scenarios – is a much more cost-effective approach.
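
A quick worked example shows what the three-to-one ratio mentioned above means for a budget. The job count and cost per planned job in this Python sketch are hypothetical; only the multiplier of three comes from the rule of thumb, and it ignores lost production and other flow-on costs entirely.

```python
def maintenance_cost(jobs, planned_cost_per_job, reactive_share,
                     reactive_multiplier=3.0):
    """Total direct cost when a share of jobs is done reactively.

    A reactive job is assumed to cost `reactive_multiplier` times
    what the same job would cost if it were planned.
    """
    reactive_jobs = jobs * reactive_share
    planned_jobs = jobs - reactive_jobs
    return planned_cost_per_job * (planned_jobs
                                   + reactive_multiplier * reactive_jobs)

# Hypothetical plant: 1,000 jobs a year at 2,000 per planned job.
mostly_reactive = maintenance_cost(1000, 2000, reactive_share=0.6)
mostly_planned = maintenance_cost(1000, 2000, reactive_share=0.2)
print(f"60% reactive: {mostly_reactive:,.0f}")
print(f"20% reactive: {mostly_planned:,.0f}")
print(f"saving:       {mostly_reactive - mostly_planned:,.0f}")
```

Under these assumptions, moving from 60 percent reactive work to 20 percent cuts direct maintenance cost by more than a third, before counting any downtime avoided.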

First, what is reactive maintenance? Put simply, it is any maintenance or repair done to a piece of equipment after a failure event. If a gear-box grinds to a halt and your maintenance team rushes to repair it, they are engaging in reactive maintenance.

While the immediate cost of such maintenance may seem low – a day of labor and the purchase of a new part for the machine – the flow-on costs associated with downtime and lost production can be much higher, and there is a greater risk of safety and environmental incidents during the shutting down or starting up of equipment.

In companies where reactive maintenance is a large proportion of work performed, there are many hidden costs carried by the business such as higher inventories; premium rates for purchasing spare parts; higher stocking levels for critical spares; more wasted time queuing for tools, materials, and labor; higher overtime levels; more plant downtime; interruption to customer orders; stockouts; and off-spec quality. The organization and management system has a short-term, busy focus, often under budget pressure, with variations in production and lots of “things to do”.

On the flip side, proactive maintenance takes a preventative approach. It involves making assets work more efficiently and effectively so that downtime and unexpected failures become a thing of the past. It’s also about trimming unnecessary expenditure from asset management budgets. From a bottom line perspective, it’s about boosting the assets’ contribution to earnings before interest and tax (EBIT).

Strategies associated with proactive maintenance involve understanding and managing the likelihood of failures. Some of the common analytical methods used to understand the impact of failures on the business include:

  • System Analysis – to understand the way equipment failures can impact the availability and production capacity of a system; it allows the analyst to identify and eliminate potential bottlenecks in a system, and thus increase plant capacity
  • Criticality Analysis – to rank equipment by the likelihood and severity of failure impact on key business objectives, so you can then channel maintenance resources into the more critical pieces of equipment
  • Maintenance Benefit Analysis – to evaluate a maintenance plan and identify areas where maintenance is either not needed or not optimal.
  • Spares Optimization – to find the optimum level of spares to hold in-stock, which balances the cost of not having spares available versus taking up storage space on-site
  • Repair Vs Replace Analysis – to predict or track the cost of repairs against the cost of replacement, so it becomes clear when to replace assets for best value
  • Root Cause Analysis – to analyze the root cause of failures and focus resources on eliminating their reoccurrence, not just fixing the symptoms time and time again.
  • Vulnerability Analysis – to systematically review all aspects of the operation in order to discover tomorrow’s failure, so it can be eliminated in a planned fashion.
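
Of the methods listed above, Criticality Analysis is the easiest to sketch in code. The Python example below ranks equipment by likelihood multiplied by severity. The 1-to-5 scales, the equipment names and the scores are all hypothetical; real studies usually use a risk matrix agreed by the business rather than a simple product.

```python
from dataclasses import dataclass

@dataclass
class Equipment:
    name: str
    likelihood: int  # 1 (rare) to 5 (frequent), hypothetical scale
    severity: int    # 1 (negligible) to 5 (catastrophic), hypothetical scale

    @property
    def criticality(self) -> int:
        """Simple criticality score: likelihood times severity."""
        return self.likelihood * self.severity

def rank_by_criticality(items):
    """Most critical equipment first."""
    return sorted(items, key=lambda e: e.criticality, reverse=True)

# Illustrative register of equipment with made-up scores.
register = [
    Equipment("site lighting", likelihood=2, severity=1),
    Equipment("main gearbox", likelihood=4, severity=5),
    Equipment("transfer conveyor", likelihood=3, severity=3),
    Equipment("cooling pump", likelihood=2, severity=5),
]

ranked = rank_by_criticality(register)
for item in ranked:
    print(f"{item.name:18} {item.criticality}")
```

The ranked list is what lets you channel maintenance resources into the more critical pieces of equipment first.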

As these strategies attest, proactive maintenance is about much more than building a schedule of ongoing maintenance tasks. By understanding and managing failure, maintenance resources can be directed in a planned manner to those areas that require attention, and you can save significant amounts of money over the long term.

And, above all, it is important to remember that a culture of reactive maintenance is not ideal. In fact, unplanned reactive maintenance is one of the key symptoms that your maintenance strategy isn’t working.

Learn more by downloading our guide: 5 Symptoms Your Maintenance Strategy Needs Optimizing

Certification is the term applied to the process whereby an individual voluntarily submits his/her credentials for review based upon clearly identified competencies, criteria, or standards. The primary purpose of certification is to ensure that the personnel employed meet the high standards of performance set out for that role by the certifying authority.

Certifications demonstrate to employers and/or clients that you are, indeed, an expert in a particular area or areas, and that a reputable, recognizable organization, The Association of Asset Management Professionals, is willing to attest to that.

The body of knowledge lays out the required skill and knowledge each Certified Reliability Leader™ (CRL) must possess in order to be certified. The CRL is unique in that it certifies an individual across 29 subjects in 5 inclusive domains and in two key respects, namely their leadership and their reliability expertise. These 29 concepts are embodied in the form of a complete body of knowledge, with color-coded Uptime® Elements™ that span far beyond the typical roles of a maintenance and reliability organization.

The breadth of the CRL certification makes it difficult. There are some areas in which not everyone has the requisite experience or knowledge, and these represent learning milestones. The first step to becoming an excellent leader is the willingness to learn and lead. It takes knowledge, practice with feedback, and passion for professional development. The CRL is an experience-based journey. It is a journey many of us are familiar with, as the most common functions of reliability are well known and generally understood by most maintenance managers and reliability professionals.

The areas most often associated with this discipline include:

  • Preventative Maintenance (PM)
  • Reliability Engineering
  • Reliability Centered Maintenance
  • Planning and Scheduling Work
  • Computerized Maintenance Management Systems
  • Lubrication, Minor cleaning, and Servicing.

In this regard, the CRL Certification provides an objective measure of the reliability professional’s expertise. An individual who is certified is well qualified, and has voluntarily had that qualification measured by sitting for the certification exam.

The predictive technologies could easily form another group, which could include:

  • Vibration Analysis
  • Oil Analysis
  • Ultrasonic testing
  • Infrared Thermography
  • Motor Circuit Analysis
  • Alignment and Balancing
  • Non-Destructive Testing

What makes the CRL certification unique is the LEADERSHIP certification component. This is an area we seldom associate with the reliability disciplines; these qualities are more often listed as qualifiers for the maintenance management job function.

The leadership certification and body of knowledge containing broad leadership skills is what makes the CRL different, and is what sets the CRL apart from other technical certifications. The CRL is not only a technical certification process, it is also a formidable leadership certification process. This is very important in today’s global economy, as leaders are necessary at every level.

Imagine the value a well-qualified individual could bring to your organization if they were knowledgeable and capable of operating in these circles:

  • Human Capital Management
  • Integrity
  • Competency Based Learning
  • Executive Sponsorship
  • Operational Excellence
  • Operator Driven Reliability
  • Defect Elimination

Now imagine they also know a thing or two about reliability!

The leadership elements arguably would not be the first qualifications most HR managers would associate with a maintenance and reliability role, but it is these leadership qualities that deliver value back into an organization. Those who are familiar with the CRL have noted that the benefits extend beyond just the individual who was certified, and are passed on to the team members they interact with, and the organization as a whole.

Working on leadership makes sense at every level of an organization. Certifying your leaders sets you apart.

ARMS Reliability provides Reliability Leadership training as well as the Certified Reliability Leader™ (CRL) exam.

LEARN MORE about upcoming training and certification opportunities.

Can you quantify the financial impact of your maintenance program on your business? Do you take into account not only the direct costs of maintaining equipment, such as labour and parts, but also the costs of not maintaining equipment effectively, such as unplanned downtime, equipment failures and production losses?

The total financial impact of maintenance can be difficult to measure, yet it is a very valuable task to undertake. It is the first step in finding ways to improve profit and loss. In other words, it is the first step towards an optimised maintenance strategy.

In a 2001 study of maintenance costs for six open pit mines in Chile [1], maintenance costs were found to average 44% of mining costs. It’s a significant figure, and it highlights the direct relationship between maintenance and the financial performance of mines. More recently, a 2013 Industry Mining Intelligence and Benchmarking study [2] reported that mining equipment productivity has decreased 18% since 2007; and it fell 5% in 2013 alone. Besides payload, operating time was a key factor.  

So how do you know if you are spending too much or too little on maintenance? Certainly, industry benchmarks provide a guide. In manufacturing, best practice benchmarks put maintenance costs at less than 10% of the total manufacturing costs, or less than 3% of asset replacement value [3].
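
Checking yourself against these two benchmarks is simple arithmetic, sketched here in Python. The thresholds are the ones quoted above; the plant figures are hypothetical, and the function and key names are my own.

```python
def maintenance_benchmarks(maintenance_cost, manufacturing_cost, replacement_value):
    """Compare annual maintenance spend with the two benchmarks quoted above."""
    share_of_cost = maintenance_cost / manufacturing_cost
    share_of_value = maintenance_cost / replacement_value
    return {
        "share_of_manufacturing_cost": share_of_cost,
        "share_of_replacement_value": share_of_value,
        "within_cost_benchmark": share_of_cost < 0.10,   # less than 10%
        "within_value_benchmark": share_of_value < 0.03,  # less than 3%
    }

# Hypothetical plant: 6M annual maintenance spend, 50M total
# manufacturing cost, 250M asset replacement value.
result = maintenance_benchmarks(6_000_000, 50_000_000, 250_000_000)
for key, value in result.items():
    print(key, value)
```

In this made-up case the plant passes one benchmark (2.4% of replacement value) and fails the other (12% of manufacturing cost), which is a good illustration of why a benchmark alone rarely settles the question.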

While these benchmarks may be useful, a more effective way to answer the question is to look at the symptoms of over- or under-spending in maintenance. After all, benchmarks cannot take into account your unique history and circumstance.

Symptoms of under-spending on maintenance include:

  • Rising ‘hidden failure costs’ due to lost production
  • Safety or environmental risks and events
  • Equipment damage
  • Reputation damage
  • Waiting time for spares
  • Higher spares logistics cost
  • Lower labour utilisation
  • Delays to product shipments
  • Stockpile depletion or stock outs

Other symptoms are explored in more detail in our guide: 5 Symptoms Your Maintenance Strategy Needs Optimizing.

Figure 1

In most cases, it is these ‘hidden failure costs’ that have the most impact on your bottom line. These costs can be many times higher than the direct cost of maintenance – causing significant and unanticipated business disruption. As such, it is very important to find ways to measure the effects of not spending enough on maintaining equipment.

Various tools and software exist to help simulate the scenarios that can play out when equipment is damaged, fails or, conversely, is proactively maintained. A Failure Modes, Effects and Criticality Analysis (FMECA) is a proven methodology for evaluating all the likely failure modes for a piece of equipment, along with the consequences of those failure modes.

Extending the FMECA to Reliability Centred Maintenance (RCM) provides guidance on the optimum choice of maintenance task. Combining RCM with a simulation engine allows rapid feedback on the worth of maintenance and the financial impact of not performing maintenance.

Armed with the information gathered in these analyses, you will gain a clear picture of the optimum costs of maintenance for particular equipment – and can use the data to test different ways to reduce costs. It may be that there are redundant maintenance plans that can be removed; or a maintenance schedule that can become more efficient and effective; or opportunity costs associated with a particular turnaround frequency and duration. Perhaps it is more beneficial to replace equipment rather than continue to maintain it.

It’s all about optimising plant performance for peak production; while minimising the risk of failure for key pieces of equipment. Get it right, and overall business costs will fall.

Want to read on? Download our guide: 5 Symptoms Your Maintenance Strategy Needs Optimizing.

 

[1] Knights, P.F. and Oyanander, P (2005, Jun) “Best-in-class maintenance benchmarks in Chilean open pit mines”, The CIM Bulletin, p 93

[2] PwC (2013, Dec) “PwC’s Mining Intelligence and Benchmarking, Service Overview”, www.pwc.com.au

[3] http://www.maintenancebenchmarking.com/best_practice_maintenance.htm

Figure 1: This image shows Isograph’s RCMCost™ software module, which is part of their Availability Workbench™. Availability Workbench, Reliability Workbench, FaultTree+, Hazop+ and NAP are registered trademarks of Isograph Software. ARMS Reliability are authorized distributors, trainers and implementors.

While there are three main reasons organizations typically perform Root Cause Analysis (RCA) following an issue with their asset or equipment, there are a whole host of other indicators that RCA should be performed.

Odds are, you’re recording a lot of valuable information about the performance of your equipment – information that could reveal opportunities to perform an RCA, find causes, and implement solutions that will solve recurring problems and improve operations. But are you using your recorded information to this extent?

First, let’s quickly talk about three reasons why RCA is typically performed:

1. Because you have to

There may be a regulatory requirement to demonstrate that you are doing something about a problem that’s occurred.

2. You have breached a trigger point

Your own company has identified the triggers for significant incidents that warrant root cause analysis.

3. Because you want to

An opportunity has presented itself to make changes for the better. Or perhaps you’ve decided you simply don’t want to lose so much money all the time.

At the core of all industry is the desire to make money. Anything that negatively impacts this goal is usually attacked by performing root cause analysis.

I was having a conversation with a reliability engineer at an oil and gas site, and I asked him what lost opportunity or downtime might cost that company over the course of a year. He said it was in the vicinity of three quarters of a billion dollars – $750,000,000. Is this a good enough reason to perform root cause analysis? Even a 10% change would have a huge impact on bottom line figures.

The monetary impact to the business was of course not due to any single event, but to a multitude of events both large and small.

Each event presents itself as an opportunity to learn and to make any changes necessary to prevent its reoccurrence. Once can be written off as happenstance… things happen, serious or minor, and that’s life. But to let it happen continuously means that something is seriously wrong.

While these are all valid reasons to perform an RCA, there are at least ten more tell-tale equipment-related clues that an RCA needs to happen – most of which can be identified through the information you’re probably already recording.

Here are ten tell-tale signs that your organisation needs to perform Root Cause Analysis:

  1. Increased downtime to plant, equipment or process.
  2. Increase in recurring failures.
  3. Increase in overtime due to unplanned failures.
  4. Increase in the number of trigger events.
  5. Less availability of equipment.
  6. High level of reactive maintenance.
  7. Lack of time… simply can’t do everything that needs doing.
  8. Increase in the number of serious events… nearing the top of the pyramid.
  9. Longer planned “shut” durations.
  10. More frequent “shut” requirement.

These indicators imply that we need to be doing more in the realm of root cause analysis before these issues snowball.

If you can identify with some of these pain points, download our eBook “11 Problems With Your RCA Process and How to Fix Them” in which we provide best practice advice on using RCA to help eliminate some of these problems.

Author: Ben Rowland

A colleague and I were discussing how his nine-year-old son had completed his Cub Scouts Cyclist Activity badge. We noticed how some of the bike maintenance tasks that had been identified were, shall we say, less than ‘optimal’.

Now you might say it is a bit unfair to judge a Cub Scout lesson through the eyes of a reliability professional (and you’d be right), but what was interesting is that we often see the same sorts of issues within industry.

[Image: the Cub Scout bike maintenance checklist]

 

The first thing we noticed is that the tasks aren’t really tasks, but a list of components; i.e. they tell you what to look at but not what to look for.

In other words, how a task is written is clearly very important. In the example above “check the back tire” does not help us know what to look for. Is it there? Is it worn? Does it have air in it? Is it damaged? With vague work instructions like these, maintainers are left to decide what to inspect for, which will inevitably lead to inconsistent maintenance.

Some of the examples above are better than others. “Your helmet fits”, for example, is more specific and much better than “check helmet.”

While working with clients to develop their maintenance plans, the RCM process we use ensures that each maintenance task addresses a specific failure mode, or modes. We can run a report that shows this link, which in turn allows the maintainer to understand the purpose of the inspection. The task can also be written in such a way as to focus the maintenance on identifying the potential failure.

Another issue with the tasks above is that no data or figures are included in the task. How much tire wear is acceptable? What is the minimum tread depth? What pressure should the tire be at? Is there a minimum and maximum?

There also needs to be instruction as to how frequently to do the bicycle checks. Every ride? Every month? Things like checking your wheels are fitted tightly might need to be performed prior to every ride, but checking a chain for wear could be performed every few months. Not having this information can lead to items being under- or over-maintained, leading to possibly unsafe equipment condition or wasted effort.

“Okay then, you do it!”

Well, it’s only fair after criticizing the Cub Scout’s effort that we have a go ourselves. So below is an example of how we might construct an FMEA and maintenance strategy for a bicycle, in the Availability Workbench™ (AWB) RCM-Cost software¹:

[Image: bicycle FMEA and maintenance strategy in AWB RCM-Cost]

We can see that for the failure mode ‘chain worn’ we’ve identified an inspection task to periodically check the chain for wear to address that failure mode. We’ve specified the method to use (a wear gauge, as opposed to a simple visual check or performing a measurement) and an acceptable limit (less than 75% worn).  This is a clear communication of what is required, minimizing the chances of ineffective maintenance.
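
For readers who think in code, the same idea can be sketched as a small Python data structure. This is not the AWB data model, just an illustration of a task that records what to look for, how, and against what limit. The field names are my own, and the 90-day interval is an assumed figure standing in for “every few months”.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class InspectionTask:
    component: str
    failure_mode: str        # the failure mode this task addresses
    method: str              # how the check is carried out
    max_wear_percent: float  # acceptable limit (must be below this)
    interval_days: int       # assumed value, not taken from the study

    def passes(self, measured_wear_percent: float) -> bool:
        """True if the measured wear is within the acceptable limit."""
        return measured_wear_percent < self.max_wear_percent

    def instruction(self) -> str:
        """Work instruction that says what to look for, not just where."""
        return (f"Every {self.interval_days} days: inspect {self.component} "
                f"for '{self.failure_mode}' using {self.method}; "
                f"acceptable if less than {self.max_wear_percent:g}% worn.")

chain_check = InspectionTask(
    component="chain",
    failure_mode="chain worn",
    method="a wear gauge",
    max_wear_percent=75,
    interval_days=90,  # hypothetical interval for illustration
)

print(chain_check.instruction())
print("60% worn passes:", chain_check.passes(60))
print("80% worn passes:", chain_check.passes(80))
```

Compare the generated instruction with “check the chain” and the difference in clarity is obvious.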

“How do I choose which task to perform?”

In the example above I touched on the point that there may be a choice of maintenance tasks that could be performed, as well as whether or not to perform any maintenance at all. The RCM process also helps us to choose an appropriate maintenance task, which is essentially a balance between the severity of the failure and the cost or effort to perform the maintenance. Often severity is thought of in terms of cost, e.g. lost production, but it also covers safety and operational impact. The operating context of the equipment also affects the severity. The example below shows how we use the AWB software to select an optimal maintenance task interval.

[Image: maintenance task interval optimization curve]
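
The shape of such an optimization curve can be reproduced with a small Python sketch. To be clear, this is not the calculation the AWB software performs; it is a textbook age-replacement model that assumes a part with Weibull wear-out behavior is replaced either at a fixed interval or on failure, whichever comes first. The Weibull parameters and the two costs are hypothetical.

```python
import math

def weibull_reliability(t, shape, scale):
    """Probability that the part survives beyond age t."""
    return math.exp(-((t / scale) ** shape))

def cost_rate(interval, shape, scale, planned_cost, failure_cost, steps=1000):
    """Expected cost per hour if the part is replaced at `interval` or on failure."""
    dt = interval / steps
    # Trapezoidal integration of R(t) gives the expected length of one cycle.
    cycle_length = sum(
        0.5 * (weibull_reliability(i * dt, shape, scale)
               + weibull_reliability((i + 1) * dt, shape, scale)) * dt
        for i in range(steps)
    )
    survive = weibull_reliability(interval, shape, scale)
    cycle_cost = planned_cost * survive + failure_cost * (1 - survive)
    return cycle_cost / cycle_length

def best_interval(candidates, **params):
    """Candidate interval with the lowest expected cost per hour."""
    return min(candidates, key=lambda t: cost_rate(t, **params))

# Hypothetical part: wear-out failure (shape 3), characteristic life of
# 2,000 hours, 20 for a planned replacement, 200 for an in-service failure.
params = dict(shape=3.0, scale=2000.0, planned_cost=20.0, failure_cost=200.0)
candidates = list(range(100, 4001, 100))

best = best_interval(candidates, **params)
print("lowest-cost interval:", best, "hours")
```

Replace too often and you pay for parts that still had life in them; replace too rarely and the cost of failures takes over. The optimum sits in between, and it moves as the severity of failure changes with the operating context.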

Imagine we only ride our bike for getting around the town we live in for non-essential tasks, such as popping to the shops to buy some milk and a newspaper. In this case a punctured tire is not critical and we might decide not to carry a spare tube and tools to change it (pump, tire levers etc.) and instead to perform ‘breakdown maintenance’ i.e. walk the bike home and repair it there.  Now if we were instead on a vacation touring a remote location, far from any nearby towns, this ‘run to fail’ strategy would result in a very long walk and clearly not be suitable!

Hidden Failures

So assuming we were carrying a spare tube, and relying on it in remote locations, what happens if there is a problem with the spare tube? “Did I remember to fix it after my last puncture?” “What if there is a manufacturing defect?” Or “What if I didn’t find the thorn that caused the first puncture, still stuck in the tire, and got a second puncture?” These are called ‘hidden failures’ and require failure-finding tasks in order to mitigate them.

Operator Maintenance

We might also set our bicycle maintenance strategy assuming we do all the checks at home in the garage, but do we also need to consider operating checks?  For our bike this might include using our senses to listen for any abnormal noises, rattles, looseness, creaks or squeaks when riding the bike. We are also checking the operation of the gears and brakes through use, cleaning the bicycle down after use and oiling the chain afterwards to prevent corrosion. This is an example of ‘operator maintenance’.

How do we manage failures during use? If we notice something is wrong during use that we can’t fix, we would note it and arrange some planned maintenance at the bike shop before the warning becomes an actual failure that renders the bike out of action. Operating failures that occur with little or no warning time can be addressed in a number of ways: carrying spares (e.g. a spare inner tube) or tools to repair the failure out in the field (e.g. a puncture repair kit). We can also introduce re-designs (e.g. sealant in the tire to seal holes as they occur).

So there it is, writing an effective maintenance strategy can be as easy as riding a bike.

 

¹Availability Workbench™ is authored by Isograph Ltd. ARMS Reliability are authorized global distributors, re-sellers and implementers of the software application.

Author: Ben Rowland

Surely if some is good, more is better? Like many things in life, there can be too much of a good thing when it comes to detail in an RCM study, and finding the right balance can be tricky. Too little detail and you may miss things; too much and you could suffer from ‘analysis paralysis’!

So how do we know when we’ve ‘drilled down’ far enough to be thorough but not too far?

John Moubray summarised it nicely in his RCM 2 textbook:

“Failure Modes should be defined in enough detail for it to be possible to select a suitable failure management policy” (Moubray, 2007)

So what is a suitable failure management policy? The failure management policy is the approach chosen in order to mitigate the consequences of failure to an acceptable level.

Let’s consider two pumps; one is a large, complex gas compression pump and the other is a small air conditioning pump on a fork lift.

When trying to understand what the ‘suitable failure management policy’ is, it is necessary to take into account the ‘bigger picture’ of the equipment under consideration:

Function

What is the function of the machine? What is its purpose? Understanding this will help to understand the consequences of the failure, which in turn will help define the criticality.

Criticality

How critical is it if the failure occurs? Criticality is the product of the severity of the consequences of a failure and its frequency of occurrence.

In the case of the large gas compression pump, a failure could result in product not being delivered, costing thousands of dollars per hour of downtime. For the forklift a/c pump, a failure could simply mean returning the forklift to be swapped for another in the fleet.
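
As a rough sketch, the severity-times-frequency calculation could look like this in Python. The downtime, cost and failure-rate figures are purely hypothetical and are there only to show the difference in scale between the two pumps.

```python
def criticality(severity_cost, failures_per_year):
    """Criticality as severity of consequence multiplied by frequency."""
    return severity_cost * failures_per_year


# Hypothetical figures: $5,000 per hour of lost production over an
# 8-hour outage, expected once every two years.
gas_pump = criticality(severity_cost=5000 * 8, failures_per_year=0.5)

# Hypothetical figures: $200 of disruption to swap the forklift,
# expected once a year.
forklift_ac_pump = criticality(severity_cost=200, failures_per_year=1.0)

# Rank the equipment so analysis effort goes to the highest score first.
ranking = sorted(
    {"gas compression pump": gas_pump, "forklift a/c pump": forklift_ac_pump}.items(),
    key=lambda item: item[1],
    reverse=True,
)
```

Even though the forklift pump is assumed to fail more often, the gas compression pump scores far higher because each failure costs so much more.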

Repair vs. replace policy

Another aspect to consider is the corrective action. Is it feasible and cost effective to stock the spares and perform a repair in-situ, or is it better to simply replace the item with a new unit?

For a large, expensive pump it would be more expensive to replace the entire unit than to replace a worn seal. Whereas for a small a/c pump it would be more cost effective to discard it and replace with a new one.

Hidden failure

Are the failures evident in normal operation, or do they require fault finding to be performed? Can the seals be seen to check for signs of leakage?

Operating context

How accessible is the equipment? Is scaffolding required? Is the plant required to be shut down? Does the equipment need to be partially dismantled e.g. removing guards etc? Is there any redundancy in place? Is the equipment in a remote location, or a challenging environment?

These are just some of the things to think about when deciding what a ‘suitable failure management policy’ might be for your particular piece of equipment.

Back to our pump examples:
For the large gas compression pump, it is expensive to replace, critical if it fails and is accessible for in-situ repair during scheduled shut downs. In this case the FMEA would be far more detailed, including several failure modes, each with its own inspection or planned maintenance tasks, which would combine to form the ‘Failure Management Policy’ for this pump.

Image 1 How much detail

For the small AC pump on a forklift, let’s say it’s inaccessible for inspection, not critical if it fails and would be replaced rather than repaired. Our FMEA might only include a small number of failure modes, such as ‘Seal worn’, ‘Impeller worn’ and ‘Motor burnt out’. Our corresponding ‘Failure Management Policy’ would be ‘No scheduled maintenance’ and the corrective action would be to ‘Replace AC pump’.

Image 2 How much detail
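
To show how little detail this short FMEA needs, it can be written down as a few simple records. The Python sketch below is an illustration only: the class and field names are invented for the example and do not correspond to any particular RCM tool.

```python
from dataclasses import dataclass

NO_SCHEDULED_MAINTENANCE = "No scheduled maintenance"


@dataclass(frozen=True)
class FailureMode:
    """One row of a simple FMEA (field names are illustrative)."""
    description: str
    policy: str
    corrective_action: str


# The three failure modes named in the text share one policy and one
# corrective action, so the whole study fits in three rows.
AC_PUMP_FMEA = [
    FailureMode(mode, NO_SCHEDULED_MAINTENANCE, "Replace AC pump")
    for mode in ("Seal worn", "Impeller worn", "Motor burnt out")
]


def tasks_to_schedule(fmea):
    """Return only the failure modes that need planned maintenance tasks."""
    return [fm for fm in fmea if fm.policy != NO_SCHEDULED_MAINTENANCE]
```

For the forklift pump `tasks_to_schedule(AC_PUMP_FMEA)` comes back empty. The equivalent list for the gas compression pump would hold many more rows, each with its own inspection or planned maintenance task.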

In conclusion, it can be a challenge to know how much detail to go into when performing an FMEA, but the aim is to go into enough detail to determine a suitable failure management policy. Considering the ‘bigger picture’ of the equipment you are analysing will help guide you as to the level of detail required.

Author: Kevin Stewart

At some point, most companies will want to see quantifiable metrics showing that their Root Cause Analysis (RCA) program has resulted in a positive return on investment (ROI).

ROI is relatively easy to calculate as a dollar value when it comes to tangibles such as equipment or production time. Things can seem trickier when trying to assign a dollar value to safety improvements resulting from an RCA program. Try to keep it simple.

This formula –

Cost of the Problem x Likely Recurrence / Cost of the Fix = ROI

is a straightforward way to begin quantifying the ROI of your RCA program, including its effects on safety.
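
Written as a small Python function, the formula looks like the sketch below. The function name and the sample dollar figures are hypothetical and serve only to show the arithmetic.

```python
def rca_roi(cost_of_problem, likely_recurrence, cost_of_fix):
    """ROI = Cost of the Problem x Likely Recurrence / Cost of the Fix."""
    if cost_of_fix <= 0:
        raise ValueError("cost_of_fix must be greater than zero")
    return cost_of_problem * likely_recurrence / cost_of_fix


# Hypothetical example: a $50,000 problem expected to recur twice,
# with a $20,000 investigation and solution.
roi = rca_roi(cost_of_problem=50_000, likely_recurrence=2, cost_of_fix=20_000)
```

Here the result is 5.0, meaning every dollar spent on the fix avoids five dollars of expected loss.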

Let’s look at how we might calculate these costs.

Cost of the Fix

  • Cost of an RCA investigation. You may need to include the initial training, though this should drop off as it is amortized over the program, as well as whatever time, resources, and people are required to conduct the investigation itself.
  • Cost of whatever resources are needed to implement a solution. Don’t forget to include new equipment, parts, additional training, and anything else that is directly attributable to the implementation.

When you eliminate a problem, calculating what you have saved depends a lot on the problem itself and what its rate of reoccurrence is. For instance, if you figure out what was causing a particular machine to fail at a rate of once/year, you won’t see the benefits of your solution for another year. It can take several years and solving many different problems to see the total value of an RCA program.

Improved safety isn’t as impossible to quantify as it might seem. While most companies don’t publicly discuss this type of equation because it can seem insensitive, chances are your company does calculate the monetary cost of an injury or death on the job. These figures may be a bit outdated, but the Mine Safety and Health Administration at the US Department of Labor offers an online calculator, which takes into account both direct costs (like workers’ comp claims) and indirect costs (like training a new worker and lower morale), as one example.

Cost of the Problem Reoccurring

Cost of the initial problem in equipment, production delays, man hours, workers’ comp claims, medical costs, absenteeism, turnover, training new employees, lower productivity, decreased morale, legal fees, increased insurance costs.

At first glance the equation doesn’t quite make sense for a safety “near miss.” If it missed then what did it cost? Is the answer nothing? So the ROI is:  0 x likely recurrence/cost of the fix = 0? The answer obviously must include the potential cost. The cost to the business if the issue was on target and hadn’t missed. It all becomes subjective then. How do you put a cost on maybes?

It might help to look at the statistics of how an incident occurs. Take the cost to the business if a single major accident occurred (every business has this unspoken cost locked away somewhere) and then very simply do the math. One near miss will be worth 0.003 of that cost. Tally up your near misses and now go back to the formula.

AccidentPyramid_V2

As an example, say your data indicates you have 3000 near misses in two years, or 4.1 incidents per day. Then you put a program in place and now you have 3000 near misses in four years, or 2.1 incidents per day. At the old rate you would have seen 6000 near misses in those four years, so this translates to 3000 fewer near misses. Per the above calculations, this would generate 3000 x 0.003, or nine, fewer major incidents at whatever cost your company assigns to that type of incident. This becomes the savings for your ROI (or the Cost of the Problem in our equation) and can be attributed to the safety program of which the RCA process is a part.
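
The arithmetic in this example can be checked with a short Python sketch. It compares the old and new near-miss rates over the same four-year window. The 0.003 weighting comes from the text; the cost of a major incident is a placeholder, since each company assigns its own figure.

```python
DAYS_PER_YEAR = 365
NEAR_MISS_WEIGHT = 0.003  # share of a major incident's cost per near miss (from the text)


def daily_rate(near_misses, years):
    """Average number of near misses per day over the period."""
    return near_misses / (years * DAYS_PER_YEAR)


def avoided_near_misses(rate_before, rate_after, years):
    """Near misses avoided over `years` after the rate drops."""
    return (rate_before - rate_after) * years * DAYS_PER_YEAR


before = daily_rate(3000, years=2)  # roughly 4.1 per day
after = daily_rate(3000, years=4)   # roughly 2.1 per day

avoided = avoided_near_misses(before, after, years=4)
major_incidents_avoided = avoided * NEAR_MISS_WEIGHT

# Placeholder value: substitute whatever your company assigns to a major incident.
COST_OF_MAJOR_INCIDENT = 1_000_000
savings = major_incidents_avoided * COST_OF_MAJOR_INCIDENT
```

The `savings` value is what goes into the formula as the Cost of the Problem.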

This formula will assist in calculating an ROI on an individual RCA, which is necessary to show that the process is working and providing value so you can justify the program. However, since most safety programs track TRIR (Total Recordable Injury Rate) or something to that effect, you will also need to show that the RCA program affects this, too. This will be difficult because the safety program is in place and doing other things to prevent safety incidents before they happen. How do you attribute a reduction in near misses to preventive programs versus items put in place from an RCA?

You may never be able to separate these items. Even with detailed records, it is not always clear why people do what they do. The best thing you can do is to track when an RCA program was incorporated and then show the improvement in your safety metric, in TRIR, or near misses.

You can use this information to justify the program with the argument that the RCA process is part of the overall safety program and it really doesn’t matter which gets the credit as long as we have continued to drive safety improvements. The RCA program should be a small part of the overall safety program costs since there are usually several full time safety people involved, committee meetings, safety initiatives, programs, etc.

It doesn’t matter how you slice and dice it, the return on investment for your RCA program boils down to: What will it cost me to fix the problem now? – versus – What is the cost if this problem happens again?