Category Archives: Reliability Services

Author: Jason Ballentine, VP of Engineering for ARMS Reliability

High Season Means Higher Stakes

This summer, the heat is shattering records around the United States—in Arizona, 119°F (48°C) days have grounded dozens of flights, and air conditioners are demanding an unprecedented number of megawatts from utilities. With average temperatures rising every summer and energy demand following suit, utilities have recognized the need to be more proactive about reducing their risk of outages.

Recently, the Chief Operating Officer of one energy generation company sought our help to fend off any issues that could result in a summer outage. Not only would an outage mean unhappy customers, but it would also mean financial losses if the utility couldn’t run at maximum capacity during its most lucrative season.

Throughout the winter, this utility saw a few small issues here and there. While nothing too dramatic happened, the COO recognized that he wouldn’t be able to afford something bigger going wrong during the busy season. He approached us to conduct a Vulnerability Assessment and Analysis (VAA) that would help identify his company’s most critical issues and reduce the likelihood of a service interruption.

*A VAA can be conducted on any type of operation in any industry. Learn more

Shedding Light on Potential Vulnerabilities

The analysis began with one power plant. This utility was like many other operations—they had several vulnerabilities on their radar in some form but no central repository for tracking them all. A machine operator might know about one issue, an email chain might cover another, a few deferred work orders might be hanging around, but there was no way of making all issues known to all parties.

We began collecting information about the plant’s vulnerabilities—conducting individual interviews and brainstorming sessions with small groups of engineering and operating staff. We also reviewed event logs and work order histories to determine whether past events were likely to reoccur. We wanted to know: what issues had they been living with for a while? Where were they deferring maintenance? What spare parts were they missing? What workarounds were in place? Over the course of about a week, we reviewed all the vulnerabilities that could slow down or stop production on 40,000 pieces of equipment.

Concentrating on the Critical

Out of this process, about 200 vulnerabilities were identified. Next, we scored each vulnerability in terms of likelihood and consequence and then ranked them “low,” “medium,” or “high” according to the corporate risk matrix. While there were about 25 vulnerabilities that we identified as being in the “high” category, we determined that 16 of them comprised approximately 80 percent of the risk to production.

If the utility focused on resolving these 16 issues first, they would see the greatest results in the shortest amount of time. We were also able to show the utility which type of vulnerability was most prevalent (wear and tear) and which systems were most in need of attention.
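To make that scoring step concrete, here is a minimal sketch, in Python, of a likelihood-times-consequence ranking with a Pareto-style cut at roughly 80 percent of total risk. The data, the 1-to-5 scales, and the helper names are hypothetical illustrations; the actual assessment was scored against the utility's own corporate risk matrix.

```python
# Illustrative sketch (not ARMS Reliability's actual tooling): rank vulnerabilities
# by a simple likelihood x consequence score and find the smallest set that
# accounts for ~80% of the total risk, Pareto-style.

from dataclasses import dataclass

@dataclass
class Vulnerability:
    name: str
    likelihood: int   # e.g. 1 (rare) .. 5 (almost certain)
    consequence: int  # e.g. 1 (minor) .. 5 (severe)

    @property
    def score(self) -> int:
        return self.likelihood * self.consequence

def top_risk_contributors(vulns, coverage=0.80):
    """Return the highest-scoring vulnerabilities that together make up
    roughly `coverage` of the total risk score."""
    ranked = sorted(vulns, key=lambda v: v.score, reverse=True)
    total = sum(v.score for v in ranked)
    selected, running = [], 0
    for v in ranked:
        selected.append(v)
        running += v.score
        if running / total >= coverage:
            break
    return selected

# Hypothetical example
vulns = [
    Vulnerability("Deferred boiler feed pump overhaul", 4, 5),
    Vulnerability("Missing spare for cooling water valve", 3, 4),
    Vulnerability("Known vibration on FD fan bearing", 4, 3),
    Vulnerability("Cracked sight glass workaround", 2, 2),
]
for v in top_risk_contributors(vulns):
    print(f"{v.name}: score {v.score}")
```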

The final step was to assign a high-level action to each of the most critical vulnerabilities (examples might be “order spare parts” or “seek approval for design change from fire marshal”). Now the utility had a clear plan for which vulnerabilities to address first, where to begin resolving each vulnerability, who was responsible for each action item, and a recommended time frame for taking action.

Conclusion

Like most organizations, this utility wasn’t surprised by the vulnerabilities we identified. Chances are, these issues had been looming in the background making everyone somewhat uneasy due to the lack of clear prioritization or path to resolution.

Over the course of just three weeks, our Vulnerability Assessment and Analysis captured all the potential vulnerabilities, prioritized them according to criticality, and provided a clear path of action. By following this plan, the utility could dramatically reduce the chances of a catastrophic slow down or stoppage, eliminating much of the stress that usually accompanies the high season.

The utility’s COO was so pleased with the results at the first plant that he immediately scheduled a Vulnerability Assessment and Analysis for the next power plant, with plans to eventually cover them all.

It’s important to conduct a Vulnerability Assessment and Analysis before a period of high production, but it’s also a useful process in advance of a scheduled work stoppage. This way any fixes that are identified can be completed without incurring additional downtime.

Find out more about our Vulnerability Assessment and Analysis process.

Author: Jason Ballentine, VP of Engineering for ARMS Reliability

Starting From Scratch With Spares

As anyone with a hand in running a household knows, it’s important to keep a stockpile of key items. You certainly don’t want to find out the hard way that you’re on your last square of toilet paper. But in the case of a facility like a power plant, a missing spare part could be more than just a nuisance—it could be downright expensive.

Determining the appropriate spare parts to have on hand in a large facility, however, can be tricky. This is especially true after building a facility from the ground up, when you don’t have a frame of reference for which spare parts you’re most likely to need first.

Most organizations deal with this in one of two ways: 1) they guess or 2) they purchase according to a spares list provided by an equipment vendor.

A Reliability-Focused Purchase List

There are obvious limitations when it comes to guesswork—making the wrong guess can result in huge expenses either in unnecessary spare parts or in costly downtime. A vendor-suggested list is probably somewhat more accurate, but such suggestions are unlikely to take into account the specific needs of your organization. We approach spare part holding recommendations through the lens of reliability as it applies to each specific operation. As factors change, it’s important to re-evaluate, making sure to take into account everything that could influence purchase priorities.

Recently, a utility company approached us to review the list of spare parts their equipment vendor had recommended. According to the vendor, this utility needed to purchase $4.9 million worth of spare parts up front. The utility wanted a second opinion before making such a sizable investment.

Our Approach

We started the spare parts analysis by looking at the list provided by the equipment vendor, but then we dug much deeper. We explored a series of questions, including: How often is this part likely to fail? What is the cost of the downtime if the part is attached to a critical piece of equipment? What is the unit cost of the spare part? What is the lead time to obtain a spare? Is this part likely to fail at any time throughout its lifecycle, or is it only likely to fail at the end of its life? There is no point in purchasing a spare today if you are unlikely to need it for another 20 years.
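As a rough illustration of how those questions interact, the sketch below compares the annualized cost of holding a spare (carrying cost plus planned repair downtime) against the expected cost of waiting out a supplier lead time after a failure. The figures, the 15 percent holding rate, and the function names are assumptions made up for this example, not the method used in the actual analysis.

```python
# A minimal sketch (our assumption of the kind of trade-off weighed, not the
# actual analysis method): compare the annualized cost of holding a spare
# against the expected downtime cost of waiting out the lead time.

def annual_cost_with_spare(unit_cost, holding_rate, failures_per_year,
                           repair_hours, downtime_cost_per_hour):
    """Holding cost plus the (shorter) downtime while the on-hand spare is fitted."""
    holding = unit_cost * holding_rate
    downtime = failures_per_year * repair_hours * downtime_cost_per_hour
    return holding + downtime

def annual_cost_without_spare(failures_per_year, lead_time_hours,
                              repair_hours, downtime_cost_per_hour):
    """Expected downtime cost when every failure also waits out the supplier lead time."""
    return failures_per_year * (lead_time_hours + repair_hours) * downtime_cost_per_hour

# Hypothetical pump shaft on a critical unit
with_spare = annual_cost_with_spare(
    unit_cost=40_000, holding_rate=0.15,
    failures_per_year=0.2, repair_hours=8, downtime_cost_per_hour=5_000)
without_spare = annual_cost_without_spare(
    failures_per_year=0.2, lead_time_hours=720,  # ~30-day supplier lead time
    repair_hours=8, downtime_cost_per_hour=5_000)

print(f"Hold the spare: ~${with_spare:,.0f}/yr vs don't: ~${without_spare:,.0f}/yr")
```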

In all, we reviewed about 1,500 pieces of equipment over 40 days before providing a recommended list of spares. The final list included some of what the vendor had recommended, left off many of the vendor’s recommended parts, and suggested a few additional parts that weren’t on the original list.

The final critical spares list that was recommended included a total of $2.2 million in spare parts—a savings of $2.7 million over what the vendor had originally recommended.

Built to Adapt

Our recommended spares list is intended to be responsive to changing needs and new information. When the utility took a second look at its downtime cost and calculated that it was actually $10/megawatt and not the $23/megawatt they had initially determined, we re-evaluated the spares list, reducing the utility’s recommended purchases by another $200,000.

Conclusion

If your organization is like most, you probably run into trouble when it comes to having the right spares on hand. Either you’re missing the right parts when something breaks down, or you have expensive spares gathering dust and potentially going bad in storage. ARMS Reliability takes the guesswork out of developing a critical spares list, taking into account item costs, the likelihood of failure, lead times, downtime costs, and all other relevant factors.

The investment this utility made to conduct the analysis with our help ultimately reduced their bottom line equipment costs by $2.7 million—which represented a savings of 50 to 1. Beyond the monetary benefit, the utility’s Reliability Engineer felt much more confident in the approach taken. He was also relieved to avoid grossly overspending on spares.

Find out more about ARMS Reliability’s Spare Part Holding Analysis

Author: Jason Ballentine

Developing a maintenance strategy requires careful consideration and due process. Yet from what I’ve seen, many organizations are making obvious errors right from the start — missteps that can torpedo the success of the strategies they’re trying so hard to put in place.

Without further ado, here are five common maintenance strategy mistakes:

  1. Relying solely on original equipment manufacturer (OEM) or vendor recommendations.

It seems like a good idea — you’d think the people who made or sold the equipment would know best. It’s what they don’t know that can hurt you.

Outside parties don’t know how a piece of equipment functions at your facility. They don’t understand how much this equipment is needed, the cost of failure, whether there’s any redundancy within the system… OEM and vendor maintenance guidelines are geared to maximize the availability and reliability of the machine, but their strategies might not be appropriate for your unique circumstances or needs. As a result, your team could end up over-maintaining the equipment, which can actually create more problems than it solves. The more you mess with a piece of equipment, the more you introduce the possibility of error or failure. Some things, in some situations, are better left alone.

What’s more, OEMs and vendors have a vested interest in selling more spare parts (so they can make more money). That means that their replacement windows might not be accurate or appropriate to your business needs. Rather than relying on calendar-driven replacement, your maintenance strategy might focus more on inspecting the equipment to proactively identify any issues or deterioration, then repairing or replacing only as needed.

It’s fine to use OEM/vendor maintenance guidelines as a starting point. Just make sure you thoroughly review their recommendations to see if they align with your unique needs for the given piece of equipment. Don’t just blindly accept them — make sure they fit first.

  2. Relying heavily on generic task libraries for your maintenance strategy.

This is surprisingly common. Some organizations purchase a very generic set of activities for a piece of equipment or equipment category, and attempt to use them to drive maintenance strategy. But generic libraries are even worse than OEM/vendor recommendations because they are just that — generic. They aren’t written for the specific equipment make and model you have. They might even include tasks that simply don’t apply, such as “inspect the belt” on a pump that uses an entirely different drive mechanism. Once a mechanic attempts to perform one of these generic, ill-suited tasks, he or she stops trusting your overall maintenance strategy. Without credibility and compliance, you might as well not have a strategy at all.

Like OEM and vendor recommendations, generic task libraries can help you get started on a robust maintenance strategy, if (and only if) you carefully examine them first and only use the tasks that make sense for your particular equipment and operational needs.

  3. Failing to include a criticality assessment in your strategy decisions.

If you choose and define tasks without factoring in criticality, you run the risk of wasted effort and faulty maintenance. Think about it: If a piece of equipment is low on the criticality scale, you might be okay to accept a generic strategy and be done with it. But for equipment that’s highly critical to the success of your operations, you need to capture as much detail as possible when selecting and defining tasks. How can you know which is which without fully assessing the relative importance of each piece of equipment (or group of equipment) to the overall performance of your site?

  4. Developing maintenance strategies in a vacuum.

Sometimes, organizations will hire an outside consultant to develop maintenance strategies and send them off to do it, with no input from or connection with the maintenance team (or the broader parts of the organization). Perhaps they figure, “you’re the expert, you figure it out.” Here’s the problem: For a maintenance strategy to be successful, it must be developed within the big picture. You’ve got to talk to the mechanic who’ll be doing the work, the planner for that work, and the reliability engineer who’ll be responsible for the performance of that equipment, production, or operation. Their input is extremely valuable, and their buy-in is absolutely critical. Without it, even the best maintenance strategy can be met with resistance and non-compliance.

  5. Thinking of maintenance strategy development as a “one-and-done” effort.

For some organizations, the process of developing a maintenance strategy from the ground up seems like something you do once and just move on. But things change — your business needs change, the equipment you have on site changes, personnel changes, and much more. That’s why it’s vitally important to keep your maintenance strategies aligned with the current state of your operations.

In fact, a good maintenance strategy is built with the idea of future revisions in mind. That means the strategy includes clear-cut plans for revisiting and optimizing the strategy periodically. A good strategy is also designed to make those revisions as easy as possible by capturing all of the knowledge that went into your strategy decisions. Don’t just use Microsoft Word or put tasks directly into the system without documenting the basis for the decisions you made. What were your considerations? How did you evaluate them? What ultimately swayed your decision? In the future, if the key factors or circumstances change, you’ll be able to evaluate those decisions more clearly, without having to guess or rely on shaky recall.

If you’ve found yourself making any of these mistakes, don’t despair. Most errors and missteps can be addressed with an optimization project. In fact, ARMS Reliability specializes in helping organizations make the most of their maintenance strategies. Contact us to learn more.


Author: Jason Ballentine

Many organizations believe that making sound maintenance decisions requires a whole lot of data. It’s a logical assumption — you do need to know things like the number of times an event has occurred, its duration, the number of spare parts needed, and the number of people engaged in addressing the event; plus the impact on the business and the reason why it happened.

A lot of this information is captured in your Computerized Maintenance Management System (CMMS). The more detail you have, the more accurate results you can get from maintenance scenario simulation tools like Isograph’s Availability Workbench™. Unfortunately, your CMMS data may be lacking enough detail to yield optimal results.

It’s enough to make anybody want to throw his or her hands up and put off the decision indefinitely. If you do, you could be making a big mistake.

No matter what, you’re still going to have to make a decision. You have to.

The truth is, you can still do a lot with limited or poor quality data, supported by additional sources of knowledge. Extract any and all information you have available, not just what is in the CMMS. Document what you’ve got, then use it to make a timely decision that’s as informed as possible.

Don’t get caught up in the fact that it’s not perfect data — circumstances in the real world are hardly ever ideal. In fact, as reliability engineers, most of the data we get is related to failure, which is exactly what we’re trying to avoid. If we are tracking failures, having less data likely means we are doing our jobs well, because we are experiencing fewer failures.

The bottom line is: we can’t afford to sit and wait for more data to make decisions, and neither can you.

Gather as much information as you can from all available sources:

CMMS

In an ideal world, this is the master data record of all activities performed.  As discussed previously, that is almost never the case; however, this is an important starting point to reveal where data gaps exist.

Personal experience and expertise

There’s a wealth of information stored within the experience of people who are familiar with any given piece of equipment. Consider holding a facilitated workshop to gather insight on the equipment’s likely performance. Even a series of informal conversations can yield useful opinions and real-world experiences.

The Original Equipment Manufacturer (OEM)

Most OEMs will have documentation you can access, possibly also a user forum you can mine for additional information.

Industry databases, e.g. the Offshore and Onshore Reliability Data Handbook (OREDA) and the Process Equipment Reliability Database (PERD) by the Center for Chemical Process Safety (CCPS)

Some information is available in these databases, but it’s generic — not specific to your unique site or operating context. For example, you can find out how often a certain type of pump fails, but you can’t discover whether that pump is being used on an oil platform, refinery, power station or mine site. Industry data does, however, provide useful estimates on which you can base your calculations and test your assumptions.

Capture all these insights in an easily accessible way, then use what you’ve learned to make the best decision currently possible. And be sure to record the basis for your decision for future reference. If you get better data down the road, you can always go back and revise your decisions — after all, most maintenance strategies should remain dynamic by design.

Don’t let a lack of data paralyze you into inaction. Gather what you can, make a decision, see how it works, and repeat. It’s a process of continuous improvement, which given the right framework is simple and efficient.

Availability Workbench™, Reliability Workbench™, FaultTree+™, and Hazop+™ are trademarks of Isograph Limited, the author and owner of products bearing these marks. ARMS Reliability is an authorized distributor of those products, and a trainer in respect of their use.


Author: Jason Ballentine

As with any budget, you’ve only got a certain amount of money to spend on maintenance in the coming year. How do you make better decisions so you can spend that budget wisely and get maximum performance out of your facility?

It is possible to be strategic about allocating funds if you understand the relative risk and value of different approaches. As a result, you can get more bang for the same bucks.

How can you make better budget decisions?

It can be tempting to just “go with your gut” on these things. However, by taking a systematic approach to budget allocation, you’ll make smarter decisions — and more importantly you’ll have concrete rationales for why you made those decisions —  which can be improved over time. Work to identify the specific pieces of equipment (or types of equipment) that are most critical to your business, then compare the costs and risks of letting that equipment run to failure against the costs and risks of performing proactive maintenance on that equipment. Let’s take a closer look at how you can do that.

4 steps to maximize your maintenance budget

1.  Assign a criticality level for each piece of equipment. Generally, this is going to result in a list of equipment that would cause the most pain — be it financial, production loss, safety, or environmental pain — in the event of failure. Perform a Pareto analysis for maximum detail. 

2.  For your most critical equipment, calculate the ramifications of a reactive/run-to-failure approach.

  • Quantify the relative risk of failure. (You can use the RCMCost™ module of Isograph’s Availability Workbench™ to better understand the risk of different failure modes.)
  • Quantify the costs of failure. Keep in mind that equipment failures can affect multiple aspects of your business in different ways — not just direct hard costs. In every case, consider all possible negative effects, including potential risks.
    • Maintenance: Staff utilization, spare parts logistics, equipment damage, etc.
    • Production Impact: Downtime, shipment delays, stock depletion or out-of-stock, rejected/reworked product, etc.
    • Environmental Health & Safety (EHS) Impact: Injuries, actual/potential releases to the environment, EPA visits/fines, etc.
    • Business Impact: Lost revenue, brand damage, regulatory issues, etc.

For a more detailed explanation of the various potential costs of failure, consult our eBook, Building a Business Case for Maintenance Strategy Optimization.

3.  Next, calculate the impact of a proactive maintenance approach for this equipment

  • Outline the tasks that would best mitigate existing and potential failure modes
  • Evaluate the cost of performing those tasks, based on the staff time and resources required to complete them.
  • Specify any risks associated with the proactive maintenance tasks. These risks could include the possibility of equipment damage during the maintenance task, induced failures, and/or infant mortality for newly replaced or reinstalled parts.

4. Compare the relative risk costs between these approaches for each maintenance activity. This will show you where to focus your maintenance budget for maximum return.
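As a simplified illustration of step 4, the sketch below compares rough annualized expected costs for a run-to-failure policy and a proactive task on a single failure mode. The numbers, function names, and residual-failure assumption are invented for illustration; the RCMCost™ module mentioned above performs a far more rigorous simulation.

```python
# A simplified sketch (our illustration, not the RCMCost(TM) algorithm):
# compare the annualized expected cost of run-to-failure against a proactive
# task for one failure mode, using rough expected values.

def run_to_failure_cost(failures_per_year, repair_cost, downtime_hours,
                        downtime_cost_per_hour):
    """Expected yearly cost of reacting to failures as they occur."""
    return failures_per_year * (repair_cost + downtime_hours * downtime_cost_per_hour)

def proactive_cost(tasks_per_year, task_cost, residual_failures_per_year,
                   repair_cost, downtime_hours, downtime_cost_per_hour):
    """Cost of the planned tasks plus the residual risk they do not eliminate."""
    planned = tasks_per_year * task_cost
    residual = residual_failures_per_year * (repair_cost + downtime_hours * downtime_cost_per_hour)
    return planned + residual

# Hypothetical conveyor gearbox
reactive = run_to_failure_cost(0.5, 25_000, 36, 3_000)
planned = proactive_cost(2, 1_500, 0.05, 25_000, 36, 3_000)
print(f"Run to failure: ~${reactive:,.0f}/yr, proactive: ~${planned:,.0f}/yr")
```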

When is proactive maintenance not the best plan?

For the most part, you’ll want to allocate more of your budget towards proactive maintenance for equipment that has the highest risk and the greatest potential negative impact in the event of failure. Proactive work is more efficient so your team can get more done for the same dollar value. Letting an item run to failure can create an “all hands on deck” scenario under which nothing else gets done, whereas many proactive tasks can be performed quickly and possibly even concurrently.

That said, it’s absolutely true that sometimes run-to-failure is the most appropriate approach for even a critical piece of equipment. For example, a maintenance team might have a scheduled task to replace a component after five years, but the problem is that component doesn’t really age — the only known failure mode is getting struck by lightning. No matter how old that component is, the risk is the same. Performing replacement maintenance on this type of component might actually cost more than simply letting it run until it fails. (In these cases, a proactive strategy would focus on minimizing the impact of a failure event by adding redundancy or stocking spares.) But you can’t know that without quantifying the probability and cost of failure.

Side note: Performing this analysis can help you see where your maintenance budget could be reduced without a dramatic negative effect on performance or availability. Alternatively, this analysis can help you demonstrate the likely impact of a forced budget reduction. This can be very helpful in the event of budget pressure coming down from above.   

At ARMS Reliability, we help organizations understand how to forecast, justify and prioritize their maintenance budgets for the best possible chances of success. Contact us to learn more.

Availability Workbench™, Reliability Workbench™, FaultTree+™, and Hazop+™ are trademarks of Isograph Limited, the author and owner of products bearing these marks. ARMS Reliability is an authorised distributor of those products, and a trainer in respect of their use.

Author: Dan DeGrendel

Regardless of industry or discipline, we can probably all agree that routine maintenance — sometimes referred to as preventative, predictive, or even scheduled maintenance — is a good thing. Unfortunately, through the years I’ve found that most companies don’t have the robust strategies they need.

Typical issues and the kinds of trouble they can create:

1. Lack of structure and schedule

In many cases, routine tasks are just entries on a to-do list of work that needs to be performed — with nothing within the work pack to drive compliance. In particular, a list of tasks beginning with “Check,” with no guidance on acceptable limits, has limited value. The result can be a “tick and flick” style routine maintenance program that fails to identify impending failure warning conditions.

2. Similar assets, similar duty, different strategies

Oftentimes, maintenance views each piece of equipment as a standalone object, with its own unique maintenance strategy. As a result, one organization could have dozens of maintenance strategies to manage, eating up time and resources. In extreme cases, this can lead to similar assets having completely different recorded failure mechanisms and routine tasks, worded differently, grouped differently and structured differently within the CMMS.

3. Operational focus 

Operations might be reluctant to take equipment out of service for maintenance, so they delay or even cancel the appropriate scheduled maintenance. At times this decision is driven by the assumption that the repair activity is the same whether performed in a planned or a reactive manner. But experience tells us that without maintenance, the risk is even longer downtime and more expensive repairs when something fails.

4. Reactive routines

Sometimes, when an organization has been burned in the past by a preventable failure, they overcompensate by performing maintenance tasks more often than necessary. The problem is, the team might be wasting time doing unnecessary work — worse still, it might even increase the likelihood of future problems, because unnecessary intrusive maintenance can itself increase the risk of failure.

5. Over-reliance on past experience 

There’s no substitute for direct experience and expertise. But when tasks and frequencies are based solely on opinions and “what we’ve always done” — rather than sound assumptions — maintenance teams can run into trouble through either over- or under-maintaining. Without documented assumptions, business decisions are based on little more than a hunch. “Doing what we’ve always done” might not be the right approach for the current equipment, with the current duty, in the current business environment (and it certainly makes future review difficult).

6. Failure to address infrequent but high consequence failures 

Naturally, routine tasks account for the most common failure modes. They should, however, also address failures that happen less frequently but may have a significant impact on the business. Developing a maintenance plan that addresses both types prevents unnecessary risk. For example, a bearing may be set up on a lubrication schedule, but if there’s no plan to detect performance degradation due to a lubrication deficiency, misalignment, material defect, etc., then undetected high-consequence failures can occur.

7. Inadequate task instructions

Developing maintenance guidelines and best practices takes time and effort. Yet, all too often, the maintenance organization fails to capture all that hard-won knowledge by creating clear, detailed instructions. Instead, they fall back on the maintenance person’s knowledge — only to lose it when a person leaves the team. Over time, incomplete instructions can lead to poorly executed, “bandaid-style” tasks that get worse as the months go by.

8. Assuming new equipment will operate without failure for a period of time

There’s a unique situation that often occurs when new equipment is brought online. Maintenance teams assume they have to operate the new equipment first to see how it fails before they can identify and create the appropriate maintenance tasks. It’s easy to overlook the fact that they likely have similar equipment with similar points of failure. Their data from related equipment provides a basic foundation for constructing effective routine maintenance.

9. Missing opportunity to improve

If completed tasks aren’t reviewed regularly to gather feedback on instructions, tools needed, spare parts needed, and frequency, the maintenance process never gets better. The quality and effectiveness of the tasks then degrade over time and, with them, so does the equipment.

10. Doing what we can and not what we should 

Too often, maintenance teams decide which tasks to perform based on their present skill sets — rather than equipment requirements. Technical competency gaps can be addressed with a training plan and/or new hires, as necessary, but the tasks should be driven by what the equipment needs.

Without a robust routine maintenance plan, you’re nearly always in reactive mode — conducting ad-hoc maintenance that takes more time, uses more resources, and could incur more downtime than simply taking care of things more proactively. What’s worse, it’s a vicious cycle. The more time maintenance personnel spend fighting fires, the more their morale, productivity, and budget erode. The less effective routine work that is performed, the more equipment uptime and business profitability suffer. At a certain point, it takes a herculean effort simply to regain stability and prevent further performance declines.

Here’s the good news: An optimized maintenance strategy, constructed with the right structure, is simpler and easier to sustain. By fine-tuning your approach, you make sure your team is executing the right number and type of maintenance tasks, at the right intervals, in the right way, using an appropriate amount of resources and spare parts. And with a framework for continuous improvement, you can ultimately drive towards higher reliability, availability, and more efficient use of your production equipment.

Want to learn more? Check out our next blog in this series, Plans Can Always Be Improved:  Top 5 Reasons to Optimize Your Maintenance Strategy.


Author: Dan DeGrendel

Maintenance optimization doesn’t have to be time-consuming or difficult. Really, it doesn’t. Yet many organizations simply can’t get their maintenance teams out of a reactive “firefighting mode” so they can focus on improving their overall maintenance strategy.

Stepping back to evaluate and optimize does take time and resources, which is why some organizations struggle to justify the project. They lack the data and/or the framework to demonstrate the real, concrete business value that can be gained.

And even when organizations do start to work on optimization, sometimes their efforts stall when priorities shift, results are not immediate and the overall objectives fade from sight.

If any of these challenges sound familiar, there are some very convincing reasons to forge ahead with maintenance optimization:

1. You can make sure every maintenance task adds value to the business

Through the optimization process, you can eliminate redundant and unnecessary maintenance activities, and make sure your team is focused on what’s really important. You’ll outline the proper maintenance tasks, schedules and personnel assignments; then incorporate everything into the overall equipment utilization schedule and departmental plans to help drive compliance. Over time, an optimized maintenance strategy will save time and resources — including reducing the hidden costs of insufficient maintenance (production downtime, scrap product, risks to personnel or equipment and expediting and warehousing of spare parts, etc.).

2. You’ll be able to plan better

Through the optimization process, you’ll be allocating resources to various tasks and scheduling them throughout the year. This gives you the ability to forecast resource needs, by trade, along with spare parts and outside services. It also helps you create plans for training and personnel development based on concrete needs.

3. You’ll have a solid framework for a realistic maintenance budget

The plans you establish through the optimization process give you a real-world outline of what’s needed in your maintenance department, why it’s needed, and how it will impact your organization. You can use this framework to establish a realistic budget with strong supporting rationales to help you get it approved. Any challenges to the budget can be assessed and a response prepared to indicate the impact on performance that any changes might make.

4. You’ll just keep improving

Optimization is a project that turns into an ongoing cycle of performing tasks, collecting feedback and data, reviewing performance, and tweaking maintenance strategies based on current performance and business drivers.

5. You’ll help the whole business be more productive and profitable

Better maintenance strategies keep your production equipment aligned to performance requirements, with fewer interruptions. That means people can get more done, more of the time. That’s the whole point, isn’t it?

Hopefully, this article has convinced you of the benefits of optimizing your maintenance strategies. Ready to get started or re-energize your maintenance optimization project? Check out our next blog article, How To Optimize Your Maintenance Strategy: A 1,000-Foot View.


Author: Dan DeGrendel

Optimizing your maintenance strategy doesn’t have to be a huge undertaking. The key is to follow core steps and best practices using a structured approach. If you’re struggling to improve your maintenance strategy — or just want to make sure you’ve checked all the boxes — here’s a 1000-foot view of the process.

1. Sync up

  • Identify key stakeholders from maintenance, engineering, production, and operations — plus the actual hands-on members of your optimization team.
  • Get everybody on board with the process and trained in the steps you’re planning to take. A mix of short awareness sessions and detailed education sessions for the right people is vital for success.
  • Make sure you fully understand how your optimized maintenance strategies will be loaded and executed from your Computerized Maintenance Management System (CMMS).

2. Organize

  • Review/revise the site’s asset hierarchy for accuracy and completeness. Standardize the structure if possible.
  • Gather all relevant information for each piece of equipment.
    • Empirical data sources: CMMS, FMEA (Failure Mode and Effects Analysis) studies, industry standards, OEM recommended maintenance
    • Qualitative data sources: Team knowledge and past records

3. Prioritize

  • Assign a criticality level for each piece of equipment; align this to any existing risk management framework
  • Consider performing a Pareto analysis to identify equipment causing the most production downtime, highest maintenance costs, etc.
  • Determine the level of analysis to perform on each resulting criticality level

4. Strategize

  • Using the information you’ve gathered, define the failure modes, or apply an existing library template. Determine existing and potential modes for each piece of equipment
  • Assign tasks to mitigate the failure modes.
  • Assign resources to each task (e.g., the time, number of mechanics, tools, spare parts needed, etc.)
  • Compare various options to determine the most cost-effective strategy
  • Bundle selected activities to develop an ideal maintenance task schedule (considering shutdown opportunities). Use standard grouping rules if available.

This is your proposed new maintenance strategy.

5. Re-sync

  • Review the proposed maintenance strategy with the stakeholders you identified above, then get their buy-in and/or feedback (and adjust as needed)

6. Go!

  • Implement the approved maintenance strategy by loading all of the associated tasks into your CMMS — ideally through direct integration with your RCM simulation software, or otherwise manually or via an Excel sheet loader.

7. Keep getting better

  • Continue to collect information from work orders and other empirical and qualitative data sources.
  • Periodically review maintenance tasks so you can make continual improvements.
  • Monitor equipment maintenance activity for unanticipated defects, new equipment and changing plant conditions. Update your maintenance strategy accordingly.
  • Build a library of maintenance strategies for your equipment.
  • Take what you’ve learned and the strategies and best practices you’ve developed and share them across the entire organization, wherever they are relevant.

Of course, this list provides only a very high-level view of the optimization process.

If you’re looking for support in optimizing your maintenance strategies, or want to understand how to drive ongoing optimization, ARMS Reliability is here to help.


Author: Philip Sage, CMRP, CRL

Traditionally, SAP is populated with Master Data with no real consideration of future reliability improvement. Only once maintenance is actually being executed does the real pressure of underperforming assets drive consideration of the reliability strategy. At that point, the mechanics of what’s required for ongoing reliability improvement, based upon the SAP Master Data structure, are exposed and, quite typically, almost unviable.

The EAM system is meant to support reliability. Getting your EAM system to support reliability requires some firm understanding of what must happen. If we look a little closer at reliability and the phases of life of an asset, we can see why the EAM settings must vary and not be fixed.

The initial reliability performance of any system is actually determined by its design and component selection.

This is probably not a big surprise for anyone close to reliability, but it may spark some debate from those who have not heard this before.

As evidence to support this statement, a newly commissioned and debugged system should operate nearly failure free for an initial period of time and only become affected by chance failures on some components. A closer inspection shows that during this period, most wear-out failures will be absent after a new machine or system is placed into service. During this “honeymoon period” preventative replacement is actually not necessary, nor would an inspection strategy provide benefit until such time as wear (or unpredictable wear) raises the possibility of a failure. Within this honeymoon period the components of the system exhibit exponentially distributed times to failure and fail due to their individual chance failures only. They should only be replaced if they actually fail, not because of some schedule. Minor lubrication or service might be required, but during this initial period, the system is predominantly maintenance free and largely free from failure.
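The sketch below, which is our own illustration rather than anything from the article or an EAM, shows why the honeymoon period behaves differently from the wear-out phase: an exponential time-to-failure model has a constant hazard rate (pure chance), while a Weibull model with shape parameter beta greater than 1 has a hazard that climbs as wear out approaches. The parameter values are hypothetical.

```python
# Our own illustration (not from the article or any EAM): an exponential
# time-to-failure model has a constant hazard rate (chance failures only),
# while a Weibull model with shape beta > 1 has a hazard that rises as
# wear out approaches. Parameter values are hypothetical.

def exponential_hazard(mtbf_hours: float) -> float:
    """Constant hazard: failure is pure chance, independent of age."""
    return 1.0 / mtbf_hours

def weibull_hazard(age_hours: float, beta: float, eta: float) -> float:
    """Hazard that increases with age when beta > 1 (wear out)."""
    return (beta / eta) * (age_hours / eta) ** (beta - 1)

mtbf = 50_000            # chance-failure MTBF, hours
beta, eta = 3.0, 40_000  # wear-out shape and characteristic life, hours

for age in (1_000, 10_000, 30_000, 45_000):
    print(f"age {age:>6} h: chance hazard {exponential_hazard(mtbf):.2e}/h, "
          f"wear-out hazard {weibull_hazard(age, beta, eta):.2e}/h")
```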

Here is where the first hurdle occurs.

After the initial period of service has passed, it is reasonable to expect both predictable and unpredictable forms of wear-out failures to gradually occur and increase in rate, as more components reach their first wear-out time.

Now if repair maintenance (fixing failures) is the only strategy practiced, then the system failure rate will be driven by the sporadic arrival of component wear-out failures; it will predictably rise rather drastically, then fluctuate wildly, resulting in “good” days followed by “bad” days. The system failure rate, driven by component wear-out failures, will finally settle to a comparatively high random failure rate, predominantly caused by the wear out of components occurring in an asynchronous manner.

With a practice heavily dependent upon repair maintenance, the strength of the storeroom becomes critical, as it makes or breaks the system availability, which can only be maintained by fast and efficient firefighting repairs. The speed at which corrective repairs can be actioned, and the logistical delays encountered, drive the system’s availability performance.

From this environment, “maintenance heroes” are born.

As the initial honeymoon period passes, the overall reliability of the system becomes a function of the maintenance policy, i.e. the overhaul, parts replacement, and inspection schedules.

The primary role of the EAM is to manage these schedules.

The reduction or elimination of predictable failures is meant to be managed through preventative maintenance tasks, housed inside the EAM that counter wear out failures. Scheduled inspections help to counter the unpredictable failure patterns of other components.

If the EAM is properly configured for reliability, there is a tremendous difference in the reliability of a system. System reliability becomes a function of whether preventative maintenance is practiced or only “run to failure, then repair” maintenance is practiced. As a hint: the industry-wide belief is that some form of preventative practice is better than none at all.

Preventative maintenance is defined as the practice that prevents wear-out failure by preemptively replacing, discarding, or overhauling a component to “prevent” failure. For long-life systems, the concept revolves around making a minimal repair, by replacement of the failed component, so that the system is then restored to service in “like new” condition. Repair maintenance, by contrast, is defined as a strategy that waits until the component in the system fails during the system’s operation.

If the EAM is not programmed correctly, or if the preventative tasks are not actioned, then the reliability of a system can fall to ridiculously low levels, where random failures of components of the recoverable system plague the performance and start the death spiral into full reactive maintenance.

This is quite costly: to be even marginally effective, the additional requirement is a fully stocked storeroom, which raises inventory carrying costs. Without a well-stocked storeroom, there are additional logistical delays associated with each component that are additive in their impact on system availability and uptime, and so system availability becomes a function of spare parts.

An ounce of prevention goes a long way.

Perhaps everything should be put on a PM schedule…? This is actually the old school approach, and I find it still exists in practice all over the world.

The reliability of a system is an unknown hazard and is affected by the relative timing of the preventative task. This timing comes from the EAM in the form of a work order which is supposed to be generated relative to the wear out of the component. How well this task aligns with reality is quite important. If the preventative work order produced by the EAM system comes out at the wrong time, there is a direct adverse effect on system reliability.

EAM systems are particularly good at forecasting the due date of the next work order and creating a work order to combat a component wear-out failure. However, wear is not always easily predicted by the EAM, and so we see in practice that not all EAM-generated work orders suppress the wear-out failures. One reason for this variance is that the EAM work order was produced on a calendar time base, with a programmed periodicity that was established in the past to predict future wear performance.

We don’t always get this right.

As a result, we generate work orders for work that is not required, or for work that should have been performed before the component failed but arrives only after it has.

Maybe this sounds familiar?

Calendar based forecasts assume wear is constant with time. It is not.

A metric based on operating hours is often a more complete and precise predictor of a future failure. It’s true that most EAM systems today allow predictable work to be actioned and released by either calendar time or operating hours, and allow other types of time-indexed counters to trigger PM work orders.
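The sketch below, a simple illustration of ours rather than SAP PM functionality, shows the difference between the two triggers: a calendar rule assumes wear accrues at a fixed rate, while an operating-hours counter forecasts the due date from how the asset is actually being run. The interval, counter reading, and utilization figures are hypothetical.

```python
# A simple illustration of ours (not SAP PM functionality): forecasting the
# next PM due date from a calendar interval versus an operating-hours counter.
# Interval, counter reading, and utilization figures are hypothetical.

from datetime import date, timedelta

def next_due_by_calendar(last_pm: date, interval_days: int) -> date:
    """Calendar trigger: assumes wear accrues at a constant rate with time."""
    return last_pm + timedelta(days=interval_days)

def next_due_by_operating_hours(pm_interval_hours: float,
                                hours_run_since_pm: float,
                                avg_hours_per_day: float) -> date:
    """Counter trigger: forecast the date the hours-based interval is reached."""
    remaining_hours = max(pm_interval_hours - hours_run_since_pm, 0.0)
    return date.today() + timedelta(days=remaining_hours / avg_hours_per_day)

last_pm = date(2016, 1, 15)
print("Calendar-based due date:", next_due_by_calendar(last_pm, 180))
print("Hours-based due date:   ",
      next_due_by_operating_hours(pm_interval_hours=4_000,
                                  hours_run_since_pm=2_600,
                                  avg_hours_per_day=14))
```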

A key to success is producing the work order just ahead of the period of increased risk of failure due to wear. Whether triggered by calendar or some other counter, this anticipation of failure, and the work order to combat it, is the traditional view of maintenance.


This all sounds simple enough.

The basic job of a reliability engineer is to figure out when something will likely fail based on its past performance and schedule a repair or part change. The EAM functionality is used to produce a work order ahead of the failure, and if that work is performed on-time, we should then operate the system with high reliability.

The reliability side of this conjecture, when combined with an EAM to support it, is problematic.

If the work order is either ill-timed from the EAM or not performed on time during the maintenance work execution, there is an increased finite probability that the preventative task will not succeed in its purpose to prevent a failure. Equally devastating, if the PM schedule is poorly aligned or poorly actioned, the general result mirrors the performance expected from a repair maintenance policy, and the system can decay into a ridiculously low level of reliability, with near constant sporadic wear out of one of the many components within the system.

When preventative maintenance is properly practiced so that it embraces all components known to be subject to wear out, a repairable system can operate at high reliability and availability with a very low “pure chance” failure rate and do so for indefinitely long periods of time.

Determining what to put into the EAM is really where the game begins.

FIND OUT MORE AT:

MASTERING ENTERPRISE ASSET MANAGEMENT WITH SAP, 23-26 October 2016, Crown Promenade, Melbourne

Phil Sage will be running a full day workshop “Using SAP with Centralised Planning to Continually Improve RCM Derived Maintenance Strategies” Wednesday 26 October

Come learn what works, and what does not work, as you integrate SAP EAM to support your reliability and excellence initiatives, which are needed to be best in class in asset management. The workshop covers how and where these tools fit into an integrated SAP framework, what is required to make the process work, and the key links between reliability excellence, failure management and work execution using SAP PM.

This question came up during one of our most recent webinars and we thought it raised a very interesting point. Joel Smeby is an experienced reliability engineer who leads our North American engineering team and has helped implement reliability initiatives in many different organizations across a variety of industries.

Here is what Joel had to say about the role of a reliability team as it relates to calculating the cost of downtime:

Reliability is typically not directly responsible for production. But when you look at all of the different areas within an organization (purchasing, spare parts, warehouse, operations, maintenance, safety), Reliability is the one area that should stand across all of them.  The organizational structure may not necessarily be set up in that way, but in terms of being able to talk to people in maintenance, operations, or purchasing and leverage all of that information into a detailed analysis and then make decisions at that level – I think it is Reliability that needs to do that.

I recently worked on a site and went to the operations department to validate their cost of downtime and they weren’t able to give us a solid number. It changed from day to day or week to week and from an organizational perspective it’s very difficult to make decisions based on data when you haven’t defined that number.  As Reliability Engineers we need that downtime number to justify holding spare parts or performing preventive/predictive maintenance tasks.  If Operations has not defined that then I think that a Reliability Engineer is the perfect person to facilitate that discussion.  It can sometimes be a difficult conversation to have, especially if you’re gathering the information from people in upper management.  One strategy is to help people understand why you’re gathering that information and how it will be used.  Justifying maintenance and reliability decisions is all about balancing the cost of performing maintenance against the cost of downtime in order to get the lowest overall cost of ownership.  The managers who have a budget responsibility that includes both maintenance and operations will typically appreciate this approach in finding the lowest cost to the organization.

Some organizations are able to determine the cost of downtime as a $/hour.  This is done in the most basic sense by taking the annual profit that the equipment is responsible for and dividing by the number of hours the equipment runs each year (8,760 hours for continuous operation).  A deeper level of analysis may be required in more complex operations such as batch processes.

The traditional view of a maintenance strategy is that the level of effort put in to preventing a failure is dependent on the type and size of equipment.  The reliability based approach understands the cost of downtime, and therefore the equipment’s importance.  This enables the maintenance strategy to be optimized to the overall lowest cost for the organization.
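To make the basic calculation Joel describes concrete, here is a quick worked example. The figures are hypothetical:

```python
# Hypothetical figures only: annual profit attributable to the equipment
# divided by the hours it runs per year gives a basic $/hour downtime cost.

annual_profit = 26_280_000   # $ per year attributable to this equipment
operating_hours = 8_760      # continuous operation, 24 hours x 365 days

downtime_cost_per_hour = annual_profit / operating_hours
print(f"Cost of downtime: ${downtime_cost_per_hour:,.0f}/hour")  # -> $3,000/hour
```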

Join the conversation in our reliability discussion group on LinkedIn