Rethinking maintenance: the case for reliability-centered strategies
Learn how Reliability-Centered Maintenance helps industrial plants prioritize critical assets, reduce emergency repairs, and optimize maintenance budgets.
Heavy industrial plants face competing pressures that seem impossible to reconcile. They can’t afford unplanned downtime that halts production and misses delivery commitments, but they also can’t afford to maintain every piece of equipment with equal intensity. Traditional time-based maintenance schedules treat all assets equally, which often wastes resources on low-impact equipment while potentially under-maintaining critical systems.
How do facilities break out of reactive maintenance patterns and start making strategic choices about where maintenance resources deliver the most value? Reliability-centered maintenance (RCM) may hold the key.
The reactive maintenance trap
Many facilities operate in firefighting mode, with maintenance teams responding to breakdowns rather than preventing them. Equipment fails, and they fix it. It fails again weeks later, and they fix it again. The pattern repeats, and with all attention focused on delivery, there is little time to understand why the failures keep happening or whether preventing them makes economic sense.
This reactive culture stems from several reinforcing problems. Many facilities lack structured asset management approaches, failing to record failure data and analyze patterns to identify trends or root causes.
Time-based preventive maintenance programs may sound responsible, but they often create inefficiencies. Every maintenance action incurs costs in labor, parts, and downtime, and disassembling equipment risks introducing new problems through reassembly errors or disturbing components that were functioning properly.
The alternative extreme proves equally problematic. Facilities that defer maintenance until failure sacrifice the ability to plan, triggering emergency repairs at premium overtime rates while production schedules get disrupted. Industrial facilities face these challenges against a backdrop of aging infrastructure and constrained resources: many plants run equipment that has been in service for 50 years while budgets remain flat. The combination creates an impossible situation under traditional maintenance approaches.
Beyond OEM maintenance: The value of RCM’s tailored approach
Adding to the reactive maintenance trap, many well-meaning maintenance teams heavily depend on original equipment manufacturer (OEM) maintenance plans. Sometimes, they may follow them precisely, while at other times, they selectively choose specific tasks without truly understanding their own equipment in its operating context.
While OEM recommendations offer a useful starting point, they are intended for general use, rather than being tailored to the unique conditions and mission-specific needs of a particular facility. Relying on them alone overlooks important factors, such as actual equipment usage, the operational environment, and the specific stresses that assets endure onsite.
Reliability-Centered Maintenance (RCM) goes beyond these generic templates by systematically identifying all potential failure modes—not just those listed by the OEM. RCM considers industry-specific, environmental, and operational factors that could cause failures unique to a facility’s context. Through this process, organizations can determine which maintenance tasks are truly necessary, identify special inspections needed for certain conditions, and optimize the frequency and scope of preventive maintenance (PM) for each asset.
RCM’s flexibility is especially valuable when managing identical assets deployed in different parts of the organization under varying conditions or with different mission requirements. It enables teams to create asset-specific job plans, allowing them to accurately tailor their maintenance strategies. This means that some assets may require more intensive PM, while others may need less. Non-value-added tasks can be eliminated altogether, resulting in a maintenance program that fits real-world needs, boosts reliability, and optimizes limited resources.
How reliability-centered maintenance changes the approach
RCM offers a fundamentally different approach to managing maintenance resources. Instead of maintaining equipment based on arbitrary time intervals or reacting to whatever breaks next, RCM focuses on understanding which assets are most critical to operations, tailoring maintenance intensity to match their importance. It’s right-sizing maintenance for each asset based on the consequences of failure.
Understanding how equipment fails is just as important as knowing which equipment is critical. Different failure modes call for different maintenance strategies. A motor bearing that gradually degrades benefits from vibration monitoring, while an electrical component that fails suddenly might be better suited for run-to-failure. By analyzing failure modes and their consequences, facilities can select the appropriate maintenance approach for each asset’s actual behavior.
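As a rough illustration of matching strategy to failure behavior, the sketch below maps a failure pattern and its consequence to a maintenance approach. The pattern categories, the `high_consequence` flag, and the returned strategy labels are simplifying assumptions made for this example; the age-related branch in particular is not discussed above and is included only to round out the sketch. A real RCM analysis works through a fuller decision logic for each failure mode.

```python
from enum import Enum

class FailurePattern(Enum):
    GRADUAL = "gradual"      # degrades detectably over time (e.g. a motor bearing)
    AGE_RELATED = "age"      # assumed category: wears out after a predictable life
    SUDDEN = "sudden"        # fails without usable warning (e.g. some electronics)

def select_strategy(pattern: FailurePattern, high_consequence: bool) -> str:
    """Return an illustrative maintenance strategy for one failure mode."""
    if pattern is FailurePattern.GRADUAL:
        # Degradation can be observed, so monitor condition and act on it.
        return "condition monitoring"
    if pattern is FailurePattern.AGE_RELATED:
        # Predictable wear-out supports replacement at a set interval.
        return "scheduled replacement"
    # Sudden failures cannot be predicted; the consequence decides the response.
    return "redundancy or redesign" if high_consequence else "run-to-failure"

bearing = select_strategy(FailurePattern.GRADUAL, high_consequence=True)
relay = select_strategy(FailurePattern.SUDDEN, high_consequence=False)
```

Here `bearing` evaluates to `"condition monitoring"` and `relay` to `"run-to-failure"`, mirroring the two examples in the paragraph above.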
This approach requires moving beyond the assumption that all preventive maintenance adds value. RCM asks facilities to systematically evaluate risk and consequence across the facility.
Sort assets by criticality
The first step in RCM involves evaluating every significant asset based on what its failure would mean to the facility. This criticality assessment considers multiple factors: potential safety impacts, risks to the product, environmental consequences, effects on core operations, costs of downtime, and replacement expenses.
This analysis typically sorts assets into tiers. At the highest tier sit systems where failures create serious safety hazards or shut down critical operations, such as HVAC systems serving clean rooms, primary production equipment, or utilities feeding mission-critical processes. Middle tiers include support systems with some redundancy where failures cause manageable disruptions. Lower tiers cover assets whose failures have minimal operational impact; these are maintained at basic levels or even allowed to run until they fail.
When someone expresses concern about letting equipment run to failure, a simple comparison helps: if a bathroom exhaust fan fails, it’s usually just inconvenient. However, if a machine shop exhaust fan fails, safety is compromised. The former is suitable for run-to-failure strategies; the latter is not.
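One way to make a criticality assessment repeatable is a weighted score across the factors listed above. The following is a minimal sketch under stated assumptions: the factor keys, the 1-to-5 rating scale, the weights, and the tier cut-offs are all hypothetical values chosen for illustration, and a real assessment would set them with a cross-functional team.

```python
from dataclasses import dataclass

# Hypothetical weights for each criticality factor; they sum to 1.0.
WEIGHTS = {
    "safety": 0.30,
    "product": 0.15,
    "environment": 0.15,
    "operations": 0.20,
    "downtime_cost": 0.10,
    "replacement_cost": 0.10,
}

@dataclass
class Asset:
    name: str
    ratings: dict  # factor -> rating from 1 (negligible) to 5 (severe)

def criticality(asset: Asset) -> float:
    """Weighted average of the factor ratings, on the same 1-5 scale."""
    return sum(WEIGHTS[f] * asset.ratings[f] for f in WEIGHTS)

def tier(asset: Asset) -> int:
    """Map an asset to tier 1 (highest) through 3 (lowest); cut-offs are assumed."""
    # A severe safety rating forces the top tier regardless of the average.
    if asset.ratings["safety"] >= 5:
        return 1
    score = criticality(asset)
    if score >= 3.5:
        return 1
    if score >= 2.0:
        return 2
    return 3

cleanroom_hvac = Asset("Clean room HVAC", {
    "safety": 3, "product": 5, "environment": 2,
    "operations": 5, "downtime_cost": 5, "replacement_cost": 4,
})
bathroom_fan = Asset("Bathroom exhaust fan", {f: 1 for f in WEIGHTS})
shop_fan = Asset("Machine shop exhaust fan", {
    "safety": 5, "product": 1, "environment": 2,
    "operations": 2, "downtime_cost": 1, "replacement_cost": 1,
})

tiers = {a.name: tier(a) for a in (cleanroom_hvac, bathroom_fan, shop_fan)}
```

With these assumed ratings, the clean room HVAC and the machine shop exhaust fan land in tier 1 and the bathroom exhaust fan in tier 3. The safety override reflects the point of the fan comparison: a low average score should not mask a serious safety consequence.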
Match maintenance strategy to asset criticality
After assets are classified by their criticality, maintenance plans can be customized accordingly. For the most vital equipment, facilities may adopt comprehensive monitoring and preventive maintenance practices using condition sensors and routine inspections to detect issues early. They adhere to detailed maintenance routines at optimal intervals, keep critical spare parts in stock, and conduct root cause analyses when failures happen despite these measures to address underlying problems.
Equipment in middle tiers receives less frequent preventive maintenance and condition monitoring. The goal shifts from preventing every possible failure to managing risk at acceptable levels. Spare parts inventories are leaner. Maintenance procedures may be simplified.
For lower-tier assets, maintenance becomes minimal or even reactive. Running equipment to failure makes economic sense when the asset is inexpensive, its failure does not significantly affect operations, and maintenance costs more than replacement. This isn’t neglect. It’s a strategic choice to preserve resources for equipment where maintenance matters most.
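The economic test described above can be written down as a simple comparison. This is a sketch, not a full life-cycle cost model: the function name, its parameters, and the assumption that tier 3 is the lowest tier are hypothetical, and it deliberately ignores any life extension that preventive maintenance might provide.

```python
def run_to_failure_is_reasonable(
    tier: int,
    replacement_cost: float,
    failure_impact_cost: float,
    annual_pm_cost: float,
    expected_life_years: float,
) -> bool:
    """Return True when running the asset to failure looks economically sound.

    Assumes tier 3 is the lowest criticality tier and that all costs
    are expressed in the same currency.
    """
    # Only low-criticality assets are candidates at all.
    if tier != 3:
        return False
    # Total spent on preventive maintenance over the asset's expected life.
    lifetime_pm_cost = annual_pm_cost * expected_life_years
    # Cost of simply letting it fail: a new unit plus any disruption.
    cost_of_failure = replacement_cost + failure_impact_cost
    return lifetime_pm_cost > cost_of_failure

# Hypothetical small fan: 150 to replace, 50 of disruption, 60 per year of PM.
small_fan = run_to_failure_is_reasonable(3, 150.0, 50.0, 60.0, 8.0)
```

In this example, eight years of preventive maintenance would cost 480 against 200 for replacement plus disruption, so `small_fan` is `True`. The same numbers on a tier 1 asset return `False` because the tier check comes first.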
This tiered approach has proven effective in demanding environments. For example, NASA implemented equipment asset criticality assessments across all centers, aligning maintenance strategies to asset importance and achieving significant cost avoidance while improving reliability on mission-critical systems.
Start without shutting down
Facilities that operate around the clock often assume they can’t implement RCM without extended shutdowns for assessment and planning. But the initial steps require visibility and analysis rather than access to powered-down equipment.
The first step involves taking inventory of existing assets, verifying what equipment exists and its condition using data already in computerized maintenance management systems (CMMS). Mining existing failure data comes next. Work orders, maintenance logs, and operator notes contain patterns that reveal which assets fail repeatedly and where maintenance teams spend disproportionate time patching the same problems.
A review of the facility’s CMMS data serves as the foundation for this work. Facilities without a CMMS often have difficulty tracking asset condition, usage, and history, which can lead to missing warning signs that might prevent failures.
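Mining work orders for repeat failures can start very simply. The sketch below assumes work orders have been exported from the CMMS as a list of records with `asset_id`, `type`, and `labor_hours` fields; those field names, the `"corrective"` label, and the threshold of three failures are assumptions for illustration, since every CMMS structures its data differently.

```python
from collections import defaultdict

def repeat_offenders(work_orders, min_failures=3):
    """List assets with repeated corrective work, worst labor burden first.

    Returns tuples of (asset_id, failure_count, total_labor_hours).
    """
    counts = defaultdict(int)
    hours = defaultdict(float)
    for wo in work_orders:
        # Only unplanned (corrective) work indicates a failure.
        if wo["type"] != "corrective":
            continue
        counts[wo["asset_id"]] += 1
        hours[wo["asset_id"]] += wo["labor_hours"]
    flagged = [
        (asset, counts[asset], hours[asset])
        for asset in counts
        if counts[asset] >= min_failures
    ]
    # Sort so the assets consuming the most repair time come first.
    return sorted(flagged, key=lambda row: row[2], reverse=True)

# Hypothetical export from a CMMS.
orders = [
    {"asset_id": "P-101", "type": "corrective", "labor_hours": 2.0},
    {"asset_id": "P-101", "type": "corrective", "labor_hours": 3.0},
    {"asset_id": "AHU-2", "type": "corrective", "labor_hours": 4.0},
    {"asset_id": "P-101", "type": "preventive", "labor_hours": 1.0},
    {"asset_id": "P-101", "type": "corrective", "labor_hours": 2.5},
]
flagged = repeat_offenders(orders)  # [("P-101", 3, 7.5)]
```

Ranking by labor hours rather than failure count alone surfaces the assets where teams spend disproportionate time, which is often the more persuasive figure when building the case for a pilot.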
The criticality assessment also happens without shutdowns, as cross-functional teams can discuss what failure would mean for each major asset in conference rooms rather than on the plant floor. With inventory complete, failure patterns identified, and criticality assessed, facilities can pilot RCM on 10 to 15 critical systems, building proof of concept before expanding. This staged implementation lets operations continue while the maintenance strategy evolves.
Leverage RCM implementation to refine CMMS and documentation practices
Starting RCM provides a strategic chance to improve CMMS effectiveness without causing disruptive equipment shutdowns. At this early stage, resources and focus are already on asset condition and maintenance processes. This is an ideal time to review, update, and standardize how data is recorded and tracked. Actively enhancing data entry procedures and verifying existing records creates a strong and accurate foundation for a maintenance program.
To further strengthen data quality, establish a formal review process that keeps work orders open until all necessary documentation is provided and verified. This structured step protects data integrity, aids in effective root cause analysis, and supports reliable trend detection. By requiring complete and validated information before closure, teams create a solid foundation for ongoing optimization of the maintenance strategy and continuous improvement across operations.
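A closure gate of this kind might look like the following sketch. The required field names and the `verified_by` sign-off are hypothetical; an actual CMMS would enforce the rule through its own workflow configuration rather than standalone code.

```python
# Hypothetical set of fields a work order must carry before closure.
REQUIRED_FIELDS = (
    "asset_id",
    "failure_code",
    "repair_code",
    "labor_hours",
    "description",
)

def _is_blank(value) -> bool:
    """Treat None and empty or whitespace-only strings as missing."""
    return value is None or (isinstance(value, str) and not value.strip())

def missing_fields(work_order: dict) -> list:
    """Return the required fields that are absent or blank."""
    return [f for f in REQUIRED_FIELDS if _is_blank(work_order.get(f))]

def can_close(work_order: dict) -> bool:
    """Allow closure only when documentation is complete and a reviewer signed off."""
    reviewed = not _is_blank(work_order.get("verified_by"))
    return not missing_fields(work_order) and reviewed

draft = {
    "asset_id": "P-101",
    "failure_code": "SEAL-LEAK",
    "repair_code": "",          # technician has not recorded the repair yet
    "labor_hours": 2.5,
    "description": "Replaced mechanical seal",
}
gaps = missing_fields(draft)    # ["repair_code"]
```

Returning the list of gaps, rather than a bare yes or no, lets the reviewer tell the technician exactly what is still needed before the work order can be closed.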
Investing in comprehensive documentation at this stage yields substantial benefits. Consistently detailed work orders, clear failure and repair codes, and standardized asset naming conventions create a thorough dataset that supports future analysis. When CMMS accurately reflects real-world conditions, it becomes an effective tool for diagnosing recurring problems, spotting trends, and making confident, informed decisions that enhance operational reliability and efficiency. In short, improving CMMS documentation now lays the foundation for smarter maintenance management in the years ahead.
Overcoming resistance and changing culture
Maintenance teams that have followed the same procedures for years often view RCM initiatives with skepticism. They take pride in their work and worry that changes might compromise the reliability they’ve worked hard to maintain.
The key to gaining buy-in involves starting with questions rather than directives. Ask maintenance personnel what frustrates them most in their daily work, what repairs they find repetitive or wasteful, and where they continually address the same problems without resolving the underlying causes.
For example, a maintenance team may patch a leaking water line every few weeks because nobody will approve funding for proper replacement. Alternatively, a technician may spend hours conducting detailed inspections of low-impact equipment, while critical systems receive less attention. Maintenance workers recognize inefficiencies, and RCM provides the framework to address them effectively.
In some cases, RCM doesn’t cut maintenance budgets but adjusts them—reallocating funds from less critical assets to better support mission-critical equipment within the same budget. This leads to improved overall reliability without extra spending, which can be a strong point when convincing budget-conscious leadership.
Involving maintenance teams in the criticality assessment and strategy development transforms their role from order-takers to strategic partners. Their insights drive better decisions about what maintenance prevents failures and what just consumes resources. A cultural shift like this takes time and requires support from facility leadership, because when executives treat maintenance as a strategic function rather than a cost center, the entire organization begins to think differently.
Timeline expectations matter because reliability improvement is not instant gratification. Instead, it follows a curve. Within the first 6 to 12 months, organizations typically experience an increase in maintenance hours and expenses as they address chronic issues and implement more effective work practices. This is the natural “improvement hump”.
Once teams push through that phase, typically by 18 to 24 months, the benefits become undeniable. Failure rates begin to fall, repeat work declines, and optimized maintenance strategies reduce overall labor and material costs. The key is setting realistic expectations, staying the course, and recognizing that the temporary rise in effort is actually the first sign of lasting improvement.
How Salas O’Brien can help
Your facility’s reliability challenges require more than textbook RCM implementation. You need partners who understand how to adapt proven frameworks to your specific operational realities, balancing risk, cost, and performance in ways that work for your organization.
Salas O’Brien brings engineering depth and reliability expertise developed through real-world applications in demanding environments. Our approach doesn’t impose generic solutions. We tailor reliability-centered maintenance to your facility’s mission, operating conditions, and constraints. We also focus on driving cultural change alongside technical implementation, because sustainable reliability improvements require both.
Whether you operate continuous processes that can’t shut down for assessments, manage aging infrastructure with limited budgets, or simply want to stop the reactive maintenance cycle, we can help you identify the right starting point and build a practical implementation path.
Ready to shift from reactive maintenance to strategic reliability? Reach out to [email protected] or contact one of our SMEs below.
For media inquiries on this article, reach out to [email protected].
Terry Tullis, PE, LEED AP
Terry Tullis has over 20 years of experience providing facility asset management and operations & maintenance (O&M) engineering services. He brings in-depth knowledge of reliability-centered maintenance (RCM), O&M optimization, and risk assessment services. Terry is the VP of the Sustaining Engineering Services business line at Salas O’Brien’s Merritt Island office. Contact him at [email protected].
Aaron Thompson, CMRP, CRL
Aaron Thompson has over 17 years of experience in reliability, maintainability, and technical engineering, and is a senior professional in the aerospace and defense industry. He uses his expertise in reliability analysis, root cause analysis, failure modes, effects, and criticality analysis, and reliability-centered maintenance to deliver optimal solutions for complex and critical systems and projects. His mission is to ensure the highest standards of reliability, safety, and performance for the people, systems, and assets that he works with, while also reducing costs, risks, and downtime. Aaron serves as a Reliability Engineer at Salas O’Brien. Contact him at [email protected].
Tim Gutman, CSP, PMP, CPE, CFPS
Tim Gutman brings 23 years of experience with Salas O’Brien delivering system safety, failure analytics, equipment reliability, and O&M optimization for aerospace, commercial, and federal clients. His current focus is on delivering criticality and tiered maintenance strategies across all NASA centers and facilities. Tim serves as a Vice President and Program Manager at Salas O’Brien. Contact him at [email protected].