The Elusive P-F Interval

Murray Wiseman, Optimal Maintenance Decisions (OMDEC) Inc.
Daming Lin, University of Toronto
Naaman Gurvitz, Clockwork Solutions Inc.
Marton Dundics, The DEI Group

Summary

This paper discusses the P-F Interval for CBM decision making, its areas of usefulness, and those areas where it is inadequate as a predictive maintenance strategy. An alternative, more general approach based on the EXAKT system is proposed. Two categories of decisions, the first driven by failure probability, the second driven by both failure probability and economics, are described. Under the second category there are three possible decision strategies: cost minimization, availability maximization, and profitability (considering both cost and availability) maximization. A numerical example illustrates all four types of decisions. References to the Appendix (e.g. the hyperlinks on the labels of Equations 4, 6, 10, and 11) provide further clarification where needed.

Background

When an item's functional capacity falls below its required capability we consider the asset to have "failed". Maintenance restores (reactive maintenance) or preserves (proactive maintenance) an item's functional capacity to a level exceeding that required by its users. Of the two types of maintenance, reactive or proactive, users specify the latter in certain situations. When the failure would interfere significantly with military readiness or with the goal to produce goods or services, safely, at a profit, and without violating environmental norms, the user will generally request some form of proactive maintenance.

To mitigate the consequences of failure, maintenance managers lean towards a proactive maintenance policy referred to as condition based maintenance or CBM.
CBM is also known (with varying nuances) by the names "on-condition maintenance", "predictive maintenance" (PdM), "condition monitoring" (CM), "prognostics & health management" (PHM), "equipment health monitoring" (EHM), or simply "preventive maintenance (PM) inspections". All of these refer to the gathering, processing, and analyzing of relevant data and observations, in order to make good and timely decisions on whether to:
When managers and engineers select a task to manage a particular failure mode, they tend to consider CBM first. CBM, if applicable, is felt to be more "conservative", less costly, and less disruptive than TBM (time based maintenance). The graph of Figure 1 represents the well known theory of CBM. It defines CBM as the detection of a potential failure in a timely manner. P is the initial point at which an evolving failure can, using the current detection technology, be observed. The actual discovery of the potential failure occurs at the subsequent CBM inspection following P.

Figure 1: P-F Interval Principle for CBM

Discussion

The graph of Figure 1 illustrates constraints that the maintenance engineer should account for when designing a CBM program. The Net P-F interval must provide adequate time for the maintenance organization to react from the moment that a potential failure is detected. If it is practical to monitor at the frequency necessary for that to occur, the CBM program is said to be "technically feasible" or "applicable". In the worst case, according to the graph, if an inspection predates the potential failure by only a small amount, the subsequent inspection will still catch it in time, provided that the maintenance organization is capable of acting within the net P-F interval. If, in the long run, the repetitive proactive task succeeds, at an acceptable cost, in the avoidance or mitigation of the consequences of functional failure, the CBM program is said to be "effective" or “worthwhile”.

Figure 1 assumes that:
Obstacles to the application of Figure 1

Moubray, in ref. 1, suggests that if P is not known or if P-F cannot be approximated, CBM is not technically feasible. This would rule out a large number of currently active condition monitoring programs. Of the two concepts, “P” and “P-F”, it is the former that poses the greater challenge. Without “P” the P-F interval remains elusive. For this reason, before addressing the P-F interval, we must first discover when and how to declare a potential failure.

In Figure 1, “P” (the point at which currently available technology can detect a failing condition) is flagged at the attainment of a specified value of some condition indicator. Finding an indicator that broadcasts the state of a targeted failure mode is a challenge in itself. In all but the simplest situations, extracting a condition indicator (feature) that faithfully tracks diminishing failure resistance, related to a targeted failure mode (aka cause or mechanism), requires considerable knowledge based on either:
In this discussion we will assume that a physics based model is unavailable. We will focus on the second, arguably more general, case. Once a condition indicator has been proposed that reflects deterioration in a component, we still have to set the decision (potential failure) point P, which, in the absence of a model that describes the physics of the failure mode, requires a methodology of some kind. Setting the declaration level of the potential failure is the problem encountered by many asset managers deluged with condition monitoring data. Unavoidable questions face any implementer of a CBM program: "Where should the potential failure be set?" and "Which indicator, from among many monitored variables, should be used for this purpose?" When the physics of the situation are not well known (as is often the case), a “policy” for declaring a potential failure is far from obvious. Why do Figure 1 and the determination of P and P-F stubbornly elude our grasp? The reasons are:
Making CBM decisions in EXAKT

EXAKT handles the multi-dimensionality of CBM, the probabilistic nature of failure, and the influence of working age, by providing two ways of deciding whether an item, component, or failure mode is in a potential failure state. Furthermore, if the item is not currently in a "P" state, EXAKT still provides an estimated time to failure or a remaining useful life estimate (RULE). The two decision processes can be categorized as:
Figure 2: Flow diagram of EXAKT. Age data is also known as "life" data, and consists of the working age and type of event (failure or suspension) defining each life ending. CM data is "condition monitoring" data. "Cost data" defines the 'penalty' or average costs associated with failure compared to the average cost of preventing the failure.

Most decisions in maintenance are driven not only by considerations of asset survival probability but also by economic factors. Where business dynamics are concerned, managers quickly appreciate the need to optimize decisions in the face of competing objectives. A policy is a procedure (model) for making the best decisions given the evidence of the moment. Achieving value from our decision policy depends on:
The seven desirables (1. production rate, 2. quality, 3. availability, 4. mission survivability, 5. lowest cost, 6. safety, 7. environmental integrity) are rarely mutually compatible. They usually conflict. If you turn up the speed knob on the production rate, this may adversely affect quality (ultimate yield). If you run your equipment to failure, it may give an acceptable availability, but could increase costs. And so on. The possible trade-offs among these objectives are endless. Given this vast variety of possible outcomes of a decision process, it is no wonder that maintenance departments struggle to pinpoint the elusive "reliability" bull's eye.

When faced with conflicting objectives, managers recognize the need to strike a compromise. Compromise in maintenance leads one squarely to the topic of optimization. The EXAKT optimization process helps reduce a diversity of objectives to a common denominator, so that the decision process issues the best trade-off among several goals. In the numerical example to follow we show the way in which EXAKT balances the aims of low cost and high availability in an optimized CBM policy. But first let us discuss the subject of optimization with respect to the objective of cost.

In general, neither P nor F happens at a fixed time or under fixed conditions. Rather, they occur randomly according to probability distributions. EXAKT is a probabilistic approach that decides, based on current condition relative to an optimal hazard level, whether maintenance is needed. The difference between the two approaches is explored in this article. In maintenance, uncertainty about the time of failure is the reality. The increasing temperature of a bearing, a differential pressure increasing linearly across a filter, or the treads of an airplane tire wearing down in proportion to the number of landings, are all special deterministic cases of CBM, characterized by fixed P and F states.
The development of cracks in a mechanical component is more stochastic. Maintainers must adopt tools and procedures that deal with the general, probabilistic, and more frequent cases. The more general the approach, the more accurate its results. We intend to show that the problem can be addressed clearly only when approached in a statistical way. We will talk about managing uncertainty in order to make a strong case that the simplified or approximate (P-F interval) approach will not, in general, lead to results that deal realistically with changes in the resistance to failure.

Assumptions and models used in EXAKT

The advantage of CBM over TBM as a maintenance strategy is that it accounts for both the age of the item and its changing state up until the moment of decision making. We assume the item's state of health to be encoded within measurable condition indicators (which, of course, is the underlying premise of CBM).

The item's monitored indicators may be presented as a vector of time dependent random variables, Z(t):

$$Z(t) = (Z_1(t), Z_2(t), \ldots, Z_m(t))$$ (eq. 1)

Each variable in the vector contains the value of a certain measurement at that moment. We consider Z(t) a "process" because it changes with each set of CBM readings acquired at regular intervals of time t. Furthermore, it is a stochastic process, sometimes called a random process. A deterministic process has only one possible 'reality' of how it might evolve with time, whereas a stochastic process has some indeterminacy in its future evolution. This uncertainty is described by probability distributions. There are many possibilities of where the process might go, but some paths are more probable than others. Z(t), then, is said to be an "m-dimensional stochastic covariate process" observed at regular intervals of time, t. We let T, a random variable, represent the failure time of the item.
A primary goal of CBM is to predict T given the current age t and the current measurements of Z(t). Achieving this goal will require us to develop a statistical model that combines the stochastic behavior of the CBM readings Z(t) with a model for the hazard rate as a function of age t and the current CBM readings Z(t). Of particular interest to us in accomplishing this objective are two theoretical models:
Proportional hazards model with time-dependent covariates

In EXAKT the influence of condition monitoring (CM) indicators on the failure time is modeled using a Proportional Hazards Model (PHM). First proposed by Cox in 1972 (Cox D.R., Oakes D., 1984), the Cox PHM and its variants have become some of the most widely used tools in the statistical analysis of lifetime data in biomedical sciences and in reliability. The specific model used in EXAKT is a PHM with time-dependent covariates and a Weibull baseline hazard. It is described by the hazard function

$$h(t, Z(t)) = \frac{\beta}{\eta}\left(\frac{t}{\eta}\right)^{\beta-1} \exp\left(\sum_{i=1}^{m} \gamma_i Z_i(t)\right)$$ (eq. 2)

where $\beta > 0$ is the shape parameter, $\eta > 0$ is the scale parameter, and $\gamma = (\gamma_1, \gamma_2, \ldots, \gamma_m)$ is the coefficient vector for the condition monitoring variable (covariate) vector. The parameters $\beta$, $\eta$, and $\gamma$ must be estimated as part of the numerical solution.

Markov failure time process

The physical CBM measurements of each $Z_i(t)$ will fall into classes or states that we have set up and labeled as, for example, "new", "normal", "warning", or "danger". These designations (a common practice in CBM) constitute the state space of the stochastic process Z(t). As such, the covariate vector Z(t) can be reasonably discretized to reflect, meaningfully, each of its states. Specifically, we discretize the range of each covariate $Z_i(t)$ into a finite number of intervals, each of which has a representative value. This value can be taken as any value in an interval’s range. EXAKT takes the midpoint of each interval as the value representing the condition indicator’s state.

Define the discretized covariate process as

$$Z^{(d)}(t) = (Z_1^{(d)}(t), Z_2^{(d)}(t), \ldots, Z_m^{(d)}(t))$$

such that the value of each $Z_i^{(d)}(t)$ is equal to the representative value of the interval into which $Z_i(t)$ falls. Denote all possible values (states) of $Z^{(d)}(t)$ as $R_1(z), R_2(z), \ldots, R_n(z)$, where $R_i(z) = (R_{i1}(z), R_{i2}(z), \ldots, R_{im}(z))$ and each component $R_{ij}(z)$ is a representative value of the covariate $Z_j^{(d)}(t)$. $R_i(z)$ represents the ith state of the discretized covariate process $Z^{(d)}(t)$.
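The two building blocks described above, the Weibull PHM hazard and the midpoint discretization of a covariate, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the hazard is written in the standard Weibull PHM form (Weibull baseline multiplied by the exponential of the weighted covariate sum), and the function names, band boundaries, and clamping of out-of-range readings are hypothetical choices, not part of EXAKT.

```python
import math
from bisect import bisect_right


def weibull_phm_hazard(t, z, beta, eta, gamma):
    """Hazard at working age t > 0 for covariate values z.

    Assumes the standard Weibull PHM form:
    (beta/eta) * (t/eta)**(beta-1) * exp(sum(gamma_i * z_i)).
    """
    baseline = (beta / eta) * (t / eta) ** (beta - 1)
    return baseline * math.exp(sum(g * zi for g, zi in zip(gamma, z)))


def discretize(value, edges):
    """Return the midpoint of the band that contains `value`.

    `edges` is a sorted list of band boundaries (hypothetical example:
    [0, 10, 20, 40] defines three bands). Readings outside the range are
    clamped to the first or last band, which is an assumption of this sketch.
    """
    idx = bisect_right(edges, value) - 1
    idx = min(max(idx, 0), len(edges) - 2)
    return 0.5 * (edges[idx] + edges[idx + 1])


# Hypothetical reading of 12 units, banded as 0-10, 10-20, 20-40.
state_value = discretize(12.0, [0.0, 10.0, 20.0, 40.0])
hazard_now = weibull_phm_hazard(5.0, [state_value], beta=2.0, eta=10.0, gamma=[0.1])
```

Using the band midpoint as the representative value mirrors the choice the text attributes to EXAKT; any other value inside the band would fit the same code by changing the last line of `discretize`.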
We predict likely future states of the discretized process $Z^{(d)}(t)$ by endowing it with some kind of probabilistic behavior. A non-homogeneous discrete Markov process has been shown (Bogdanoff & Kozin, 1985; Kopnov & Kanajev, 1994; Pulkkinen, 1991) to model the stochastic behavior of time dependent condition monitoring variables related to wear propagation. In EXAKT, it is assumed that $Z^{(d)}(t)$ follows a non-homogeneous Markov failure time model described by the transition probabilities

$$L_{ij}(x,t) = P(T > t,\ Z^{(d)}(t) = R_j(z) \mid T > x,\ Z^{(d)}(x) = R_i(z))$$ (eq. 3)

where x is the current working age, t (t > x) is a future working age, and i and j are the states of the covariates at x and t respectively. The above expression for the transition probability from state i to state j can be read as follows: it is the probability that the item survives until t, at which time the state of $Z^{(d)}(t)$ is j, given that the item has survived until x, at which time the state of $Z^{(d)}(x)$ was i. The transition behavior can then be displayed in a Markov chain transition probability matrix, for example, that of Table 1 in the Appendix.

Calculation of the transition probabilities

EXAKT combines the Cox PHM with the Markov failure time model described above. For the following analysis it is convenient to represent Equation 3 in the following form:

$$L_{ij}(x,t) = P(T > t \mid T > x,\ Z^{(d)}(x) = R_i(z)) \cdot p_{ij}(x,t)$$ (eq. 4)

where

$$p_{ij}(x,t) = P(Z^{(d)}(t) = R_j(z) \mid T > t,\ Z^{(d)}(x) = R_i(z))$$ (eq. 5)

is the conditional transition probability of the process $Z^{(d)}(t)$. For a short interval of time, values of transition probabilities can be approximated as:

$$L_{ij}(x, x+\Delta x) = (1 - h(x, R_i(z))\,\Delta x) \cdot p_{ij}(x, x+\Delta x)$$ (eq. 6)

Equation 6 means that we can, in small steps, calculate the future probabilities for the state of the covariate process $Z^{(d)}(t)$. Using the hazard calculated (from Equation 2) at each successive state we determine the transition probabilities for the next small increment in time, from which we again calculate the hazard, and so on.
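The small-step recursion implied by Equation 6 can be sketched as follows. This is a simplified illustration, not EXAKT's implementation: it assumes a single covariate, a Weibull PHM hazard in its standard form, and a per-step conditional transition matrix that does not change with age (EXAKT's model is non-homogeneous). All names and parameter values are hypothetical.

```python
import math


def hazard(t, z, beta, eta, gamma):
    """Weibull PHM hazard for one covariate (standard form, assumed).

    For beta < 1 the age t must be strictly positive.
    """
    return (beta / eta) * (t / eta) ** (beta - 1) * math.exp(gamma * z)


def propagate(x, t, start_state, states, step_matrix, dx, beta, eta, gamma):
    """Step the state probabilities from age x to age t in increments dx.

    states[i] is the representative covariate value of state i.
    step_matrix[i][j] approximates p_ij(x, x + dx) and is held constant here.
    Returns (probability of surviving to t, per-state probabilities at t).
    """
    n = len(states)
    prob = [0.0] * n
    prob[start_state] = 1.0
    age = x
    for _ in range(round((t - x) / dx)):
        nxt = [0.0] * n
        for i in range(n):
            # Survival over one step from state i: 1 - h(age, R_i) * dx.
            survive = max(0.0, 1.0 - hazard(age, states[i], beta, eta, gamma) * dx)
            for j in range(n):
                nxt[j] += prob[i] * survive * step_matrix[i][j]
        prob = nxt
        age += dx
    return sum(prob), prob


# Hypothetical two-state example: "normal" (value 1.0) and "warning" (value 3.0).
states = [1.0, 3.0]
step_matrix = [[0.9, 0.1], [0.0, 1.0]]
from_normal, _ = propagate(100.0, 150.0, 0, states, step_matrix, 1.0, 2.0, 400.0, 0.5)
from_warning, _ = propagate(100.0, 150.0, 1, states, step_matrix, 1.0, 2.0, 400.0, 0.5)
```

Summing the per-state probabilities at the end gives the probability of surviving to t from the starting state. A smaller `dx` tightens the approximation at the cost of more steps.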
CBM decisions based on probability

The “conditional reliability” is the probability of survival to t given that
The conditional reliability function can be expressed as:

$$R(t \mid x, i) = P(T > t \mid T > x,\ Z^{(d)}(x) = R_i(z)) = \sum_{j} L_{ij}(x,t)$$ (eq. 7)

Equation 7 points out that the conditional reliability is equal to the sum of the conditional transition probabilities from state i to all possible states. Once the conditional reliability function is calculated we can obtain the conditional density from its derivative. We can also find the conditional expectation of T - t, termed the remaining useful life (RUL), as

$$E(T - t \mid T > t,\ Z^{(d)}(t) = R_i(z)) = \int_t^{\infty} R(s \mid t, i)\, ds$$ (eq. 8)

In addition, the conditional probability of failure in a short period of time $\Delta t$ can be found as

$$P(t < T \le t + \Delta t \mid T > t,\ Z^{(d)}(t) = R_i(z)) = 1 - R(t + \Delta t \mid t, i)$$ (eq. 9)

For a maintenance engineer, predictive information based on current CM data, such as RUL and probability of failure in a future time period, can be valuable for risk assessment and planning maintenance.

CBM decisions based on economics and probability

Successful businesses confronted with a variety of risk factors optimize their policies and resources in order to achieve their ultimate objectives. An economic decision model is a rule for preventive renewal of an asset that minimizes the average per unit cost associated with maintenance (proactive and reactive) over a long time horizon. (This cost is the key performance indicator most directly related to shareholder value.) Such a rule for CBM may be reasonably expressed as a control-limit policy: perform preventive maintenance at $T_d$ if $T_d < T$, or perform reactive maintenance at T if $T_d \ge T$, where

$$T_d = \inf\{t \ge 0 : K\,h(t, Z^{(d)}(t)) \ge d\}$$ (eq. 10)

K is the cost penalty associated with functional failure, $h(t, Z^{(d)}(t))$ is the hazard, and d (> 0) is the risk control limit for performing preventive maintenance. Here risk is defined as the functional failure cost penalty K times the hazard rate. The long-run expected cost of maintenance (preventive and reactive) per unit of working age will be

$$\Phi(d) = \frac{C_p\,(1 - Q(d)) + C_f\,Q(d)}{W(d)} = \frac{C_p + K\,Q(d)}{W(d)}$$ (eq. 11)

where $C_p$ is the cost of preventive maintenance, $C_f = C_p + K$ is the cost of reactive maintenance, $Q(d) = P(T_d \ge T)$ is the probability of failure prior to a preventive action, and $W(d) = E(\min\{T_d, T\})$ is the expected time of maintenance (preventive or reactive). Let d* be the value of d that minimizes the right-hand side of Equation 11. It corresponds to $T^* = T_{d^*}$. Makis and Jardine in ref. 3 have shown that for a non-decreasing hazard function $h(t, Z^{(d)}(t))$, rule T* is the best possible replacement policy (ref. 4). Equation 10 can be re-written for the optimal control limit policy as:

$$T^* = T_{d^*} = \inf\{t \ge 0 : K\,h(t, Z^{(d)}(t)) \ge d^*\}$$ (eq. 12)

For the PHM model with Weibull baseline distribution, it can be interpreted as (ref. 2)

$$T^* = \inf\left\{t \ge 0 : \sum_{i=1}^{m} \gamma_i Z_i^{(d)}(t) \ge \delta^* - (\beta - 1)\ln t\right\}$$ (eq. 13)

where

$$\delta^* = \ln\left(\frac{d^*\,\eta^{\beta}}{K\,\beta}\right)$$ (eq. 14)

Ref. 2 mentions the numerical solution to Equation 13, which is described in detail in refs. 7 and 8. The function

$$g(t) = \delta^* - (\beta - 1)\ln(t)$$ (eq. 15)

is the “warning level” function for the condition of the item reflected by a weighted sum of current values of the significant CM variables (covariates). A plot of this function versus working age can be viewed as an economical decision chart which shows whether the data suggests that the item has to be replaced. In the decision chart, we approximate the value of by . An example of a decision chart with several inspection points can be found in Figure 3. Detailed case studies based on the model discussed in this section can be found in refs. 5 and 6.

Figure 3: Sample economical decision chart (for )

The above development from ref. 2 speaks to cost as the optimizing objective. Analogous developments have been made in EXAKT considering availability and profitability as optimizing objectives. For these two objectives, we only need to change the objective function in Equation 11 accordingly. Specifically, for the availability objective, Equation 11 will be replaced by the availability function, defined as the ratio of uptime to the uptime plus the downtime:

$$A(d) = \frac{W(d)}{W(d) + t_p\,(1 - Q(d)) + t_f\,Q(d)}$$ (eq. 16)

where W(d) is the expected uptime, $t_p$ is the downtime as a result of planned maintenance, and $t_f$ is the downtime as a result of maintenance forced by a functional failure. The objective is also changed to maximize A(d), i.e., d* will be the value of d that maximizes the right-hand side of Equation 16.

For the profitability objective, Equation 11 will be replaced by the global cost function (eq. 17) where $a_p$ is the cost per hour of planned down time and $a_f$ is the cost per hour of unplanned down time. The objective is to minimize Equation 17, i.e., to find d*, which is the value of d that minimizes the right-hand side of Equation 17.

Numerical example

A CBM program on a fleet of nitrogen compressors monitors the failure mode “second stage piston ring failure”. Real time data from sensors and process computers are collected in a PI historian. Work orders record the as-found state of the rings at maintenance. In the following example, four decisions are generated by the EXAKT software, depending on which of probability alone, cost, availability, or profitability (cost and availability) has been set as the optimizing objective. The data for this example is available by contacting the author. Background information on this example and the failure mode can be found in ref. 9.

CBM Report

Decision based on probability

RUL = 106.99616, StdDev = 67.173893

Decisions based on economics and probability

Cost minimization
Availability maximization
Profitability maximization
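Whichever objective supplies the optimal risk limit d*, the resulting decision reduces to the control-limit test of Equation 12, or equivalently to comparing the weighted covariate sum against the warning level of Equation 15. The following Python sketch illustrates that comparison. It is an illustration only: it assumes the standard Weibull PHM hazard, derives the constant δ* by substituting that hazard into Equation 12 and taking logarithms, and uses hypothetical parameter values and function names that do not come from the compressor case.

```python
import math


def delta_from_risk_limit(d_star, K, beta, eta):
    """Constant term of the warning level, from K*h >= d* with a Weibull PHM."""
    return math.log(d_star * eta ** beta / (K * beta))


def warning_level(t, delta_star, beta):
    """g(t) = delta* - (beta - 1) * ln(t), as in Equation 15 (t > 0)."""
    return delta_star - (beta - 1) * math.log(t)


def recommend(t, z, gamma, delta_star, beta):
    """Compare the weighted covariate sum with the warning level at age t."""
    score = sum(g * zi for g, zi in zip(gamma, z))
    return "replace" if score >= warning_level(t, delta_star, beta) else "continue"


def risk(t, z, K, beta, eta, gamma):
    """K times the Weibull PHM hazard: the left-hand side of Equation 12."""
    h = (beta / eta) * (t / eta) ** (beta - 1)
    h *= math.exp(sum(g * zi for g, zi in zip(gamma, z)))
    return K * h


# Hypothetical parameters, for illustration only.
beta, eta, K, d_star, gamma = 2.0, 1000.0, 5000.0, 2.0, [0.05]
delta_star = delta_from_risk_limit(d_star, K, beta, eta)

for age, reading in [(50.0, [0.0]), (500.0, [10.0])]:
    decision = recommend(age, reading, gamma, delta_star, beta)
    # Both forms of the rule should agree at every inspection point.
    same = (risk(age, reading, K, beta, eta, gamma) >= d_star) == (decision == "replace")
    print(age, decision, same)
```

Plotting `warning_level` against working age and overlaying the weighted covariate sum at each inspection gives the kind of decision chart described for Figure 3: points on or above the curve call for preventive replacement.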
Conclusions

The fourth option in the numerical example, optimizing for both low cost and high availability, resolves the difficult problem of deciding upon a CBM (data interpretation) policy or decision model in the light of actual maintenance and business factors. The feature encourages maintenance managers and engineers to elicit good cost and availability information because now they can use it effectively in their decision process. Maintenance and reliability engineers will apply the proposed method based on the EXAKT software to situations where abundant condition monitoring data coincides with the experience of failures and potential failures as recorded in the CMMS.

In this article, we provided an alternative to the difficult, subjective, and often impossible task of choosing a P and P-F interval for use as a CBM decision making procedure. We described a general methodology, whereas Reference 1 describes only two special cases (totally random and totally age dependent) for the application of the P-F interval. In the experience of the authors, a mix of random and time based behavior characterizes the majority of failure modes where CBM will be of use.

Finally, it is necessary to point out that maintenance engineers frequently encounter a practical problem when constructing CBM decision models (or when performing any type of reliability analysis). Despite the long-standing use of elaborate maintenance information relational database systems, reliability analysts find that they lack the life data required for study and modeling. To resolve this issue, OMDEC developed a novel process for work order completion that links the work order to the knowledge repository. It has built such a knowledge management system into the new version of the EXAKT product.

Acknowledgement

We are grateful to Dr. Dragan Banjevic of the Center for Maintenance Optimization and Reliability Excellence for his assistance and suggestions.

References

1. Moubray J., RCM II, 2nd ed., Butterworth-Heinemann, 2001, pp. 164-5, "How to determine the P-F Interval ... A rational approach".
2. Jardine A.K.S., Banjevic D., Montgomery N., Pak A., Repairable system reliability: recent developments in CBM optimization, CORS / Optimization Days 2006 Joint Conference, Montreal, May 8-10, 2006.
3. Makis V., Jardine A.K.S., Optimal replacement in the proportional hazards model, INFOR, Vol. 30, pp. 172–183, 1991.
4. Aven T., Bergman B., Optimal replacement times - a general set-up, Journal of Applied Probability, Vol. 23, pp. 432–442, 1986.
5. Jardine A.K.S., Banjevic D., Wiseman M., Buck S., Joseph T., Optimizing a mine haul truck wheel motors’ condition monitoring program, JQME, Vol. 7, pp. 286–301, 2001.
6. Lin D., Wiseman M., Banjevic D., Jardine A.K.S., An approach to signal processing and condition-based maintenance for gearboxes subject to tooth failure, Mechanical Systems and Signal Processing, Vol. 18, pp. 993–1007, 2004.
7. Banjevic D., Jardine A.K.S., Makis V., Ennis M., A control-limit policy and software for condition-based maintenance optimization, INFOR, Vol. 39, pp. 32–50, 2001.
8. Banjevic D., Jardine A.K.S., Calculation of reliability function and remaining useful life for a Markov failure time process, IMA Journal of Management Mathematics, [Online] doi:10.1093/imaman/dpi029, 2005.
9. Optimization of Bellis & Morcom 3rd-stage piston ring CBM model, OMDEC case study, http://www.omdec.com/articles/p_recipN2Compressors.html