Discussion of failure codes in the Navy

Should they be categorized according to the equipment registration number (ERN), to Object type, or one-to-one to the FMEA record?

Aug 26, 2008

Executive Summary

Failure codes represent likely failure behaviors. They are based on our foreknowledge of an asset’s failure causes. Knowledge about an asset’s failure behavior is stored most efficiently in a FMEA / FMECA / RCM data structure as a set of unique records. A given set of knowledge records will apply to an ERN operating at one or more specified functional locations. A failure code should therefore relate to an ERN and to a functional location (which entails an operating context).

A failure code, by itself, is insufficient for analysis of its accumulated instances. A second work order attribute, called an “Event type”, is also required. An event type describes the way a failure mode ended its life (usually by one of potential failure, functional failure, or suspension). The combination of Event type and Failure code gives the reliability engineer or logistician the data needed to assess the effectiveness of the current supportability plan, to propose improvements, and to measure the success of those changes.

Discussion

This discussion strikes at the heart of the “role of information” in maintenance. Maintenance, as we know, is a core element of supportability. Hence the decisions we make regarding maintenance information collection will impact the organization for decades to come. We should consider them carefully before carving them permanently into our maintenance procedures.

In deciding upon the role of the failure code, we ask ourselves some basic questions:
Q1 Why do we collect information, generally?

Information related to experience in maintenance originates in the two sources listed (in the red box) below. We use information to perform one of the six general activities listed in the blue box.

[Figure: a red box listing the two sources of maintenance information and a blue box listing the six general activities.]

We may apply the “acid test”: if a given type of information cannot clearly contribute to one of the six (blue box) tasks, we should regard with suspicion any proposal for its collection and inclusion in the CMMS database.

Q2 Why do we collect failure codes?

It follows from Q1 that we should collect failure codes when we know, with certainty, that their compilation and analysis will enable us to perform one or more of the six activities in the blue box. Interestingly, the phrase “failure code” is not present in any Logistic Support Analysis or Integrated Logistic Support standards. Neither does it exist in the military RCM standards of the DoD or MOD. Where did it come from?

Many maintenance professionals who seek information with which to improve equipment effectiveness seize upon the opportunity to record failures in the form of “failure codes”. Such short descriptive acronyms or phrases appear to be a “good first step” towards acquiring useful knowledge about failure behavior. Ideally, failure codes should present themselves in the form of configurable, context-sensitive drop-down lists for convenient, yet accurate, failure classification. The maintainer selects a failure code while completing the work order form.
Pick lists of maintenance failure codes are often difficult to choose from and prone to error. The selection items are often too general or do not adequately fit a given situation. Alternatively, long lists of precise codes suffer from “choice overload”, resulting in the overuse of the default “other”. Without doubt, effective and accurate lists are the ultimate objective of reliability-centered knowledge systems. But deciding what selection choices to place on such pick lists is no trivial matter. We therefore require a process that facilitates the day-to-day recording of useful reliability information in the short term and that eventually evolves to provide accurate, robust pick lists.

Reliability and supportability professionals must address the problem of failure code development and suggest an approach that is reasonable, simple, robust and progressive. Moreover, the approach must unify failure mode records in the FMEA / FMECA / RCM / Logistic Support Analysis knowledge base with the failure codes in the work order database. No small task.

The two data sources in the red box result from the execution of a maintenance task, either a proactive or a reactive one. Let us assume that we (in the role of reliability and supportability (R&S) managers) had interviewed the maintainer and operator immediately following the task. What information would we request of them? Possibly:
Q3 What is a failure code?

Given such lofty requirements, how then shall we define the “failure code” in the Navy? This question requires that we think about what we want to do with our CMMS compilation of failure code instances. We want to count them, in several ways. For example, the number of times an instance of a particular failure code occurs, divided by the total operating age of the equipment, approximates the failure rate. Knowing the failure rate for a failure code can help in responding to some of the blue box requirements. Other ways of counting instances of failure codes include Pareto analysis, Weibull analysis, scatter chart analysis, Monte Carlo simulation, proportional hazards modeling, and many others. These techniques, broadly called “reliability analysis methods”, provide knowledge about failure behavior. The reliability engineer acquires such knowledge before introducing performance-improving changes in maintenance policies and procedures.

Now that we know what we want to do with the compiled instances of our failure codes, we can specify the criteria for a failure code. Examining the seven RCM questions, we conclude that the failure code should record instances of the answers to the third question: “the cause or mode of failure”. By (RCM) definition, the failure mode is “the event that caused the failure”.

Q4 Will failure codes attain our reliability and supportability goals?

When selecting a failure cause we encounter a recurring, non-trivial, two-dimensional problem: 1) to what depth of causality, and 2) to what degree of detail (i.e. how many failure modes) shall we consider? When setting up our failure codes for inclusion in CMMS drop-down lists, we will have to resolve this issue for each and every likely failure code (mode). Otherwise, our compiled failure code instances will scatter in either or both dimensions with each occurrence.
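The scatter described above can be illustrated with a minimal sketch. It assumes five invented work-order entries for a single pump, four of which describe the same physical failure at different depths of causality. The code strings, the record identifiers (`FM-001`, `FM-002`) and the mapping table are hypothetical and are not drawn from any real CMMS or FMEA database.

```python
from collections import Counter

# Hypothetical free-form failure codes entered on five work orders for the
# same pump. The first four describe one physical failure at different
# depths of causality; none of these codes comes from a real CMMS.
entered_codes = [
    "PUMP FAILED",
    "BEARING SEIZED",
    "LOSS OF LUBRICATION",
    "BEARING SEIZED",
    "SEAL LEAK",
]

# Hypothetical mapping from each entered code to a single FMEA record,
# written at one agreed depth of causality.
fmea_record_for = {
    "PUMP FAILED": "FM-001",
    "BEARING SEIZED": "FM-001",
    "LOSS OF LUBRICATION": "FM-001",
    "SEAL LEAK": "FM-002",
}


def pareto(codes):
    """Return (code, count) pairs sorted from most to least frequent."""
    return Counter(codes).most_common()


# Tally of the codes exactly as the maintainers entered them.
raw_tally = pareto(entered_codes)

# Tally after each entry is linked to its FMEA knowledge record.
linked_tally = pareto(fmea_record_for[code] for code in entered_codes)

print(raw_tally)
print(linked_tally)
```

In the raw tally the one failure is spread across three different codes, so no single code appears more than twice. Once every entry points to one knowledge record, the same data show four occurrences of a single failure mode.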
This, indeed, is the well-known “failure code” problem that has plagued military and industrial maintenance organizations for six decades. Reliability analysis is a prerequisite to reliability / supportability improvement. Yet few maintenance organizations (military or industrial) derive good, systematic decision-making capability from an analysis of the compiled instances of traditional failure codes.

Many maintenance information projects propose separate lists of failure codes for individual categories of equipment. Linking the failure codes to records in the FMEA / FMECA / RCM knowledge base is a better idea. It will, at least, solve “half” the problem. New technology [1] will access FMECA records quickly and conveniently and link a work order to the appropriate knowledge record. If the knowledge record does not yet exist, it should be generated on-the-fly, thereby growing and refining the knowledge base. Naturally, a knowledge quality assurance process will need to be implemented by the organization's reliability engineer.

Assume that we have successfully set up a practical, modern process for accessing, referencing (on the work order), quality assuring, and growing our reliability knowledge base. We must still address the other “half” of the problem if we are to meet the criteria for information utility (as specified in the blue box). Reliability analysis requires that we record not only the instance of the FMEA / RCM knowledge record, but also the nature of its occurrence. Did this failure mode terminate as an actual loss of function? That is, did it have consequences? Or did this failure mode terminate with an impending (potential) loss of function (relatively minor or no consequences)? Or, thirdly, was this failure mode’s life ended by a discretionary decision to renew the component or equipment preventively where there was no failure, not even an impending failure? Therefore we require a second code or attribute on our work order.
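This second code will usually be one of: functional failure (an actual loss of function), potential failure (an impending loss of function), or suspension (a discretionary preventive renewal where there was no failure).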
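As a rough sketch of why both attributes are needed, the example below estimates a failure rate for one failure code from work orders that carry an event type. It assumes that each work order records the working age at which the life ended, that functional and potential failures both count as “ending by failure”, and that a suspension contributes operating age but no failure. The class names, field names, record identifiers and hours are all invented for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class EventType(Enum):
    FUNCTIONAL_FAILURE = "FF"  # actual loss of function
    POTENTIAL_FAILURE = "PF"   # impending loss of function
    SUSPENSION = "S"           # preventive renewal, no failure present


@dataclass
class WorkOrder:
    failure_code: str          # hypothetical reference to an FMEA record
    event_type: EventType
    working_age_hours: float   # age of the item when this life ended


def failure_rate(orders, failure_code):
    """Failures per operating hour for one failure code.

    Every ended life adds its working age to the denominator, but only
    lives that ended by failure (FF or PF) are counted in the numerator.
    Returns None when no operating age has been recorded for the code.
    """
    lives = [o for o in orders if o.failure_code == failure_code]
    total_age = sum(o.working_age_hours for o in lives)
    failures = sum(o.event_type is not EventType.SUSPENSION for o in lives)
    return failures / total_age if total_age else None


# Invented sample data: three lives for FM-001, one for FM-002.
orders = [
    WorkOrder("FM-001", EventType.FUNCTIONAL_FAILURE, 1200.0),
    WorkOrder("FM-001", EventType.SUSPENSION, 2000.0),
    WorkOrder("FM-001", EventType.POTENTIAL_FAILURE, 800.0),
    WorkOrder("FM-002", EventType.FUNCTIONAL_FAILURE, 500.0),
]

rate = failure_rate(orders, "FM-001")
print(rate)  # 2 failures over 4000 hours
```

Without the event type, the preventive renewal at 2000 hours would be counted as a third failure, overstating the rate as 3 per 4000 hours rather than 2 per 4000 hours.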
Ironically, the major impediment to implementing these “two” codes is not an information technology (IT) one. It is a relatively simple matter to add two attributes to a MAF within the powerful SAP system. The greater challenge is “cultural”. The Navy must teach the maintainers, their supervisors, their engineers and officers:
Conclusions

We collect failure codes for the purpose of analysis. Analysis, in turn, provides knowledge of equipment failure behavior patterns [2]. A failure code applied to a work order marks an instance (i.e. an occurrence) of a particular failure behavior. The failure code itself represents a piece of knowledge about the likely failure behavior of an asset. Knowledge of asset behavior resides most efficiently in a FMEA / FMECA / RCM knowledge base. The failure code relating to a FMEA knowledge record is a necessary but not sufficient condition for subsequent analysis. In addition, we require a second (work order) attribute indicating the manner in which the failure mode “life” ended. This second attribute distinguishes between “ending by failure” and ending by preventive replacement (aka “suspension”). The combined information from instances of both attributes will provide the required input to RAM and other analysis methodologies.

This discussion of failure codes is illustrated in a PowerPoint presentation.

Glossary

ERN: The number assigned to an equipment or system type is referred to as the Equipment Registration Number (ERN). An ERN identifies an equipment throughout its life cycle, and forms the basis for numbering of related technical documentation (and FMECA records). For example, the ERN “E-27-334-A00” specifies an Ingersoll Rand Type 3NVMK-50 Fire Pump. The ERN has three levels of specificity: 1) Primary group identifier (e.g. “27”), 2) Specific identifier (e.g. “334”), and 3) Sub-system (or configuration) identifier (e.g. “A00”).

Object type: An arbitrary categorization of related equipment, for example: “Tactical Software”, “Damage Control Equipment”, “Filtration Equipment”, etc.

Functional location: The Functional Location denotes the place where work can be done. It represents the next level of detail below the plant and can be a building, floor, room, or any point in a process or operation. Notifications and work orders include a reference to the Functional Location.

[1] Since 2006, Scandinavian oil platform operators have closed the loop between initial RCM-FMECA (failure modes, effects, and criticality analysis) and operational FMECA. The Failure Code table in the CMMS is drawn directly from the FMECA database. When the appropriate Failure Code cannot be found on the context-sensitive list, a new code and record are added “on-the-fly” (subject to an appropriate quality control procedure) to the FMECA / RCM knowledge base. There is no catch-all code called “other”.

[2] With respect to other external and internal factors. Such factors include “working” age, PM frequency, EHM parameters and decision models, operational tradeoffs, and so on.