12.1 Reliability
The reliability of measurement
systems can be quantified as the mean time between faults occurring in the
system. In this context, a fault means the occurrence of an unexpected
condition in the system that causes the measurement output to either be
incorrect or not to exist at all. The following sections summarize the
principles of reliability theory that are relevant to measurement systems. A
fuller account of reliability theory, and particularly its application in
manufacturing systems, can be found elsewhere (Morris, 1997).
12.1.1 Principles of reliability
The reliability of a measurement
system is defined as the ability of the system to perform its required function
within specified working conditions for a stated period of time. Unfortunately,
factors such as manufacturing tolerances in an instrument and varying operating
conditions conspire to make the faultless operating life of a system impossible
to predict. Such factors are subject to random variation and chance, and
therefore reliability cannot be defined in absolute terms. The nearest one can
get to an absolute quantification of reliability is a quasi-absolute term such as
the mean-time-between-failures, which expresses the average time that the
measurement system works without failure. Otherwise, reliability has to be
expressed as a statistical parameter that defines the probability that no
faults will develop over a specified interval of time.
In quantifying reliability for a
measurement system, an immediate difficulty that arises is defining what counts
as a fault. Total loss of a measurement output is an obvious fault but a fault
that causes a finite but incorrect measurement is more difficult to identify.
The usual approach is to identify such faults by applying statistical process
control techniques (Morris, 1997).
Reliability quantification in quasi-absolute terms
Whilst reliability is essentially
probabilistic in nature, it can be quantified in quasi-absolute terms by the
mean-time-between-failures and the mean-time-to-failure parameters. It must be
emphasized that these two quantities are usually average values calculated over
a number of identical instruments, and therefore the actual values for any
particular instrument may vary substantially from the average value.
The mean-time-between-failures (MTBF)
is a parameter that expresses the average time between faults occurring in an
instrument, calculated over a given period of time. For example, suppose that
the history of an instrument is logged over a 360-day period and the time
intervals in days between faults occurring were as follows:
11 23 27 16 19 32 6 24 13 21 26 15 14 33 29 12 17 22
The mean interval is 20 days, which
is therefore the mean-time-between-failures. An alternative way of calculating
MTBF is simply to count the number of faults occurring over a given period. In
the above example, there were 18 faults recorded over a period of 360 days and
so the MTBF can be calculated as:
$$\text{MTBF} = \frac{360}{18} = 20\ \text{days}$$
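Both calculations can be checked with a short Python sketch. The interval data are those listed above; the variable names are illustrative only.

```python
# Intervals (in days) between successive faults, as logged in the example above.
intervals_days = [11, 23, 27, 16, 19, 32, 6, 24, 13, 21, 26, 15, 14, 33, 29, 12, 17, 22]

# Method 1: average the logged intervals directly.
mtbf_from_intervals = sum(intervals_days) / len(intervals_days)

# Method 2: divide the observation period by the number of faults recorded.
observation_period_days = 360
fault_count = len(intervals_days)
mtbf_from_count = observation_period_days / fault_count

print(mtbf_from_intervals, mtbf_from_count)  # 20.0 20.0
```

The two methods agree here because the logged intervals add up to exactly the 360-day observation period.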
Unfortunately, in the case of instruments
that have a high reliability, such in-service calculation of reliability in
terms of the number of faults occurring over a given period of time becomes
grossly inaccurate because faults occur too infrequently. In this case, MTBF
predictions provided by the instrument manufacturer can be used, since
manufacturers have the opportunity to monitor the performance of a number of
identical instruments installed in different companies. If there are a total of
F faults recorded for N identical instruments in time T, the MTBF can be
calculated as MTBF = TN/F. One drawback of this approach is that it does not
take the conditions of use, such as the operating environment, into account.
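As a minimal sketch of the pooled calculation, the function below applies MTBF = TN/F. The function name and the sample figures are hypothetical and are not taken from any manufacturer's data.

```python
def pooled_mtbf(total_faults: int, instrument_count: int, period: float) -> float:
    """Return MTBF = T * N / F for N identical instruments observed for time T."""
    if total_faults <= 0:
        raise ValueError("at least one recorded fault is needed to estimate MTBF")
    return period * instrument_count / total_faults

# Hypothetical figures, for illustration only: 50 instruments observed for
# 2 years, with 8 faults recorded in total.
print(pooled_mtbf(total_faults=8, instrument_count=50, period=2.0))  # 12.5 (years)
```

The result is in the same time unit as the observation period passed in.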
The mean-time-to-failure (MTTF) is an
alternative way of quantifying reliability that is normally used for devices
like thermocouples that are discarded when they fail. MTTF expresses the
average time before failure occurs, calculated over a number of identical
devices.
The final reliability-associated term
of importance in measurement systems is the mean-time-to-repair (MTTR). This
expresses the average time needed for repair of an instrument. MTTR can also be
interpreted as the mean-time-to-replace, since replacement of a faulty
instrument by a spare one is usually preferable in manufacturing systems to
losing production whilst an instrument is repaired.
The MTBF and MTTR parameters are
often expressed in terms of a combined quantity known as the availability
figure. This measures the proportion of the total time that an instrument is
working, i.e. the proportion of the total time that it is in an unfailed state.
The availability is defined as the ratio:
$$\text{Availability} = \frac{\text{MTBF}}{\text{MTBF} + \text{MTTR}}$$
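The effect of the repair time on availability can be illustrated with a small Python sketch. The MTBF of 20 days is taken from the earlier example; the MTTR values are hypothetical.

```python
def availability(mtbf: float, mttr: float) -> float:
    """Proportion of total time in an unfailed state: MTBF / (MTBF + MTTR)."""
    return mtbf / (mtbf + mttr)

# MTBF and MTTR must be expressed in the same unit (days here).
print(round(availability(20.0, 1.0), 3))   # 0.952 with a one-day repair
print(round(availability(20.0, 0.25), 3))  # 0.988 with a six-hour replacement
```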
In measurement systems, the aim must
always be to maximize the MTBF figure and minimize the MTTR figure, thereby
maximizing the availability. As far as the MTBF and MTTF figures are concerned,
good design and high-quality control standards during manufacture are the
appropriate means of maximizing these figures. Design procedures that mean that
faults are easy to repair are also an important factor in reducing the MTTR
figure.
Failure patterns
The failure rate of an
instrument may increase, stay the same or decrease over its life. In the case
of electronic components, the failure rate typically changes with time in the
manner shown in Figure 12.1(a). This form of characteristic is frequently known
as a bathtub curve. Early in their life, electronic components can have quite a
high rate of fault incidence up to time t1 (see Figure 12.1(a)). After this
initial working period, the fault rate decreases to a low level and remains at
this low level until time t2 when ageing effects cause the fault rate to start
to increase again. Instrument manufacturers often ‘burn in’ electronic
components for a length of time corresponding to the time t1. This means that
the components have reached the high-reliability phase of their life before
they are supplied to customers.
Mechanical components usually have
different failure characteristics as shown in Figure 12.1(b). Material fatigue
is a typical reason for the failure rate to increase over the life of a
mechanical component. In the early part of their life, when all components are
relatively new, many instruments exhibit a low incidence of faults. Then, at a
later stage, when fatigue and other ageing processes start to have a
significant effect, the rate of faults increases and continues to increase
thereafter.
Complex systems containing many
different components often exhibit a constant pattern of failure over their
lifetime. The various components within such systems each have their own
failure pattern where the failure rate is increasing or decreasing with time.
The greater the number of such components within a system, the greater is the
tendency for the failure patterns in the individual components to cancel out
and for the rate of fault-incidence to assume a constant value.
Reliability quantification in probabilistic terms
In probabilistic terms, the
reliability R(T) of an instrument X is defined as the probability that the
instrument will not fail within a certain period of time T. The unreliability
or likelihood of failure F(T) is a corresponding term which expresses the
probability that the instrument will fail within the specified time interval. R(T)
and F(T) are related by the expression:
$$F(T) = 1 - R(T) \tag{12.1}$$
To calculate R(T), accelerated
lifetime testing is carried out for a number (N) of identical instruments.
Providing all instruments have similar conditions of use, the times of failure,
t1, t2, ..., tn will be distributed about the mean time to failure tm. If the
probability density of the time-to-failure is represented by f(t), the
probability that a particular instrument will fail in a time interval δt is
given by f(t)δt, and the probability that the instrument will fail before a
time T is given by:
$$F(T) = \int_0^T f(t)\,dt$$
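The probability that the instrument
will fail in a time interval ΔT following T, given that it has survived to
time T, is given by:
$$\frac{F(T + \Delta T) - F(T)}{R(T)}$$
where R(T) is the probability that
the instrument will survive to time T. Dividing this expression by ΔT gives the
average failure rate over the interval from T to T + ΔT:
$$\frac{F(T + \Delta T) - F(T)}{\Delta T\, R(T)}$$
In the limit as ΔT tends to zero, the instantaneous failure rate at time T is:
$$\theta_f = \frac{d[F(T)]}{dT}\,\frac{1}{R(T)} = \frac{F'(T)}{R(T)} \tag{12.2}$$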
If it is assumed that the instrument
is in the constant-failure-rate phase of its life, denoted by the interval
between times t1 and t2 in Figure 12.1, then the
instantaneous failure rate at T is also the mean failure rate which can be
expressed as the reciprocal of the MTBF, i.e. mean failure rate = θf = 1/tm.
Differentiating (12.1) with respect
to time gives F'(T) = -R'(T). Hence, substituting for
F'(T) in (12.2) gives:
$$\theta_f = -\frac{R'(T)}{R(T)}$$
This can be solved (Miller, 1990) to
give the following expression:
$$R(T) = \exp(-\theta_f T) \tag{12.3}$$
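The solution step can be sketched as follows. It assumes a constant failure rate θf and the initial condition R(0) = 1 (the instrument is working at T = 0); that initial condition is an assumption made here, not one stated in the text.

```latex
\begin{align*}
\theta_f &= -\frac{R'(T)}{R(T)} = -\frac{d}{dT}\,\ln R(T) \\
\ln R(T) &= -\theta_f T + C \\
R(0) = 1 \;&\Rightarrow\; C = 0 \\
R(T) &= \exp(-\theta_f T)
\end{align*}
```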
Examination of equation (12.3) shows
that, at time T = 0, the unreliability is zero. Also, as T tends to infinity, the
unreliability tends to a value of 1. This agrees with intuitive expectations
that the value of unreliability should lie between values of 0 and 1. Another
point of interest in equation (12.3) is to consider the unreliability when T =
MTBF, i.e. when T = tm. Then: F(T) = 1 - exp(-1) = 0.63, i.e. the probability
that a product has failed by the time it has been operating for a length of time
equal to the MTBF is 63%.
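These limiting values can be checked numerically. The sketch below assumes the constant-failure-rate model with θf = 1/MTBF; the function names and the MTBF value are illustrative.

```python
import math

def reliability(t: float, mtbf: float) -> float:
    """R(T) = exp(-theta_f * T), with theta_f = 1 / MTBF (constant failure rate)."""
    return math.exp(-t / mtbf)

def unreliability(t: float, mtbf: float) -> float:
    """F(T) = 1 - R(T)."""
    return 1.0 - reliability(t, mtbf)

mtbf = 1000.0  # hypothetical MTBF in hours; only the ratio T/MTBF matters here
print(unreliability(0.0, mtbf))             # 0.0
print(round(unreliability(mtbf, mtbf), 3))  # 0.632
```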
Further analysis of equation (12.3)
shows that, for T/tm ≤ 0.1:
$$F(T) \approx \frac{T}{t_m} \tag{12.4}$$
This is a useful formula for
calculating (approximately) the reliability of a critical product which is only
used for a time that is a small proportion of its MTBF.
Example 12.1
If the mean-time-to-failure of an
instrument is 50 000 hours, calculate the probability that it will not fail
during the first 10 000 hours of operation.
Solution
From (12.3), R(T) = exp(-θf T) = exp(-10 000/50 000) = 0.8187
Example 12.2
If the mean-time-to-failure of an
instrument is 80 000 hours, calculate the probability that it will not fail
during the first 8000 hours of operation.
Solution
In this case, T/tm = 8000/80 000 =
0.1 and so equation (12.4) can be applied, giving R(T) = 1 - F(T) ≈ 1 - T/tm = 0.9.
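As a rough check on both examples, the sketch below evaluates the exact exponential expression alongside the small-ratio approximation. The function names are illustrative, and the model assumes a constant failure rate θf = 1/tm.

```python
import math

def exact_reliability(t: float, t_m: float) -> float:
    """R(T) = exp(-T / t_m), i.e. equation (12.3) with theta_f = 1 / t_m."""
    return math.exp(-t / t_m)

def approx_reliability(t: float, t_m: float) -> float:
    """R(T) ~ 1 - T / t_m, from equation (12.4); intended for T / t_m <= 0.1."""
    return 1.0 - t / t_m

# Example 12.1: T = 10 000 h, t_m = 50 000 h
print(round(exact_reliability(10_000, 50_000), 4))  # 0.8187

# Example 12.2: T = 8000 h, t_m = 80 000 h
print(round(approx_reliability(8_000, 80_000), 4))  # 0.9
print(round(exact_reliability(8_000, 80_000), 4))   # 0.9048
```

For Example 12.2 the approximation differs from the exact value by about 0.005, which indicates the size of the error at the upper limit T/tm = 0.1.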