12.1.4 Software reliability
As computer processors, and the
software within them, are increasingly found in most measurement systems, the
issue of the reliability of such components has become very important. Computer
hardware behaves very much like electronic components in general, and the rules
for calculating reliability given earlier can be applied. However, the factors
affecting reliability in software are fundamentally different. Application of
the general engineering definition of reliability to software is not
appropriate because the characteristics of the error mechanisms in software and
in engineering hardware are fundamentally different. Hardware systems that work
correctly when first introduced can develop faults at any time in the future,
and so the MTBF is a sensible measure of reliability. However, software does
not change with time: if it starts off being error free, then it will remain
so. Therefore, what we need to know, in advance of its use, is whether or not
faults are going to be found in the software after it has been put into use.
Thus, for software, an MTBF reliability figure is of little value. Instead, we
must somehow express the probability that errors will not occur in it.
Quantifying software reliability
A fundamental problem in predicting
that errors will not occur in software is that, however exhaustive the testing,
it is impossible to say with certainty that all errors have been found and
eliminated. Errors can be quantified by three parameters, D, U and T, where D
is the number of errors detected by testing the software, U is the number of
undetected errors and T is the total number of errors (both detected and
undetected).
Hence:
U = T – D (12.9)
Good program testing can detect most errors
and so make D approach T so that U tends towards zero. However, as the value of
T can never be predicted with certainty, it is very difficult to predict that
software is error free, whatever degree of diligence is applied during testing
procedures.
Whatever approach is taken to
quantifying reliability, software testing is an essential prerequisite to the
quantification methods available. Whilst it is never possible to detect all the
errors that might exist, the aim must always be to find and correct as many
errors as possible by applying a rigorous testing procedure. Software testing
is a particularly important aspect of the wider field of software engineering.
However, as it is a subject of considerable complexity, the detailed procedures
available are outside the scope of this book. A large number of books now cover
good software engineering in general and software testing procedures in
particular, and the reader requiring further information is referred to texts such as Pfleeger (1987) and Shooman (1983).
One approach to quantifying software
reliability (Fenton, 1991) is to monitor the rate of error discovery during
testing and then extrapolate this into an estimate of the
mean-time-between-failures for the software once it has been put into use.
Testing can then be extended until the predicted MTBF is greater than the
projected time-horizon of usage of the software. This approach is rather
unsatisfactory because it accepts that errors in the software exist and only
predicts that errors will not emerge very frequently.
Confidence in the measurement system
is much greater if we can say, ‘There is a high probability that there are zero
errors in the software’ rather than ‘There are a finite number of errors in the
software but they are unlikely to emerge within the expected lifetime of its
usage.’ One way of achieving this is to estimate the value of T (total number
of errors) from initial testing and then carry out further software testing until the number of errors detected equals the estimated value of T (i.e. until the predicted number of undetected errors U is zero), in a procedure known as error seeding
(Mills, 1972). In this method, the programmer responsible for producing the
software deliberately puts a number of errors E into the program, such that the
total number of errors in the program increases from T to T’, where T’ = T + E. Testing is then carried out by a different programmer who will identify a number of errors given by D’, where D’ = D + E’ and E’ is the number of
deliberately inserted errors that are detected by this second programmer.
Normally, the real errors detected (D) will be less than T and the seeded
errors detected (E’) will be less than E. However, on the assumption that the
ratio of seeded errors detected to the total number of seeded errors will be
the same as the ratio of the real errors detected to the total number of real
errors, the following expression can be written:
D/T = E’/E (12.10)
Since E’ is measured, E is known and D can be calculated from the number of errors D’ detected by the second programmer according to D = D’ – E’, the value of T can then be calculated as:
T = DE/E’ (12.11)
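The error-seeding estimate of equation (12.11) is straightforward to compute. The following is a minimal sketch (the function name, variable names and figures are illustrative, not from the text):

```python
def estimate_real_errors(d_prime, e_prime, e):
    """Error-seeding estimate of the total number of real errors T.

    d_prime: total errors found by the second programmer (real + seeded)
    e_prime: seeded errors among those found
    e:       total number of seeded errors inserted
    """
    d = d_prime - e_prime      # real errors detected, D = D' - E'
    return d * e / e_prime     # T = DE/E', equation (12.11)

# Illustrative figures: 40 errors found, 8 of them seeded, 10 seeded in total
print(estimate_real_errors(40, 8, 10))  # -> 40.0
```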
Example 12.5
The author of a digital
signal-processing algorithm that forms a software component within a
measurement system adds 12 deliberate faults to the program. The program is
then tested by a second programmer, who finds 34 errors. Of these detected
errors, the program author recognizes 10 of them as being seeded errors.
Estimate the original number of errors present in the software (i.e. excluding
the seeded errors).
Solution
The total number of errors detected (D’) is 34 and the program author confirms that the number of seeded errors (E’) within these is 10 and that the total number of seeded errors (E) was 12. Because D’ = D + E’ (see earlier), D = D’ – E’ = 34 – 10 = 24. Hence, from (12.11), T = DE/E’ = (24 × 12)/10 = 28.8. Thus, the estimated number of original errors in the software, excluding the seeded ones, is approximately 29.
One flaw in expression (12.11) is the
assumption that the seeded errors are representative of all the real (unseeded)
errors in the software, both in proportion and character. This assumption is
never entirely valid in practice because, if errors are unknown, then their
characteristics are also unknown. Thus, whilst this approach may be able to
give an approximate indication of the value of T, it can never predict its
actual value with certainty.
An alternative to error seeding is
the double-testing approach, where two independent programmers test the same
program (Pfleeger, 1987). Suppose that the numbers of errors detected by the two programmers are D1 and D2 respectively. Normally, the
errors detected by the two programmers will be in part common and in part
different. Let C be the number of common errors that both programmers find. The
error-detection success of each programmer can be quantified as:
S1 = D1/T; S2 = D2/T (12.12)
It is reasonable to assume that the
proportion of errors D1 that programmer 1 finds out of the total
number of errors T is the same proportion as the number of errors C that he/she
finds out of the number D2 found by programmer 2, i.e.:
D1/T = C/D2 = S1, and hence D2 = C/S1 (12.13)
From (12.12), T = D2/S2,
and substituting in the value of D2 obtained from (12.13), the
following expression for T is obtained:
T = C/(S1S2) (12.14)
From (12.13), S1 = C/D2
and from (12.12), S2 = D2S1/D1 =
C/D1. Thus, substituting for S1 and S2 in
(12.14):
T = D1D2/C (12.15)
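Equation (12.15) has the same form as the capture–recapture (Lincoln–Petersen) estimate used for wildlife populations. A minimal sketch, with illustrative figures and names chosen here for illustration:

```python
def estimate_total_errors(d1, d2, c):
    """Double-testing estimate of the total number of errors T.

    d1, d2: errors found by two independent testers
    c:      errors found by both of them
    """
    if c == 0:
        raise ValueError("no common errors - the estimate is undefined")
    return d1 * d2 / c  # T = D1*D2/C, equation (12.15)

# Illustrative figures: 30 and 25 errors found, 20 in common
print(estimate_total_errors(30, 25, 20))  # -> 37.5
```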
Example 12.6
A piece of software is tested independently by two programmers, and the numbers of errors found are 24 and 26 respectively. Of the errors found by programmer 1, 21 are the same as errors found by programmer 2. Estimate the total number of errors in the software.
Solution
D1 = 24, D2 = 26 and C = 21. Hence, applying (12.15), T = D1D2/C = (24 × 26)/21 = 29.7. Thus, the estimated total number of errors is 30 (to the nearest whole number).
Program testing should continue until
the number of errors that have been found is equal to the predicted total
number of errors T. In the case of example 12.6, this means continuing testing
until 30 errors have been found. However, the problem with doing this is that T
is only an estimated quantity and there may actually be only 28 or 29 errors in
the program. Thus, to continue testing until 30 errors have been found would
mean testing forever! Hence, once 28 or 29 errors have been found and continued
testing for a significant time after this has detected no more errors, the
testing procedure should be terminated, even though the program could still
contain one or two errors. The approximate nature of the calculated value of T
also means that its true value could be 31 or 32, and therefore the software
may still contain errors if testing is stopped once 30 errors have been found.
Thus, the fact that T is only an estimated value means the statement that a
program is error free once the number of errors detected is equal to T can only
be expressed in probabilistic terms.
To quantify this probability, further
testing of the program is necessary (Pfleeger, 1987). The starting point for
this further testing is the stage when the predicted total number of errors T has been found (or when the number found is slightly less than T but further
testing does not seem to be finding any more errors). The next step is to seed
the program with W new errors and then test it until all W seeded errors have
been found. Provided that no new errors have been found during this further
testing phase, the probability that the program is error free can then be
expressed as:
P = W/(W + 1) (12.16)
However, if any new error is found
during this further testing phase, the error must be corrected and then the
seeding and testing procedure must be repeated. Assuming that no new errors are
detected, a value of W = 10 gives P = 0.91 (probability 91% that program is
error free). To get to 99% error-free probability, W has to be 99.
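The relationship in equation (12.16) can also be inverted to find how many seeded errors W are needed for a target probability. A short sketch (function names are illustrative):

```python
import math

def error_free_probability(w):
    """Probability that the program is error free once all W seeded
    errors have been found and no new real errors have emerged (12.16)."""
    return w / (w + 1)

def seeds_needed(p_target):
    """Smallest W achieving at least the target probability.
    Rearranging P = W/(W + 1) gives W >= P/(1 - P)."""
    return math.ceil(p_target / (1 - p_target))

print(round(error_free_probability(10), 2))  # -> 0.91
print(seeds_needed(0.99))                    # -> 99
```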
Improving software reliability
The first requirement for achieving high reliability in software is to ensure that it is produced according to
sound software engineering principles. Formal standards for achieving high
quality in software are set out in BS 7165 (1991) and ISO 9000-3 (1991).
Many texts are available on good software design procedures. These differ significantly in their style of
approach, but all have the common aim of encouraging the production of
error-free software that conforms to the design specification. It is not within
the scope of this book to enter into arguments about which software design
approach is best, as choice between the different software design techniques
largely depends on personal preferences. However, it is essential that software
contributing to a measurement system is produced according to good software
engineering principles.
The second stage of reliability
enhancement is the application of a rigorous testing procedure as described in
the last section. This is a very time-consuming and hence expensive business,
and so testing should only continue until the calculated level of reliability is
the minimum needed for the requirements of the measurement system. However, if
a very high level of reliability is demanded, such rigorous testing becomes
extremely expensive and an alternative approach known as N-version programming
is often used. N-version programming requires N different programmers to
produce N different versions of the same software according to a common
specification. Then, assuming that there are no errors in the specification
itself, any difference in the output of one program compared with the others
indicates an error in that program. Commonly, N = 3 is used, that is, three
different versions of the program are produced, but N = 5 is used for
measurement systems that are very critical. In this latter case, a ‘voting’
system is used, which means that up to two out of the five versions can be
faulty without incorrect outputs being generated.
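The voting arrangement can be illustrated with a short sketch. An assumption here is that the N versions produce directly comparable outputs; practical systems usually compare numerical outputs within a tolerance rather than for exact equality:

```python
from collections import Counter

def majority_vote(outputs):
    """Return the output agreed by a strict majority of the N versions.

    With N = 5, up to two faulty versions are outvoted provided the
    other three agree.
    """
    value, count = Counter(outputs).most_common(1)[0]
    if count > len(outputs) // 2:   # require a strict majority
        return value
    raise RuntimeError("no majority among the versions")

# Two of the five versions are faulty; the majority result survives
print(majority_vote([42.0, 42.0, 41.9, 42.0, 37.5]))  # -> 42.0
```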
Unfortunately, whilst this approach
reduces the chance of software errors in measurement systems, it is not
foolproof because the degree of independence between programs cannot be
guaranteed. Different programmers, who may be trained in the same place and use
the same design techniques, may generate different programs that have the same
errors. Thus, this method has the best chance of success if the programmers are
trained independently and use different design techniques.
Languages such as Ada also improve
the safety of software because they contain special features that are designed
to detect the kind of programming errors that are commonly made. Such languages
have been specifically developed with safety critical applications in mind.