12.1.4 Software reliability
As computer processors, and the
software within them, are increasingly found in most measurement systems, the
issue of the reliability of such components has become very important. Computer
hardware behaves very much like electronic components in general, and the rules
for calculating reliability given earlier can be applied. However, the factors
affecting reliability in software are fundamentally different. Application of
the general engineering definition of reliability to software is not
appropriate because the characteristics of the error mechanisms in software and
in engineering hardware are fundamentally different. Hardware systems that work
correctly when first introduced can develop faults at any time in the future,
and so the MTBF is a sensible measure of reliability. However, software does
not change with time: if it starts off being error free, then it will remain
so. Therefore, what we need to know, in advance of its use, is whether or not
faults are going to be found in the software after it has been put into use.
Thus, for software, an MTBF reliability figure is of little value. Instead, we
must somehow express the probability that errors will not occur in it.
Quantifying software reliability
A fundamental problem in predicting
that errors will not occur in software is that, however exhaustive the testing,
it is impossible to say with certainty that all errors have been found and
eliminated. Errors can be quantified by three parameters, D, U and T, where D
is the number of errors detected by testing the software, U is the number of
undetected errors and T is the total number of errors (both detected and
undetected).
Hence:
U = T – D (12.9)
Good program testing can detect most errors
and so make D approach T so that U tends towards zero. However, as the value of
T can never be predicted with certainty, it is very difficult to predict that
software is error free, whatever degree of diligence is applied during testing
procedures.
Whatever approach is taken to
quantifying reliability, software testing is an essential prerequisite to the
quantification methods available. Whilst it is never possible to detect all the
errors that might exist, the aim must always be to find and correct as many
errors as possible by applying a rigorous testing procedure. Software testing
is a particularly important aspect of the wider field of software engineering.
However, as it is a subject of considerable complexity, the detailed procedures
available are outside the scope of this book. A large number of books now cover
good software engineering in general and software testing procedures in
particular, and the reader requiring further information is referred to texts such as Pfleeger (1987) and Shooman (1983).
One approach to quantifying software
reliability (Fenton, 1991) is to monitor the rate of error discovery during
testing and then extrapolate this into an estimate of the
mean-time-between-failures for the software once it has been put into use.
Testing can then be extended until the predicted MTBF is greater than the
projected time-horizon of usage of the software. This approach is rather
unsatisfactory because it accepts that errors in the software exist and only
predicts that errors will not emerge very frequently.
Confidence in the measurement system
is much greater if we can say, ‘There is a high probability that there are zero
errors in the software’ rather than ‘There are a finite number of errors in the
software but they are unlikely to emerge within the expected lifetime of its
usage.’ One way of achieving this is to estimate the value of T (total number
of errors) from initial testing and then carry out further software testing until the number of errors detected equals the estimated value of T (i.e. until the predicted number of undetected errors U is zero), in a procedure known as error seeding
(Mills, 1972). In this method, the programmer responsible for producing the
software deliberately puts a number of errors E into the program, such that the
total number of errors in the program increases from T to T’, where T’ = T + E. Testing is then carried out by a different programmer who will identify a number of errors given by D’, where D’ = D + E’ and E’ is the number of
deliberately inserted errors that are detected by this second programmer.
Normally, the real errors detected (D) will be less than T and the seeded
errors detected (E’) will be less than E. However, on the assumption that the
ratio of seeded errors detected to the total number of seeded errors will be
the same as the ratio of the real errors detected to the total number of real
errors, the following expression can be written:
D/T = E’/E (12.10)
Since E’ is measured, E is known and D can be calculated from the number of errors D’ detected by the second programmer according to D = D’ – E’, the value of T can then be calculated as:
T = DE/E’ (12.11)
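The error-seeding estimate of equation (12.11) is straightforward to compute. The following is a minimal sketch (the function name, variable names and figures are illustrative, not from the text):

```python
def estimate_real_errors(d_prime, e_prime, e):
    """Error-seeding estimate of the total number of real errors T.

    d_prime: total errors found by the second programmer (real + seeded)
    e_prime: seeded errors among those found
    e:       total number of seeded errors inserted
    """
    d = d_prime - e_prime      # real errors detected, D = D' - E'
    return d * e / e_prime     # T = DE/E', equation (12.11)

# Illustrative figures: 40 errors found, 8 of them seeded, 10 seeded in total
print(estimate_real_errors(40, 8, 10))  # -> 40.0
```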
Example 12.5
The author of a digital
signal-processing algorithm that forms a software component within a
measurement system adds 12 deliberate faults to the program. The program is
then tested by a second programmer, who finds 34 errors. Of these detected
errors, the program author recognizes 10 of them as being seeded errors.
Estimate the original number of errors present in the software (i.e. excluding
the seeded errors).
Solution
The total number of errors detected (D’) is 34 and the program author confirms that the number of seeded errors (E’) within these is 10 and that the total number of seeded errors (E) was 12. Because D’ = D + E’ (see earlier), D = D’ – E’ = 34 – 10 = 24. Hence, from (12.11), T = DE/E’ = (24 × 12)/10 = 28.8. Thus, the estimated number of original errors in the software, excluding the seeded ones, is approximately 29.
One flaw in expression (12.11) is the
assumption that the seeded errors are representative of all the real (unseeded)
errors in the software, both in proportion and character. This assumption is
never entirely valid in practice because, if errors are unknown, then their
characteristics are also unknown. Thus, whilst this approach may be able to
give an approximate indication of the value of T, it can never predict its
actual value with certainty.
An alternative to error seeding is
the double-testing approach, where two independent programmers test the same
program (Pfleeger, 1987). Suppose that the numbers of errors detected by the two programmers are D1 and D2 respectively. Normally, the
errors detected by the two programmers will be in part common and in part
different. Let C be the number of common errors that both programmers find. The
error-detection success of each programmer can be quantified as:
S1 = D1/T; S2 = D2/T (12.12)
It is reasonable to assume that the
proportion of errors D1 that programmer 1 finds out of the total
number of errors T is the same proportion as the number of errors C that he/she
finds out of the number D2 found by programmer 2, i.e.:
D1/T = C/D2 = S1, and hence D2 = C/S1 (12.13)
From (12.12), T = D2/S2,
and substituting in the value of D2 obtained from (12.13), the
following expression for T is obtained:
T = C/(S1S2) (12.14)
From (12.13), S1 = C/D2
and from (12.12), S2 = D2S1/D1 =
C/D1. Thus, substituting for S1 and S2 in
(12.14):
T = D1D2/C (12.15)
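Equation (12.15) has the same form as the capture–recapture (Lincoln–Petersen) estimate used for wildlife populations. A minimal sketch, with illustrative figures and names chosen here for illustration:

```python
def estimate_total_errors(d1, d2, c):
    """Double-testing estimate of the total number of errors T.

    d1, d2: errors found by two independent testers
    c:      errors found by both of them
    """
    if c == 0:
        raise ValueError("no common errors - the estimate is undefined")
    return d1 * d2 / c  # T = D1*D2/C, equation (12.15)

# Illustrative figures: 30 and 25 errors found, 20 in common
print(estimate_total_errors(30, 25, 20))  # -> 37.5
```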
Example 12.6
A piece of software is tested independently by two programmers, and the numbers of errors found are 24 and 26 respectively. Of the errors found by programmer 1, 21 are the same as errors found by programmer 2. Estimate the total number of errors in the software.
Solution
D1 = 24, D2 = 26 and C = 21. Hence, applying (12.15), T = D1D2/C = (24 × 26)/21 = 29.7. Thus, the estimated total number of errors is 30 (to the nearest whole number).
Program testing should continue until
the number of errors that have been found is equal to the predicted total
number of errors T. In the case of example 12.6, this means continuing testing
until 30 errors have been found. However, the problem with doing this is that T
is only an estimated quantity and there may actually be only 28 or 29 errors in
the program. Thus, to continue testing until 30 errors have been found would
mean testing forever! Hence, once 28 or 29 errors have been found and continued
testing for a significant time after this has detected no more errors, the
testing procedure should be terminated, even though the program could still
contain one or two errors. The approximate nature of the calculated value of T
also means that its true value could be 31 or 32, and therefore the software
may still contain errors if testing is stopped once 30 errors have been found.
Thus, the fact that T is only an estimated value means the statement that a
program is error free once the number of errors detected is equal to T can only
be expressed in probabilistic terms.
To quantify this probability, further
testing of the program is necessary (Pfleeger, 1987). The starting point for
this further testing is the stage when the predicted total number of errors T has been found (or when the number found is slightly less than T but further
testing does not seem to be finding any more errors). The next step is to seed
the program with W new errors and then test it until all W seeded errors have
been found. Provided that no new errors have been found during this further
testing phase, the probability that the program is error free can then be
expressed as:
P = W/(W + 1) (12.16)
However, if any new error is found
during this further testing phase, the error must be corrected and then the
seeding and testing procedure must be repeated. Assuming that no new errors are
detected, a value of W = 10 gives P = 0.91 (probability 91% that program is
error free). To get to 99% error-free probability, W has to be 99.
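The relationship in equation (12.16) can also be inverted to find how many seeded errors W are needed for a target probability. A short sketch (function names are illustrative):

```python
import math

def error_free_probability(w):
    """Probability that the program is error free once all W seeded
    errors have been found and no new real errors have emerged (12.16)."""
    return w / (w + 1)

def seeds_needed(p_target):
    """Smallest W achieving at least the target probability.
    Rearranging P = W/(W + 1) gives W >= P/(1 - P)."""
    return math.ceil(p_target / (1 - p_target))

print(round(error_free_probability(10), 2))  # -> 0.91
print(seeds_needed(0.99))                    # -> 99
```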
Improving software reliability
The first requirement for achieving high reliability in software is to ensure that it is produced according to
sound software engineering principles. Formal standards for achieving high
quality in software are set out in BS 7165 (1991) and ISO 9000-3 (1991).
Many texts are available on good software design procedures. These differ significantly in their style of
approach, but all have the common aim of encouraging the production of
error-free software that conforms to the design specification. It is not within
the scope of this book to enter into arguments about which software design
approach is best, as choice between the different software design techniques
largely depends on personal preferences. However, it is essential that software
contributing to a measurement system is produced according to good software
engineering principles.
The second stage of reliability
enhancement is the application of a rigorous testing procedure as described in
the last section. This is a very time-consuming and hence expensive business,
and so testing should only continue until the calculated level of reliability is
the minimum needed for the requirements of the measurement system. However, if
a very high level of reliability is demanded, such rigorous testing becomes
extremely expensive and an alternative approach known as N-version programming
is often used. N-version programming requires N different programmers to
produce N different versions of the same software according to a common
specification. Then, assuming that there are no errors in the specification
itself, any difference in the output of one program compared with the others
indicates an error in that program. Commonly, N = 3 is used, that is, three
different versions of the program are produced, but N = 5 is used for
measurement systems that are very critical. In this latter case, a ‘voting’
system is used, which means that up to two out of the five versions can be
faulty without incorrect outputs being generated.
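The voting arrangement can be illustrated with a short sketch. An assumption here is that the N versions produce directly comparable outputs; practical systems usually compare numerical outputs within a tolerance rather than for exact equality:

```python
from collections import Counter

def majority_vote(outputs):
    """Return the output agreed by a strict majority of the N versions.

    With N = 5, up to two faulty versions are outvoted provided the
    other three agree.
    """
    value, count = Counter(outputs).most_common(1)[0]
    if count > len(outputs) // 2:   # require a strict majority
        return value
    raise RuntimeError("no majority among the versions")

# Two of the five versions are faulty; the majority result survives
print(majority_vote([42.0, 42.0, 41.9, 42.0, 37.5]))  # -> 42.0
```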
Unfortunately, whilst this approach
reduces the chance of software errors in measurement systems, it is not
foolproof because the degree of independence between programs cannot be
guaranteed. Different programmers, who may be trained in the same place and use
the same design techniques, may generate different programs that have the same
errors. Thus, this method has the best chance of success if the programmers are
trained independently and use different design techniques.
Languages such as Ada also improve
the safety of software because they contain special features that are designed
to detect the kind of programming errors that are commonly made. Such languages
have been specifically developed with safety critical applications in mind.