Industries Needs: 11 Display, recording and presentation of measurement data

11.3 Presentation of data

The two formats available for presenting data on paper are tabular and graphical ones and the relative merits of these are compared below. In some circumstances, it is clearly best to use only one or other of these two alternatives alone. However, in many data collection exercises, part of the measurements and calculations are expressed in tabular form and part graphically, so making best use of the merits of each technique. Very similar arguments apply to the relative merits of graphical and tabular presentations if a computer screen is used for the presentation instead of paper.

11.3.1 Tabular data presentation

A tabular presentation allows data values to be recorded in a precise way that exactly maintains the accuracy to which the data values were measured. In other words, the data values are written down exactly as measured. Besides recording the raw data values as measured, tables often also contain further values calculated from the raw data. An example of a tabular data presentation is given in Table 11.1. This records the results of an experiment to determine the strain induced in a bar of material that is subjected to a range of stresses. Data were obtained by applying a sequence of forces to the end of the bar and using an extensometer to measure the change in length. Values of the stress and strain in the bar are calculated from these measurements and are also included in the table. The final row, which is of crucial importance in any tabular presentation, is the estimate of possible error in each calculated result.

A table of measurements and calculations should conform to several rules as illustrated in Table 11.1:

(i) The table should have a title that explains what data are being presented within the table.

(ii) Each column of figures in the table should refer to the measurements or calculations associated with one quantity only.

(iii) Each column of figures should be headed by a title that identifies the data values contained in the column.

(iv) The units in which quantities in each column are measured should be stated at the top of the column.

(v) All headings and columns should be separated by bold horizontal (and sometimes vertical) lines.

(vi) The errors associated with each data value quoted in the table should be given. The form shown in Table 11.1 is a suitable way to do this when the error level is the same for all data values in a particular column. However, if error levels vary, then it is preferable to write the error boundaries alongside each entry in the table.
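As a rough illustration of rules (i) to (vi), the following Python sketch prints a small table with a title, one quantity per column, units beneath each heading, ruled separators, and a final row of error estimates. The function name, column names, values and error figures are invented placeholders, not the contents of Table 11.1.

```python
def format_table(title, columns, rows, errors):
    """Return a plain-text table; columns is a list of (name, unit) pairs."""
    width = max(12, max(len(name) for name, _ in columns) + 2)
    rule = "-" * (width * len(columns))
    lines = [title, rule]
    # One heading per column, with the units stated directly beneath it
    lines.append("".join(name.ljust(width) for name, _ in columns))
    lines.append("".join(f"({unit})".ljust(width) for _, unit in columns))
    lines.append(rule)
    for row in rows:
        lines.append("".join(str(value).ljust(width) for value in row))
    lines.append(rule)
    # Final row: estimated possible error for each column
    lines.append("".join(f"±{err}".ljust(width) for err in errors))
    return "\n".join(lines)

# Hypothetical placeholder data (not taken from Table 11.1)
title = "Extension of a test bar under load (illustrative)"
columns = [("Force", "kN"), ("Extension", "mm")]
rows = [(0, 0.0), (2, 1.5), (4, 3.0)]
errors = (0.2, 0.1)

table = format_table(title, columns, rows, errors)
print(table)
```

A single error row suits the case where the error level is the same for every value in a column; if error levels vary, each entry would need its own bounds instead.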

11.3.2 Graphical presentation of data

Presentation of data in graphical form involves some compromise in the accuracy to which the data are recorded, as the exact values of measurements are lost. However, graphical presentation has important advantages over tabular presentation.

(i) Graphs provide a pictorial representation of results that is more readily comprehended than a set of tabular results.

(ii) Graphs are particularly useful for expressing the quantitative significance of results and showing whether a linear relationship exists between two variables. Figure 11.12 shows a graph drawn from the stress and strain values given in Table 11.1. Construction of the graph involves first of all marking the points corresponding to the stress and strain values. The next step is to draw a line through these data points that best represents the relationship between the two variables. This line will normally be either a straight one or a smooth curve. The data points will not usually lie exactly on this line but instead will lie on either side of it. The magnitude of the excursions of the data points from the line drawn will depend on the magnitude of the random measurement errors associated with the data.

(iii) Graphs can sometimes show up a data point that is clearly outside the straight line or curve that seems to fit the rest of the data points. Such a data point is probably due either to a human mistake in reading an instrument or else to a momentary malfunction in the measuring instrument itself. If the graph shows such a data point where a human mistake or instrument malfunction is suspected, the proper course of action is to repeat that particular measurement and then discard the original data point if the mistake or malfunction is confirmed.

Like tables, the proper representation of data in graphical form has to conform to certain rules:

(i) The graph should have a title or caption that explains what data are being presented in the graph.

(ii) Both axes of the graph should be labelled to express clearly what variable is associated with each axis and to define the units in which the variables are expressed.

(iii) The number of points marked along each axis should be kept reasonably small – about five divisions is often a suitable number.

(iv) No attempt should be made to draw the graph outside the boundaries corresponding to the maximum and minimum data values measured, i.e. in Figure 11.12, the graph stops at a point corresponding to the highest measured stress value of 108.5.

Fitting curves to data points on a graph

The procedure of drawing a straight line or smooth curve as appropriate that passes close to all data points on a graph, rather than joining the data points by a jagged line that passes through each data point, is justified on account of the random errors that are known to affect measurements. Any line between the data points is mathematically acceptable as a graphical representation of the data if the maximum deviation of any data point from the line is within the boundaries of the identified level of possible measurement errors. However, within the range of possible lines that could be drawn, only one will be the optimum one. This optimum line is where the sum of negative errors in data points on one side of the line is balanced by the sum of positive errors in data points on the other side of the line. The nature of the data points is often such that a perfectly acceptable approximation to the optimum can be obtained by drawing a line through the data points by eye. In other cases, however, it is necessary to fit a line mathematically, using regression techniques.
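The two criteria in this paragraph can be checked numerically. The sketch below is only illustrative: the readings, the candidate lines and the assumed maximum measurement error of 0.3 are all hypothetical. It tests whether every point lies within the error bound of a candidate straight line, and reports the sum of signed deviations, which is zero when positive and negative errors balance.

```python
def deviations(xs, ys, a, b):
    """Deviation of each point from the candidate line y = a + b*x."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]

def is_acceptable(xs, ys, a, b, max_error):
    """True if no point deviates from the line by more than max_error."""
    return all(abs(d) <= max_error for d in deviations(xs, ys, a, b))

def imbalance(xs, ys, a, b):
    """Sum of signed deviations; zero when the two sides of the line balance."""
    return sum(deviations(xs, ys, a, b))

# Hypothetical readings and an assumed maximum measurement error
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]
max_error = 0.3

good_ok = is_acceptable(xs, ys, 0.0, 2.0, max_error)  # candidate y = 2x
good_balance = imbalance(xs, ys, 0.0, 2.0)
bad_ok = is_acceptable(xs, ys, 1.0, 2.0, max_error)   # candidate y = 1 + 2x

print(good_ok, round(good_balance, 6), bad_ok)
```

A balanced sum of deviations does not by itself identify a unique line, which is one reason the regression techniques described next minimise the sum of squared deviations instead.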

Regression techniques

Regression techniques consist of finding a mathematical relationship between measurements of two variables y and x, such that the value of variable y can be predicted from a measurement of the other variable x. However, regression techniques should not be regarded as a magic formula that can fit a good relationship to measurement data in all circumstances, as the characteristics of the data must satisfy certain conditions. In determining the suitability of measurement data for the application of regression techniques, it is recommended practice to draw an approximate graph of the measured data points, as this is often the best means of detecting aspects of the data that make it unsuitable for regression analysis. Drawing a graph of the data will indicate, for example, whether there are any data points that appear to be erroneous. This may indicate that human mistakes or instrument malfunctions have affected the erroneous data points, and it is assumed that any such data points will be checked for correctness.

Regression techniques cannot be successfully applied if the deviation of any particular data point from the line to be fitted is greater than the maximum possible error that is calculated for the measured variable (i.e. the predicted sum of all systematic and random errors). The nature of some measurement data sets is such that this criterion cannot be satisfied, and any attempt to apply regression techniques is doomed to failure. In that event, the only valid course of action is to express the measurements in tabular form. This can then be used as an x-y look-up table, from which values of the variable y corresponding to particular values of x can be read off. In many cases, this problem of large errors in some data points only becomes apparent during the process of attempting to fit a relationship by regression.

A further check that must be made before attempting to fit a line or curve to measurements of two variables x and y is to examine the data and look for any evidence that both variables are subject to random errors. It is a clear condition for the validity of regression techniques that only one of the measured variables is subject to random errors, with no error in the other variable. If random errors do exist in both measured variables, regression techniques cannot be applied and recourse must be made instead to correlation analysis (covered later in this chapter). A simple example of a situation where both variables in a measurement data set are subject to random errors is the measurement of human height and weight, and no attempt should be made to fit a relationship between them by regression.

Having determined that the technique is valid, the regression procedure is simplest if a straight-line relationship exists between the variables, which allows a relationship of the form y = a + bx to be estimated by linear least squares regression. Unfortunately, in many cases, a straight-line relationship between the points does not exist, which is readily shown by plotting the raw data points on a graph. However, knowledge of physical laws governing the data can often suggest a suitable alternative form of relationship between the two sets of variable measurements, such as a quadratic relationship or a higher order polynomial relationship. Also, in some cases, the measured variables can be transformed into a form where a linear relationship exists. For example, suppose that two variables y and x are related according to y = ax^c. A linear relationship from this can be derived, using a logarithmic transformation, as log(y) = log(a) + c log(x) .

Thus, if a graph is constructed of log(y) plotted against log(x) , the parameters of a straight-line relationship can be estimated by linear least squares regression.
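The logarithmic transformation can be sketched in a few lines of Python. This is a minimal illustration, assuming all x and y values are positive so that their logarithms exist; the function names and the synthetic, noise-free data generated from y = 3x^1.5 are assumptions made for the example.

```python
import math

def fit_line(xs, ys):
    """Least squares estimates (a, b) for the straight line y = a + b*x."""
    n = len(xs)
    xm = sum(xs) / n
    ym = sum(ys) / n
    b = (sum(x * y for x, y in zip(xs, ys)) - n * xm * ym) / (
        sum(x * x for x in xs) - n * xm * xm)
    return ym - b * xm, b

def fit_power_law(xs, ys):
    """Estimate a and c in y = a * x**c from a line fitted to (log x, log y)."""
    log_a, c = fit_line([math.log(x) for x in xs], [math.log(y) for y in ys])
    return math.exp(log_a), c

# Synthetic noise-free data generated from y = 3 * x**1.5 (illustrative only)
xs = [1.0, 2.0, 4.0, 8.0]
ys = [3.0 * x ** 1.5 for x in xs]

a, c = fit_power_law(xs, ys)
print(f"a = {a:.4f}, c = {c:.4f}")
```

The intercept of the fitted line is log(a), so it is converted back with the exponential, while the slope gives c directly.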

All quadratic and higher order relationships relating one variable y to another variable x can be represented by a power series of the form:

y = a₀ + a₁x + a₂x² +…+ a_px^p

Estimation of the parameters a₀ ...a_p is very difficult if p has a large value. Fortunately, a relationship where p only has a small value can be fitted to most data sets. Quadratic least squares regression is used to estimate parameters where p has a value of two, and for larger values of p, polynomial least squares regression is used for parameter estimation.

Where the appropriate form of relationship between variables in measurement data sets is not obvious either from visual inspection or from consideration of physical laws, a method that is effectively a trial and error one has to be applied. This consists of estimating the parameters of successively higher order relationships between y and x until a curve is found that fits the data sufficiently closely. What level of closeness is acceptable is considered in the later section on confidence tests.

Linear least squares regression

If a linear relationship between y and x exists for a set of n measurements y₁ ...y_n, x₁ ...x_n, then this relationship can be expressed as y = a + bx, where the coefficients a and b are constants. The purpose of least squares regression is to select the optimum values for a and b such that the line gives the best fit to the measurement data.

The deviation of each point (x_i, y_i) from the line can be expressed as d_i, where d_i = y_i − (a + bx_i).

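The best-fit line is obtained when the sum of the squared deviations, S, is a minimum, i.e. when

S = Σ d_i² = Σ (y_i − a − bx_i)²

is a minimum, where the sums are taken over i = 1 to n. The minimum is found by setting the partial derivatives ∂S/∂a and ∂S/∂b to zero and solving the two resulting simultaneous equations, which gives the estimates

b = (Σ x_iy_i − n x_m y_m) / (Σ x_i² − n x_m²)   (11.8)

a = y_m − bx_m   (11.9)

where x_m and y_m are the mean values of x and y.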

Example 11.1

In an experiment to determine the characteristics of a displacement sensor with a voltage output, the following output voltage values were recorded when a set of standard displacements was measured:

Fit a straight line to this set of data using least squares regression and estimate the output voltage when a displacement of 4.5 cm is measured.

Solution

Let y represent the output voltage and x represent the displacement. Then a suitable straight line is given by y = a + bx. We can now proceed to calculate estimates for the coefficients a and b using equations (11.8) and (11.9) above. The first step is to calculate the mean values of x and y. These are found to be x_m = 5.5 and y_m = 11.47. Next, we need to tabulate x_iy_i and x_i² for each pair of data values:

Hence, for x = 4.5, y = 0.1033 + (2.067 × 4.5) = 9.40 volts. Note that in this solution, we have only specified the answer to an accuracy of three figures, which is the same accuracy as the measurements. Any greater number of figures in the answer would be meaningless.
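The same procedure (means, then the sums of x_iy_i and x_i², then the two coefficients) can be sketched in Python using the standard closed-form least squares estimates. The readings below are hypothetical and are not the data set of Example 11.1; the function and variable names are likewise assumptions.

```python
def linear_least_squares(xs, ys):
    """Return (a, b) for the best-fit line y = a + b*x."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sum_xy = sum(x * y for x, y in zip(xs, ys))  # total of the x_i * y_i column
    sum_xx = sum(x * x for x in xs)              # total of the x_i squared column
    b = (sum_xy - n * x_mean * y_mean) / (sum_xx - n * x_mean ** 2)
    a = y_mean - b * x_mean
    return a, b

# Hypothetical sensor readings (NOT the values of Example 11.1)
displacement_cm = [1.0, 2.0, 3.0, 4.0, 5.0]
output_volts = [2.2, 3.9, 6.1, 8.0, 9.8]

a, b = linear_least_squares(displacement_cm, output_volts)
estimate = a + b * 4.5  # interpolated output at a displacement of 4.5 cm
print(f"a = {a:.3f}, b = {b:.3f}, estimate at 4.5 cm = {estimate:.3f} V")
```

As in the worked solution, the printed result should be quoted to no more figures than the measurements themselves justify.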

Least squares regression is often appropriate for situations where a straight-line relationship is not immediately obvious, for example where y ∝ x² or y ∝ exp(x).

Example 11.2

From theoretical considerations, it is known that the voltage (V) across a charged capacitor decays with time (t) according to the relationship: V = K exp(−t/τ). Estimate values for K and τ if the following values of V and t are measured.
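One way to approach a problem of this kind is to linearise the decay law. Taking natural logarithms of V = K exp(−t/τ) gives ln(V) = ln(K) − t/τ, which is a straight line in t with intercept ln(K) and slope −1/τ. The Python sketch below applies linear least squares to that transformed form. It uses synthetic, noise-free values generated with an assumed K = 10 and τ = 2, not the measurements of this example, and the function name is hypothetical.

```python
import math

def fit_decay(ts, vs):
    """Estimate K and tau in V = K*exp(-t/tau) from a line fitted to ln(V) against t."""
    n = len(ts)
    ys = [math.log(v) for v in vs]  # transformed variable ln(V)
    t_mean = sum(ts) / n
    y_mean = sum(ys) / n
    slope = (sum(t * y for t, y in zip(ts, ys)) - n * t_mean * y_mean) / (
        sum(t * t for t in ts) - n * t_mean ** 2)
    intercept = y_mean - slope * t_mean
    return math.exp(intercept), -1.0 / slope

# Synthetic noise-free values generated with K = 10 and tau = 2 (assumed)
ts = [0.0, 1.0, 2.0, 3.0, 4.0]
vs = [10.0 * math.exp(-t / 2.0) for t in ts]

K, tau = fit_decay(ts, vs)
print(f"K = {K:.4f}, tau = {tau:.4f}")
```

The fit minimises squared deviations in ln(V) rather than in V itself, so with noisy data it weights small voltages more heavily than a direct fit would.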

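Quadratic least squares regression

Quadratic least squares regression is used to estimate the parameters of a relationship of the form y = a + bx + cx². The deviation of each point (x_i, y_i) from the curve is d_i = y_i − (a + bx_i + cx_i²), and the best fit is obtained when the sum of the squared deviations S = Σ d_i² is a minimum. The minimum can be found by setting the partial derivatives ∂S/∂a, ∂S/∂b and ∂S/∂c to zero and solving the resulting simultaneous equations, as for the linear least squares regression case above. Standard computer programs to estimate the parameters a, b and c by numerical methods are widely available and therefore a detailed solution is not presented here.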

Polynomial least squares regression

Polynomial least squares regression is used to estimate the parameters of the pth order relationship y = a₀ + a₁x + a₂x² +…+ a_px^p between two sets of measurements y₁ ...y_n, x₁ ...x_n.

The deviation of each point (x_i, y_i) from the curve can be expressed as d_i, where:

d_i = y_i − (a₀ + a₁x_i + a₂x_i² +…+ a_px_i^p)

The best-fit curve is obtained when the sum of the squared deviations given by

S = Σ d_i²

is a minimum.

The minimum can be found as before by setting the (p + 1) partial derivatives ∂S/∂a₀ ... ∂S/∂a_p to zero and solving the resulting simultaneous equations. Again, as for the quadratic least squares regression case, standard computer programs to estimate the parameters a₀ ...a_p by numerical methods are widely available and therefore a detailed solution is not presented here.
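For readers who want to see what such a program does, here is a minimal pure-Python sketch. Setting each partial derivative of S to zero gives one equation per coefficient, Σ_k a_k (Σ_i x_i^(j+k)) = Σ_i y_i x_i^j, and the sketch solves these by Gaussian elimination. The function names and the synthetic data are assumptions; for high orders or poorly scaled data a library routine would be more reliable numerically.

```python
def solve(A, b):
    """Solve the linear system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back-substitution
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def polyfit(xs, ys, p):
    """Coefficients [a0, ..., ap] minimising S for y = a0 + a1*x + ... + ap*x**p."""
    A = [[sum(x ** (j + k) for x in xs) for k in range(p + 1)] for j in range(p + 1)]
    b = [sum(y * x ** j for x, y in zip(xs, ys)) for j in range(p + 1)]
    return solve(A, b)

# Synthetic noise-free data from y = 1 + 2x + 0.5x^2 (illustrative only)
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0 + 2.0 * x + 0.5 * x * x for x in xs]

coeffs = polyfit(xs, ys, 2)
print([round(c, 6) for c in coeffs])
```

Calling polyfit with p = 1 reproduces the linear case, and p = 2 the quadratic case.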

Confidence tests in curve fitting by least squares regression

Once data has been collected and a mathematical relationship that fits the data points has been determined by regression, the level of confidence that the mathematical relationship fitted is correct must be expressed in some way. The first check that must be made is whether the fundamental requirement for the validity of regression techniques is satisfied, i.e. whether the deviations of data points from the fitted line are all less than the maximum error level predicted for the measured variable. If this condition is violated by any data point that a line or curve has been fitted to, then use of the fitted relationship is unsafe and recourse must be made to tabular data presentation, as described earlier.

The second check concerns whether or not random errors affect both measured variables. If attempts are made to fit relationships by regression to data where both measured variables contain random errors, any relationship fitted will only be approximate and it is likely that one or more data points will have a deviation from the fitted line or curve that is greater than the maximum error level predicted for the measured variable. This will show up when the appropriate checks are made.

Having carried out the above checks to show that there are no aspects of the data which suggest that regression analysis is not appropriate, the next step is to apply least squares regression to estimate the parameters of the chosen relationship (linear, quadratic etc.). After this, some form of follow-up procedure is clearly required to assess how well the estimated relationship fits the data points. A simple curve-fitting confidence test is to calculate the sum of squared deviations S for the chosen y/x relationship and compare it with the value of S calculated for the next higher order regression curve that could be fitted to the data. Thus if a straight-line relationship is chosen, the value of S calculated should be of a similar magnitude to that obtained by fitting a quadratic relationship. If the value of S were substantially lower for a quadratic relationship, this would indicate that a quadratic relationship was a better fit to the data than a straight-line one and further tests would be needed to examine whether a cubic or higher order relationship was a better fit still.
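The comparison of S between successive orders can be sketched as follows. This is an illustrative outline only: the data are synthetic (a quadratic with small alternating offsets), the helper names are assumptions, and the judgement of what counts as "substantially lower" is left to the reader.

```python
def polyfit(xs, ys, p):
    """Least squares coefficients [a0..ap], solving the normal equations by Gauss-Jordan."""
    m = p + 1
    M = [[sum(x ** (j + k) for x in xs) for k in range(m)]
         + [sum(y * x ** j for x, y in zip(xs, ys))] for j in range(m)]
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [v / M[col][col] for v in M[col]]
        for r in range(m):
            if r != col:
                M[r] = [vr - M[r][col] * vc for vr, vc in zip(M[r], M[col])]
    return [row[m] for row in M]

def sum_sq_dev(xs, ys, coeffs):
    """S: sum of squared deviations of the points from the fitted curve."""
    return sum((y - sum(a * x ** k for k, a in enumerate(coeffs))) ** 2
               for x, y in zip(xs, ys))

# Synthetic data: y = x^2 plus small alternating offsets (assumed values)
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [x * x + e for x, e in zip(xs, [0.1, -0.1, 0.1, -0.1, 0.1, -0.1])]

S = {p: sum_sq_dev(xs, ys, polyfit(xs, ys, p)) for p in (1, 2, 3)}
for p, s in S.items():
    print(f"order {p}: S = {s:.4f}")
```

With this data the value of S drops sharply from the straight line to the quadratic and changes little from the quadratic to the cubic, which is the pattern that would point to a quadratic relationship.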

Other more sophisticated confidence tests exist such as the F-ratio test. However, these are outside the scope of this book.

Correlation tests

Where both variables in a measurement data set are subject to random fluctuations, correlation analysis is applied to determine the degree of association between the variables. For example, in the case already quoted of a data set containing measurements of human height and weight, we certainly expect some relationship between the variables of height and weight because a tall person is heavier on average than a short person. Correlation tests determine the strength of the relationship (or interdependence) between the measured variables, which is expressed in the form of a correlation coefficient.

For two sets of measurements y₁ ...y_n, x₁ ...x_n with means x_m and y_m, the correlation coefficient is given by:

correlation coefficient = Σ (x_i − x_m)(y_i − y_m) / √{[Σ (x_i − x_m)²][Σ (y_i − y_m)²]}

The magnitude of the correlation coefficient always lies between 0 and 1, with 0 representing the case where the variables are completely independent of one another and 1 the case where they are totally related to one another. When the magnitude lies strictly between 0 and 1, linear least squares regression can be applied to find relationships between the variables, which allows x to be predicted from a measurement of y, and y to be predicted from a measurement of x. This involves finding two separate regression lines of the form

y = a + bx and x = c + dy

These two lines are not normally coincident as shown in Figure 11.13. Both lines pass through the centroid of the data points but their slopes are different.

As || 1, the lines tend to coincidence, representing the case where the two variables are totally dependent upon one another

As || 0, the lines tend to orthogonal ones parallel to the x and y axes. In this case, the two sets of variables are totally independent. The best estimate of x given any measurement of y is xm and the best estimate of y given any measurement of x is y_m.

For the general case, the best fit to the data is the line that bisects the angle between the lines on Figure 11.13
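The correlation coefficient and the two regression lines can be computed together, as in the Python sketch below. It assumes the standard product-moment form of the correlation coefficient, and the height and weight figures are invented for illustration.

```python
import math

def centred_sums(xs, ys):
    """Return the means and the centred sums Sxx, Syy and Sxy."""
    n = len(xs)
    xm, ym = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xm) ** 2 for x in xs)
    syy = sum((y - ym) ** 2 for y in ys)
    sxy = sum((x - xm) * (y - ym) for x, y in zip(xs, ys))
    return xm, ym, sxx, syy, sxy

def correlation(xs, ys):
    """Product-moment correlation coefficient of the two data sets."""
    _, _, sxx, syy, sxy = centred_sums(xs, ys)
    return sxy / math.sqrt(sxx * syy)

def regression_lines(xs, ys):
    """Return (a, b) for y = a + b*x and (c, d) for x = c + d*y."""
    xm, ym, sxx, syy, sxy = centred_sums(xs, ys)
    b = sxy / sxx  # slope when predicting y from x
    d = sxy / syy  # slope when predicting x from y
    return (ym - b * xm, b), (xm - d * ym, d)

# Hypothetical height and weight measurements (illustrative only)
heights_cm = [160.0, 165.0, 170.0, 175.0, 180.0, 185.0]
weights_kg = [55.0, 62.0, 66.0, 70.0, 80.0, 82.0]

r = correlation(heights_cm, weights_kg)
(a, b), (c, d) = regression_lines(heights_cm, weights_kg)
print(f"r = {r:.3f}; y = {a:.2f} + {b:.3f}x; x = {c:.2f} + {d:.3f}y")
```

Both lines pass through the centroid (x_m, y_m), and the product of the two slopes b and d equals the square of the correlation coefficient, so the lines coincide only when its magnitude is 1.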

Monday, December 20, 2021
