3.5.2 Graphical data analysis techniques – frequency distributions
Graphical techniques are a very useful way of analysing how random measurement
errors are distributed. The simplest approach is to draw a histogram, in which
bands of equal width across the range of measurement values are defined and the
number of measurements within each band is counted. Figure 3.5 shows a
histogram for set C of the length measurement data given in section 3.5.1, in
which the bands chosen are 2 mm wide. For instance, there are 11 measurements
in the range between 405.5 and 407.5 mm and so the height of the histogram for
this range is 11 units. Similarly, there are 5 measurements in the range from 407.5
to 409.5 mm and so the height of the histogram over this range is 5 units. The
rest of the histogram is completed in a similar fashion. (N.B. The scaling of
the bands was deliberately chosen so that no measurements fell on the boundary
between different bands and caused ambiguity about which band to put them in.)
Such a histogram has the characteristic shape shown by truly random data, with
symmetry about the mean value of the measurements.
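As a rough illustration of this band-counting procedure, the short Python sketch below builds the same kind of histogram with numpy. The measurement values used here are hypothetical stand-ins rather than the actual set C data from section 3.5.1, and the band edges are spaced 2 mm apart so that no value can fall on a boundary.

```python
import numpy as np

# Hypothetical stand-in values for measurement set C (units: mm);
# the real values are listed in section 3.5.1.
measurements = np.array([409.1, 406.3, 407.8, 406.9, 408.2, 407.1,
                         406.5, 407.4, 408.8, 406.0, 407.9, 405.8,
                         407.0, 408.4, 406.7, 407.6, 409.3, 406.2,
                         407.3, 405.9, 408.0, 407.2, 406.4])

# Band (bin) edges 2 mm wide, offset so that no measurement lies
# exactly on a boundary: 401.5, 403.5, ..., 411.5 mm.
edges = np.arange(401.5, 412.0, 2.0)

counts, _ = np.histogram(measurements, bins=edges)
for lo, hi, n in zip(edges[:-1], edges[1:], counts):
    print(f"{lo:.1f} - {hi:.1f} mm : {n} measurements")
```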
As it is the actual value of
measurement error that is usually of most concern, it is often more useful to
draw a histogram of the deviations of the measurements
from the mean value rather than to
draw a histogram of the measurements themselves.
The starting point for this is to calculate the deviation of each measurement
away from the calculated mean value. Then a histogram of deviations can be
drawn by defining deviation bands of equal width and counting the number of
deviation values in each band. This histogram has exactly the same shape as the
histogram of the raw measurements except that the scaling of the horizontal
axis has to be redefined in terms of the deviation values (these units are
shown in brackets on Figure 3.5).
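The deviation step can be sketched in the same way, again with hypothetical stand-in values: each measurement's deviation from the calculated mean is found first, and the deviations are then counted into equal-width bands.

```python
import numpy as np

# Hypothetical measurement values standing in for set C (mm).
measurements = np.array([409.1, 406.3, 407.8, 406.9, 408.2,
                         407.1, 406.5, 407.4, 408.8, 406.0])

mean_value = measurements.mean()

# Deviation of each measurement from the calculated mean value.
deviations = measurements - mean_value

# Histogram of deviations using equal-width deviation bands (1 mm wide).
edges = np.arange(-3.0, 4.0, 1.0)
counts, _ = np.histogram(deviations, bins=edges)

for lo, hi, n in zip(edges[:-1], edges[1:], counts):
    print(f"{lo:+.1f} to {hi:+.1f} mm : {n} measurements")
```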
Let us now explore what happens to
the histogram of deviations as the number of measurements increases. As the
number of measurements increases, smaller bands can be defined for the
histogram, which retains its basic shape but then consists of a larger number
of smaller steps on each side of the peak. In the limit, as the number of
measurements approaches infinity, the histogram becomes a smooth curve known as
a frequency distribution curve as shown in Figure 3.6. The ordinate of this
curve is the frequency of occurrence of each deviation value, F(D), and the
abscissa is the magnitude of deviation, D.
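The limiting process can be imitated with simulated data, as in the sketch below. It assumes, purely for illustration, that the deviations are Gaussian, and shows how progressively narrower bands can be used as the number of samples grows, so that the stepped histogram approaches a smooth frequency distribution curve.

```python
import numpy as np

rng = np.random.default_rng(0)

# More samples allow narrower deviation bands; the histogram keeps its
# shape but is built from a larger number of smaller steps.
for n_samples, band_width in [(50, 1.0), (5_000, 0.2), (500_000, 0.02)]:
    deviations = rng.normal(loc=0.0, scale=1.0, size=n_samples)
    edges = np.arange(-4.0, 4.0 + band_width, band_width)
    counts, _ = np.histogram(deviations, bins=edges)
    print(f"{n_samples:>7} samples, {len(counts):>3} bands")
```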
The symmetry of Figures 3.5 and 3.6
about the zero deviation value is very useful for showing graphically that the
measurement data only has random errors. Although these figures cannot easily
be used to quantify the magnitude and distribution of the errors, very similar
graphical techniques do achieve this. If the height of the frequency
distribution curve is normalized such that the area under it is unity, then the
curve in this form is known as a probability curve, and the height F(D) at any
particular deviation magnitude D is known as the probability density function
(p.d.f.). The condition that
the area under the curve is unity can be expressed mathematically as:

$$\int_{-\infty}^{+\infty} F(D)\,dD = 1$$
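This normalization can be seen numerically in the sketch below, which again uses simulated Gaussian deviations for illustration: numpy's density option rescales the histogram heights so that the total area under the bars is (approximately) unity, giving an estimate of the p.d.f. F(D).

```python
import numpy as np

rng = np.random.default_rng(1)
deviations = rng.normal(0.0, 1.0, size=10_000)   # simulated deviations

# density=True rescales the bar heights so that the area under the
# histogram is unity, i.e. an estimate of the p.d.f. F(D).
heights, edges = np.histogram(deviations, bins=80, density=True)
band_width = edges[1] - edges[0]

print(f"area under normalized histogram = {np.sum(heights * band_width):.6f}")
```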
The probability that the error in any
one particular measurement lies between two levels D1 and D2 can be calculated
by measuring the area under the curve contained between two vertical lines
drawn through D1 and D2, as shown by the right-hand hatched area in Figure 3.6.
This can be expressed mathematically as:

$$P(D_1 \le D \le D_2) = \int_{D_1}^{D_2} F(D)\,dD$$
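Numerically, this area can be estimated from the p.d.f. as in the sketch below, which uses simulated Gaussian deviations (an assumption made only for illustration) and compares the area between D1 and D2 with a direct count of deviations in that interval.

```python
import numpy as np

rng = np.random.default_rng(2)
deviations = rng.normal(0.0, 1.0, size=100_000)

heights, edges = np.histogram(deviations, bins=200, density=True)
centres = 0.5 * (edges[:-1] + edges[1:])
band_width = edges[1] - edges[0]

D1, D2 = 0.5, 1.5   # deviation levels of interest

# Area under the estimated p.d.f. between D1 and D2.
in_band = (centres >= D1) & (centres <= D2)
p_area = np.sum(heights[in_band] * band_width)

# Direct empirical check: fraction of deviations in the interval.
p_count = np.mean((deviations >= D1) & (deviations <= D2))
print(f"P(D1 <= D <= D2): {p_area:.4f} (area) vs {p_count:.4f} (count)")
```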
Of particular importance for assessing the maximum error likely in any one
measurement is the cumulative distribution function (c.d.f.). This is defined as the
probability of observing a value less than or equal to D0, and is
expressed mathematically as:

$$P(D \le D_0) = \int_{-\infty}^{D_0} F(D)\,dD$$
Thus, the c.d.f. is the area under
the curve to the left of a vertical line drawn through D0, as shown by the
left-hand hatched area on Figure 3.6.
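A corresponding numerical sketch of the c.d.f. is given below, again with simulated Gaussian deviations standing in for real measurement data: the area of the estimated p.d.f. to the left of D0 is compared with the fraction of deviations not exceeding D0.

```python
import numpy as np

rng = np.random.default_rng(3)
deviations = rng.normal(0.0, 1.0, size=100_000)

heights, edges = np.histogram(deviations, bins=200, density=True)
centres = 0.5 * (edges[:-1] + edges[1:])
band_width = edges[1] - edges[0]

D0 = 1.0   # deviation threshold of interest

# c.d.f. P(D <= D0): area under the estimated p.d.f. to the left of D0.
cdf_at_D0 = np.sum(heights[centres <= D0] * band_width)

print(f"P(D <= {D0}): {cdf_at_D0:.4f} (area) vs "
      f"{np.mean(deviations <= D0):.4f} (count)")
```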
The deviation magnitude Dp corresponding with the peak of the frequency distribution
curve (Figure 3.6) is the value of deviation that has the greatest probability.
If the errors are entirely random in nature, then the value of Dp will equal
zero. Any non-zero value of Dp indicates systematic errors in the data, in the
form of a bias that is often removable by recalibration.
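The short sketch below illustrates this check on simulated deviations that contain a deliberate offset: the peak of the estimated distribution, Dp, sits away from zero, signalling a systematic bias.

```python
import numpy as np

rng = np.random.default_rng(4)

# Simulated deviations with a deliberate systematic offset (bias) of 0.4.
deviations = rng.normal(loc=0.4, scale=1.0, size=50_000)

heights, edges = np.histogram(deviations, bins=100, density=True)
centres = 0.5 * (edges[:-1] + edges[1:])

# Dp: the deviation value at the peak of the frequency distribution.
Dp = centres[np.argmax(heights)]
print(f"Dp = {Dp:.2f}  (non-zero, so a systematic bias is indicated)")
```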