When we talk about quality control we hear about distributions such as the Poisson, hypergeometric, binomial, normal, “t”, chi-squared and “F”. How complicated!
We are also told to worry about whether things are independent, and we are inundated with words like variance, mean, median, mode and standard deviation, with whether the data are homoscedastic or heteroscedastic (whether the standard deviation is constant or not), with confidence limits, and with such things as Type I error, Type II error, the null hypothesis, etc. It cannot be denied that all of these have their place. However, to get to the basics, all we are really trying to do is measure lengths. Statistics is really simply analytical geometry or linear algebra, depending on one’s outlook. Let’s look at the mean and standard deviation.
Mean (one type of average). We are told that it is the first moment about the origin.
Mathematically it is the integral of x f(x) dx between some limits, where f(x) is the probability density function of some distribution. Yet it is still a length.
Consider a set of n data points, X = (x1, x2, ..., xn). Visualize a graph of n dimensions in which a single location, X, represents those data. Also visualize the line in that n-dimensional space that is equidistant from each axis, i.e. the line through the origin and (1, 1, ..., 1). Drop a perpendicular from X to that equidistant line, and call the point where it meets the line M = (µ, µ, ..., µ). Finally, divide every coordinate by the square root of n, the number of data points, to bring the number of tests into our considerations.
The line (δ) from X to M is the vector (x1 – µ, x2 – µ, ..., xn – µ), while the line (µ) from the origin to M is the vector (µ, µ, ..., µ). Since the two lines are perpendicular, their scalar (or inner, or dot) product is zero:
$(\mu, \mu, \ldots, \mu) \cdot \dfrac{(x_1 - \mu,\; x_2 - \mu,\; \ldots,\; x_n - \mu)}{\sqrt{n}} = 0$
$x_1 + x_2 + \cdots + x_n - n\mu = 0$
$\mu = \dfrac{x_1 + x_2 + \cdots + x_n}{n}$, which is identical to the formula for the mean.
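That is, once every coordinate has been divided by the square root of n, the length of the line µ from the origin to M is the square root of (1/n)(nµ²), which is equal in value to the mean of the data points (apart from sign when the mean is negative).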
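The construction above can be checked numerically. What follows is a minimal sketch in Python using only the standard library. The function name `project_onto_diagonal` and the sample data are illustrative assumptions, not part of the original argument.

```python
import math

def project_onto_diagonal(xs):
    """Return M, the foot of the perpendicular from X to the line through (1, 1, ..., 1)."""
    n = len(xs)
    # Projecting X onto the direction (1, ..., 1) gives (X . 1) / (1 . 1) in every coordinate.
    mu = sum(xs) / n
    return [mu] * n

xs = [2.0, 4.0, 4.0, 5.0, 7.0, 8.0]  # hypothetical data
n = len(xs)
m = project_onto_diagonal(xs)

# The vector delta from M to X.
delta_vec = [x - mi for x, mi in zip(xs, m)]

# Perpendicularity: the dot product of OM and delta should be zero
# (up to floating-point rounding).
dot = sum(mi * d for mi, d in zip(m, delta_vec))

# Length of OM after every coordinate is divided by sqrt(n).
length_om = math.sqrt(sum((mi / math.sqrt(n)) ** 2 for mi in m))

print("dot product:", dot)
print("scaled length of OM:", length_om)
print("arithmetic mean:", sum(xs) / n)
```

For data with a positive mean, the scaled length of OM and the arithmetic mean should agree to within rounding error.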
Standard Deviation. The length of the line, δ, from X to M is the square root of $\frac{1}{n}\left(x_1^2 + x_2^2 + \cdots + x_n^2 - n\mu^2\right)$. Here $\frac{1}{n}\left(x_1^2 + x_2^2 + \cdots + x_n^2\right)$ is the square of the length of the line from the origin to the data, X, while $\frac{1}{n}\left(n\mu^2\right)$ is the square of the length from the origin to the point M.
$\delta = \sqrt{\frac{1}{n}\left(x_1^2 + x_2^2 + \cdots + x_n^2 - n\mu^2\right)}$
Thus the equation for the length of the line δ is identical to one of the equations used for calculating standard deviations (the one that applies when the standard deviation is not a random variable). If the sample standard deviation, s, is a random variable, 1/n is replaced with 1/(n – 1).
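As a rough check of the formula for δ, the sketch below computes it in two ways and compares the results with Python's `statistics.pstdev` (the 1/n form) and `statistics.stdev` (the 1/(n – 1) form). The function names `delta_length` and `direct_length` and the sample data are assumptions made for illustration.

```python
import math
import statistics

def delta_length(xs):
    """Length of delta from the Pythagorean form: sqrt((sum of x_i^2 - n*mu^2) / n)."""
    n = len(xs)
    mu = sum(xs) / n
    return math.sqrt((sum(x * x for x in xs) - n * mu * mu) / n)

def direct_length(xs, divisor):
    """Length of the vector (x_i - mu), with the squared length divided by `divisor`."""
    mu = sum(xs) / len(xs)
    return math.sqrt(sum((x - mu) ** 2 for x in xs) / divisor)

xs = [2.0, 4.0, 4.0, 5.0, 7.0, 8.0]  # hypothetical data
n = len(xs)

delta = delta_length(xs)         # 1/n form, via Pythagoras
direct = direct_length(xs, n)    # 1/n form, measured directly from M to X
s = direct_length(xs, n - 1)     # 1/(n - 1) form

print("delta:", delta, "pstdev:", statistics.pstdev(xs))
print("direct:", direct)
print("s:", s, "stdev:", statistics.stdev(xs))
```

The Pythagorean form subtracts two large numbers when the mean is large compared with the spread, so it can lose precision in floating point. The direct form, which measures the vector from M to X, is usually the safer one to compute.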
Rulers. To measure lengths we need a ruler. In the United States we use miles, in Canada they use kilometers, and in Russia the verst may be used. In statistics the ruler is the length δ if the standard deviation is known, or s if the standard deviation is a random variable.
The many terms mentioned above, and the sophistication of the mathematics, are important in establishing the reliability of the data. Still, basically we are only measuring lengths.