The purpose of this and previous posts is to show that statistics is basically linear algebra and intrinsically simple. In previous posts I showed that gathered data can be expressed simply as lines, the lengths of which represent the mean and the standard deviation of the data respectively. It was also established that the two lines are independent of one another. If we knew that they represented the true values exactly, we would be done. However, that is not the case. The mean is always somewhat “fuzzy” in that we can’t be sure that what we measured is the true value. Measuring that uncertainty is where it gets complicated. There is usually uncertainty in the standard deviation as well, but not always. Facilities that routinely manufacture a product may have accumulated enough data on the standard deviation to treat their estimate as the true value.
True Value of the Standard Deviation is Known
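In this case the normal distribution is all that is needed to evaluate the uncertainty of the mean. The mean of n measurements is normally distributed about the true mean, with a standard deviation equal to the known standard deviation divided by the square root of n.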
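As a minimal sketch of this case, the snippet below computes a two-sided interval for the mean using only the normal distribution. The measurements, the known standard deviation of 0.2, and the function name mean_interval_known_sigma are all hypothetical, chosen purely for illustration.

```python
from math import sqrt
from statistics import NormalDist


def mean_interval_known_sigma(data, sigma, confidence=0.95):
    """Two-sided interval for the true mean when sigma is known exactly."""
    n = len(data)
    mean = sum(data) / n
    # Critical value of the standard normal distribution for this confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    # The mean of n measurements has standard deviation sigma / sqrt(n).
    half_width = z * sigma / sqrt(n)
    return mean - half_width, mean + half_width


# Hypothetical measurements and an assumed known sigma.
sample = [10.1, 9.8, 10.3, 10.0, 9.9, 10.2]
low, high = mean_interval_known_sigma(sample, sigma=0.2)
print(f"95% interval for the mean: ({low:.3f}, {high:.3f})")
```

The interval narrows as more measurements are taken, because its half-width shrinks with the square root of the sample size.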
True Value of the Standard Deviation is Not Known
Usually the standard deviation is also fuzzy, so both the mean and the standard deviation must be treated as random variables. While the mean is normally distributed, the square of the standard deviation (the variance), suitably scaled, follows the chi squared distribution. (The chi squared distribution with one degree of freedom is the distribution of the square of a standard normal variable.) However, the distribution that we want is that of the mean divided by the standard deviation, both of which are random variables.
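A small simulation can make the chi squared claim concrete. The sketch below assumes normally distributed data and uses the standard scaling, in which the sample variance is multiplied by (n - 1) and divided by the true variance; the result should then behave like a chi squared variable with n - 1 degrees of freedom, whose mean is n - 1 and whose variance is 2(n - 1). The sample size, sigma, trial count, and seed are arbitrary choices for illustration.

```python
import random
from statistics import mean, variance


def scaled_variances(n, sigma, trials, seed=0):
    """Simulate (n - 1) * s^2 / sigma^2 for repeated normal samples of size n."""
    rng = random.Random(seed)
    results = []
    for _ in range(trials):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        s2 = variance(sample)  # sample variance, divides by n - 1
        results.append((n - 1) * s2 / sigma ** 2)
    return results


n = 5
values = scaled_variances(n=n, sigma=2.0, trials=20000)
print("simulated mean:    ", round(mean(values), 2), "theory:", n - 1)
print("simulated variance:", round(variance(values), 2), "theory:", 2 * (n - 1))
```

The simulated values will not match the theory exactly, but they should come close, and closer still as the number of trials grows.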
Derivation of the “t” Distribution
The complication is that what we now need is the distribution of a normal variable divided by the square root of a chi squared variable (the latter scaled by its degrees of freedom). The density of such a ratio of two independent variables is given by:
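$$f_T(z) = \int_{-\infty}^{\infty} |x| \, f_X(x) \, f_Y(zx) \, dx$$
where f_Y is the density of the normal variable in the numerator (evaluated at zx), f_X is the density of the square root of the chi squared variable in the denominator, and f_T is the density of their ratio z, which is the “t” distribution.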
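The integral can be checked numerically without working through the algebra. The sketch below assumes the standard definition, in which the chi squared variable is divided by its degrees of freedom nu before the square root is taken, evaluates the integral with a simple midpoint rule, and compares the result with the well-known closed form of the “t” density. The function names, integration limit, and step count are illustrative choices, not part of the referenced derivation.

```python
from math import exp, gamma, pi, sqrt


def normal_pdf(y):
    """Density of the standard normal distribution."""
    return exp(-0.5 * y * y) / sqrt(2.0 * pi)


def denom_pdf(x, nu):
    """Density of X = sqrt(V / nu) for x >= 0, where V is chi squared with nu d.f."""
    coeff = 2.0 * (nu / 2.0) ** (nu / 2.0) / gamma(nu / 2.0)
    return coeff * x ** (nu - 1) * exp(-0.5 * nu * x * x)


def t_pdf_by_integral(t, nu, upper=10.0, steps=20000):
    """Midpoint-rule estimate of the ratio integral over 0 < x < upper."""
    h = upper / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h
        # X is never negative, so |x| is just x and the range starts at 0.
        total += x * denom_pdf(x, nu) * normal_pdf(t * x)
    return total * h


def t_pdf_closed_form(t, nu):
    """Closed-form density of the t distribution with nu degrees of freedom."""
    c = gamma((nu + 1) / 2.0) / (sqrt(nu * pi) * gamma(nu / 2.0))
    return c * (1.0 + t * t / nu) ** (-(nu + 1) / 2.0)


nu = 5
for t in (0.0, 1.0, 2.5):
    print(t, t_pdf_by_integral(t, nu), t_pdf_closed_form(t, nu))
```

The two columns should agree to several decimal places, which suggests that the ratio integral really does reproduce the “t” density.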
The full derivation of the “t” distribution may be found in Statistical Inference, Vijay K. Rohatgi, John Wiley & Sons, 1984.
So we see that while the basic concepts of statistics are simple, the problem of uncertainty is complex.