[Summary by Jotun Hein]
Again, extremely worthwhile!!
Fisher clearly wrote a series of pioneering papers from 1918 and in the following years/decades, and whether or not the one we discussed today is the one that introduces the most new concepts is not so essential: a series of ideas that have since found their way into textbooks are (almost) first introduced in these papers.
He seems to suffer a bit from not having an established framework of probability: on the first page he gives what seems a very convoluted formulation of the law of large numbers, managing to use six vectors to define it. He eventually states that if the probability vector of the hypothesized model is close to that of the true model, then the observed frequencies in finite samples from the two models will also be close to each other. And then he states that this could probably be proven, but decides not to!!
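(A little sketch of my own, not Fisher's notation, just to illustrate the claim: if two multinomial probability vectors are close, the observed frequencies from large samples of the two models end up close as well.)

```python
# My illustration of the law-of-large-numbers claim, not from the paper:
# frequencies from two nearby multinomial models converge to their probability
# vectors, so close probability vectors give close observed frequencies.
import numpy as np

rng = np.random.default_rng(0)

p_true = np.array([0.5, 0.3, 0.2])      # "true" model
p_near = np.array([0.49, 0.31, 0.20])   # nearby hypothesized model

for n in (100, 10_000, 1_000_000):
    freq_true = rng.multinomial(n, p_true) / n
    freq_near = rng.multinomial(n, p_near) / n
    print(n, np.abs(freq_true - freq_near).max())
# The maximal difference shrinks towards |p_true - p_near|, here 0.01.
```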
He then moves on to define estimation, which is a mapping from observations to parameter space, and exemplifies the problems you can get into by trying to estimate the mid-point of the Cauchy distribution by the empirical mean. And he doesn’t refer to Cauchy either. Then consistent estimators (converging to the true parameter as the number of observations increases), and if you ask for more you also want efficiency (the asymptotic variance is minimal). He goes on to prove that the maximum likelihood estimator is efficient under mild conditions. He proves that two efficient estimators will have asymptotic correlation 1, which must mean one is an affine function of the other as far as I can see. He shows how you can rectify the bias (converging to the wrong value) of an efficient estimator. He defines the Fisher Information matrix but doesn’t call it that. Very interesting discussion of the inherent noise in the data (unavoidable) and the error caused by inefficient estimators (avoidable).
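(A tiny simulation of my own, using the sample median as a simple consistent estimator rather than Fisher's full maximum likelihood machinery, shows why the empirical mean fails for the Cauchy mid-point.)

```python
# My sketch, not from the paper: the sample mean of Cauchy observations does not
# settle down as n grows (heavy tails, no finite expectation), whereas the sample
# median converges to the true mid-point (here 0).
import numpy as np

rng = np.random.default_rng(1)

for n in (100, 10_000, 1_000_000):
    x = rng.standard_cauchy(n)   # mid-point 0, scale 1
    print(f"n={n:>9}  mean={x.mean():+9.3f}  median={np.median(x):+8.4f}")
# The mean keeps wandering; the median homes in on 0.
```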
Clearly one issue is the contrast between asymptotic properties and properties of estimators for finite data.
He introduces sufficient statistics (a function of the data that contains ALL the information about the parameter) and ancillary statistics (functions of the data that contain NO information about the parameter). Nothing about minimal sufficient statistics or the non-existence of a maximal ancillary statistic.
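(A small example of my own, not Fisher's: for Bernoulli(p) data the sum is sufficient, so two samples with the same number of ones give identical likelihood functions, and the rest of the data carries no further information about p.)

```python
# My sketch of sufficiency: the Bernoulli log-likelihood depends on the data
# only through sum(x), so samples with the same sum are indistinguishable
# as far as inference about p is concerned.
import numpy as np

def log_likelihood(x, p):
    return np.sum(x * np.log(p) + (1 - x) * np.log(1 - p))

x1 = np.array([1, 1, 0, 0, 0, 1, 0, 0, 0, 0])   # sum = 3
x2 = np.array([0, 0, 0, 1, 0, 0, 1, 0, 1, 0])   # sum = 3, different ordering

for p in (0.2, 0.3, 0.5):
    print(p, log_likelihood(x1, p), log_likelihood(x2, p))
# Identical values for x1 and x2: the likelihood sees only the sum.
```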
I hope I didn’t mess up too much in my summary – it was well worth reading and recapitulated a series of key concepts in likelihood statistics. We got a bit tired over the last 10 pages. The ½-page letter by Rev. Bayes was the wrong one, so we will return to that.
The writing style is a bit peculiar: he has only 3 references, and they are all to himself. Nothing on hypothesis testing in this paper. He makes some intuitive jumps, and in some places a curve or illustration would have helped, but in general very enjoyable. He uses the phrase “without antecedent knowledge” – it almost sounds like “Don’t use the word prior, please”.