Vorlesung_9b.html

A Practical Example of Comparing against a Large Corpus

For the expected values E, you simply enter the O values of the large corpus, scaled to the size of your own sample if the totals differ. However, the reference corpus should be considerably larger than the sample, so that it can reliably serve as a reference. If the reference data is only of similar size to the O data, the contingency-table method is preferable, since it calculates E values from both the reference data and the O data under test, weighting each by its size.
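The two ways of obtaining E described above can be sketched in a few lines of Python. This is only an illustration: the counts are invented, the function names are hypothetical, and the scaling of the reference counts to the sample total is an assumption about how the values are entered, made so that the E values sum to the same total as the O values.

```python
def expected_from_reference(observed, reference):
    """Scale the reference-corpus counts so that they sum to the sample total."""
    n_obs = sum(observed)
    n_ref = sum(reference)
    return [r * n_obs / n_ref for r in reference]


def expected_from_contingency(sample, reference):
    """Expected counts for both rows of a 2 x k contingency table.

    Each cell is (column total * row total) / grand total, so both data
    sets contribute to E in proportion to their size.
    """
    n_s, n_r = sum(sample), sum(reference)
    total = n_s + n_r
    columns = [s + r for s, r in zip(sample, reference)]
    exp_sample = [c * n_s / total for c in columns]
    exp_reference = [c * n_r / total for c in columns]
    return exp_sample, exp_reference


def chi_square(observed, expected):
    """Sum of (O - E)^2 / E over all cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Invented counts for three categories (not taken from any real corpus).
sample = [60, 90, 50]            # O values, 200 tokens in total
reference = [3000, 5000, 2000]   # large reference corpus, 10000 tokens

# Method 1: the reference corpus supplies E directly.
expected = expected_from_reference(sample, reference)
stat = chi_square(sample, expected)
df = len(sample) - 1

# Method 2: contingency table, for reference data of similar size.
exp_sample, exp_reference = expected_from_contingency(sample, reference)
stat_table = chi_square(sample + reference, exp_sample + exp_reference)

print("reference method:", expected, stat, "df =", df)
print("contingency method:", exp_sample, stat_table)
```

With a reference corpus this much larger than the sample, the two sets of E values for the sample are nearly identical. The smaller the reference data, the further they drift apart, which is why the contingency-table method is the safer choice for data sets of similar size.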

The example uses the distribution of the first 200 non-finite verbs of 5 different texts:

A Practical Example of Testing for Normal Distribution

The distribution of the auxiliary have in the 3rd person singular present indicative across the different sections of an arbitrary part of the LOB Corpus (part B in this case) should be normally distributed. By calculating a normal curve from the mean and standard deviation of the observed counts and using that curve as the expected (E) values, we can use the chi-square test to test for normality.
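As a rough sketch of this procedure in Python: the per-section counts below are invented and are not LOB data, the bin boundaries are an arbitrary choice, and the function names are hypothetical. The sketch estimates mean and standard deviation from the counts, derives the expected number of sections per bin from the normal curve, and compares these E values with the observed bin counts.

```python
import math
from statistics import mean, stdev


def normal_cdf(x, mu, sigma):
    """Cumulative normal distribution, via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def normality_chi_square(values, edges):
    """Chi-square statistic of binned values against a fitted normal curve.

    `edges` are the inner bin boundaries; the first and last bins are
    open-ended so that the expected counts sum to the number of values.
    """
    mu, sigma = mean(values), stdev(values)
    n = len(values)
    bounds = [-math.inf] + list(edges) + [math.inf]
    observed, expected = [], []
    for lo, hi in zip(bounds, bounds[1:]):
        observed.append(sum(1 for v in values if lo <= v < hi))
        expected.append(n * (normal_cdf(hi, mu, sigma) - normal_cdf(lo, mu, sigma)))
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Two parameters (mean, standard deviation) were estimated from the
    # data, so the degrees of freedom are: number of bins - 1 - 2.
    df = len(observed) - 3
    return stat, df, observed, expected


# Invented frequencies of the auxiliary per section (illustration only).
counts = [12, 15, 9, 14, 11, 13, 10, 16, 12, 14,
          8, 13, 11, 15, 12, 10, 13, 17, 11, 12]
edges = [10.5, 12.5, 14.5]   # assumed bin boundaries, giving four bins

stat, df, observed, expected_counts = normality_chi_square(counts, edges)
print("O:", observed)
print("E:", [round(e, 2) for e in expected_counts])
print("chi-square:", round(stat, 3), "df:", df)
```

The resulting statistic is then compared with the critical chi-square value for the given degrees of freedom (3.841 for df = 1 at the 5% level). A value below the critical value means the observed distribution does not differ significantly from the normal curve.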