March 1, 1961 | Open Access

Some Tests for Categorical Data

Abstract

We shall be concerned with experimental data given in the form of frequencies in cells determined by a multiway cross-classification, with predefined categories along each way of classification. Roy and Bhapkar [10] have posed hypotheses which might be considered generalizations, appropriate to this setup, of the usual hypotheses in classical "normal" univariate "fixed effects" analysis of variance, "normal" multivariate "fixed effects" analysis of variance, and analysis of various kinds of "normal" independence. Large sample tests for such hypotheses are offered here. The large sample tests suggested are based on the $\chi^2$-test of Karl Pearson [8]. The general probability model is that of a product of several multinomial distributions. According as the marginal frequencies along any dimension are held fixed or left free, that dimension is said to be associated with a "factor" or a "response" (or variable). The probability model is

$$\prod_j \left[ \frac{n_{0j}!}{\prod_i n_{ij}!} \prod_i p_{ij}^{n_{ij}} \right], \tag{1}$$

where $\sum_i p_{ij} \equiv p_{0j} = 1$ and $\sum_i n_{ij} \equiv n_{0j}$ is held fixed. Thus $i$ refers to categories of the response while $j$ refers to categories of the factor. $n_{0j}$ denotes the preassigned sample size for the $j$th factor-category, out of which $n_{ij}$ happen to lie in the $i$th response-category. It should be noticed that $i$ may be a multiple subscript, say $i_1, i_2, \ldots, i_k$; $j$ also may be a multiple subscript, say $j_1, j_2, \ldots, j_l$. We then speak of a $k$-response (or $k$-variate) and $l$-factor problem. According as a set of real numbers is or is not associated with the categories along any way of classification (factor or response), that way of classification will be said to be structured or unstructured. It is well known (for example, Neyman [6]) that if a hypothesis $H_o$ is given in the form of certain constraints on the $p_{ij}$'s, then a large sample test statistic of $H_o$ under the model (1) is a $\chi^2$ statistic given by $\sum_{i,j} (n_{ij} - n_{0j}\hat{p}_{ij})^2 / (n_{0j}\hat{p}_{ij})$, or a $\chi^2_1$ statistic given by $\sum_{i,j} (n_{ij} - n_{0j}\hat{p}_{ij})^2 / n_{ij}$, where the $\hat{p}_{ij}$'s form any set of BAN estimates [6].
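The two statistics just defined can be computed directly from a table of counts. The following is a minimal sketch, not code from the paper: the function names, the example table, and the choice of hypothesis (homogeneity of the response distribution across factor categories, with the pooled proportions serving as the estimates) are all assumptions made for illustration.

```python
# Illustrative sketch only: names and data are hypothetical, not from the paper.
# counts[i][j] holds n_ij: response category i (rows), factor category j (columns).

def column_totals(counts):
    """Return the fixed sample sizes n_0j = sum_i n_ij."""
    return [sum(row[j] for row in counts) for j in range(len(counts[0]))]

def pooled_estimate(counts):
    """Assumed hypothesis: p_ij is the same for every j (homogeneity).
    The pooled proportion sum_j n_ij / N is used as the estimate of p_ij."""
    total = sum(sum(row) for row in counts)
    return [[sum(row) / total] * len(row) for row in counts]

def pearson_chi2(counts, p_hat):
    """sum_ij (n_ij - n_0j p_ij)^2 / (n_0j p_ij)."""
    n0 = column_totals(counts)
    return sum(
        (counts[i][j] - n0[j] * p_hat[i][j]) ** 2 / (n0[j] * p_hat[i][j])
        for i in range(len(counts))
        for j in range(len(n0))
    )

def modified_chi2(counts, p_hat):
    """sum_ij (n_ij - n_0j p_ij)^2 / n_ij; every n_ij must be positive."""
    n0 = column_totals(counts)
    return sum(
        (counts[i][j] - n0[j] * p_hat[i][j]) ** 2 / counts[i][j]
        for i in range(len(counts))
        for j in range(len(n0))
    )

# Hypothetical 2 x 2 table: two response categories, two factor categories.
example_counts = [[30, 20], [20, 30]]
example_p_hat = pooled_estimate(example_counts)
example_chi2 = pearson_chi2(example_counts, example_p_hat)
example_chi2_1 = modified_chi2(example_counts, example_p_hat)
```

The only difference between the two functions is the denominator: expected counts for $\chi^2$, observed counts for $\chi^2_1$. The second form is therefore undefined when a cell is empty.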
In the particular case when the constraints are linear in the $p$'s, the method of minimum $\chi^2_1$ permits a reduction of the problem to the solution of a system of linear equations and hence is more convenient. Reiersol [9] considers binomial experiments and makes use of results of Neyman [6] to determine tests for hypotheses appropriate to factorial experiments. Mitra [5] not only generalizes Reiersol's theorems to multinomial experiments, but also avoids his restriction that the parameter-sets in the different linear forms occurring in the hypothesis be nonoverlapping. We shall prove theorems to cover the cases that cannot be treated by these theorems. In Section 2, the $\chi^2_1$ statistic based on the minimum $\chi^2_1$ estimates is obtained to test linear hypotheses. It is further shown that, when $H_o$ specifies linear functions of the $p$'s as known linear functions of some unknown parameters, the $\chi^2_1$ statistic, based on the minimum $\chi^2_1$ estimates, is exactly the same as the minimum sum of squares of residuals obtained by a certain general least squares technique to estimate the unknown parameters. This is then applied to derive test criteria appropriate to various hypotheses proposed in [3] and [10].
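The reduction to linear equations can be illustrated as follows. This is a minimal sketch under stated assumptions, not the derivation given in the paper: it writes $\chi^2_1$ as a weighted sum of squares $\sum_k w_k (q_k - p_k)^2$ with observed proportions $q_k = n_{ij}/n_{0j}$ and weights $w_k = n_{0j}^2/n_{ij}$, imposes linear constraints $Cp = d$ with Lagrange multipliers, and solves the resulting linear system. The function names, the flattened cell ordering, and the example hypothesis are all hypothetical.

```python
# Illustrative sketch only: names, cell ordering, and data are hypothetical.

def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    size = len(b)
    m = [row[:] + [b[r]] for r, row in enumerate(a)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, size):
            factor = m[r][col] / m[col][col]
            for c in range(col, size + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * size
    for r in range(size - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, size))
        x[r] = (m[r][size] - tail) / m[r][r]
    return x

def min_chi2_1(cell_counts, cell_totals, C, d):
    """Minimize sum_k (n_k - t_k p_k)^2 / n_k subject to C p = d.

    cell_counts: flattened n_ij (all positive)
    cell_totals: the n_0j belonging to each cell, in the same order
    C, d:        independent linear constraints, including sum_i p_ij = 1 per j
    Returns (minimizing p, minimum value of the statistic).
    """
    cells = range(len(cell_counts))
    rows = range(len(C))
    q = [n / t for n, t in zip(cell_counts, cell_totals)]          # observed proportions
    w_inv = [n / t ** 2 for n, t in zip(cell_counts, cell_totals)]  # 1 / w_k
    # Normal equations for the multipliers: (C W^-1 C') lam = C q - d
    M = [[sum(C[a][k] * w_inv[k] * C[b][k] for k in cells) for b in rows] for a in rows]
    resid = [sum(C[a][k] * q[k] for k in cells) - d[a] for a in rows]
    lam = solve_linear(M, resid)
    p = [q[k] - w_inv[k] * sum(C[a][k] * lam[a] for a in rows) for k in cells]
    stat = sum(r * l for r, l in zip(resid, lam))
    return p, stat

# Hypothetical 2 x 2 example, cells ordered (p_11, p_21, p_12, p_22).
# Constraints: p_11 + p_21 = 1, p_12 + p_22 = 1, and the hypothesis p_11 = p_12.
C = [[1, 1, 0, 0], [0, 0, 1, 1], [1, 0, -1, 0]]
d = [1, 1, 0]
p_min, stat_min = min_chi2_1([30, 20, 20, 30], [50, 50, 50, 50], C, d)
```

Because the objective is quadratic in the $p$'s and the constraints are linear, no iteration is needed: one linear solve gives both the estimates and the minimum value of the statistic. This is the convenience the paragraph above refers to.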
