Abstract: Monte Carlo methods were used to examine the Type I error performance of a number of goodness-of-fit statistics under cluster sampling. The study has reduced the comparison to four procedures: (a) $X^2_s$, the Rao-Scott Satterthwaite-adjusted $X^2$; (b) $X_J$, Fay's jackknifed $X^2$; (c) $F_W$, a modified Wald statistic referred to an $F$ distribution; and (d) $FX^2_c$, an $F$-based version of the Rao-Scott adjusted $X^2$, which, unlike the others, depends only on the cell “design effects” (DE). The statistic $FX^2_c$ performs well provided that the coefficient of variation, $a$, of the eigenvalues, $\lambda_i$, of the “design effects matrix” is small. In general, both $X^2_s$ and $X_J$ perform well even when $a$ is not small. $F_W$ behaves reasonably well for tests of uniform probability, $\pi = \pi_0 = (1/k, \ldots, 1/k)'$, where $k$ is the number of categories and $\pi = (\pi_1, \ldots, \pi_{k-1})'$, $\pi_k = 1 - \pi_1 - \cdots - \pi_{k-1}$, but it is sensitive to skewness in $\pi_0$. The power performance of procedures (a)–(d) was also examined. Attention was focused on $\pi_0 = (1/k, \ldots, 1/k)'$, the data being generated under two basic forms of the alternative $\pi$, consisting of single-cell deviations from $\pi_0$ of the form $\pi_1 = 1/k + \epsilon$ with $\epsilon$ positive or negative. The power of $F_W$ is sensitive to the form of $\pi$ and is markedly less than the power of its competitors in the important case of nonconstant DE, $a > 0$. $X_J$ shows much better power characteristics but has somewhat less power than $X^2_s$ when $a > 0$.
Methods for analysis of categorical data have been extensively developed under the assumption of multinomial or product-multinomial sampling. In particular, the standard Pearson chi-squared statistic, $X^2$, and the likelihood ratio statistic, $G^2$, are used to test hypotheses in multiway contingency tables, employing log-linear models.
These methods are often used by researchers in subject matter areas (e.g., social and health sciences) to analyze sample survey data, even though the multinomial assumption is violated because of clustering and stratification used in the survey design. Ignoring the effect of survey design and using $X^2$ or $G^2$ could lead to unacceptably high Type I error under cluster sampling. Alternative test statistics that take account of the design have been proposed: $X^2_c$ (or $G^2_c$), designed for use with published tables, require knowledge only of cell DE (or variances) and certain marginal DE, whereas other statistics, $X^2_s$ (or $G^2_s$) and $X_J$ (or $G_J$), require knowledge of the full covariance matrix or access to the microdata file. Large-sample properties of these test statistics have been previously studied. Small-sample Type I error and power performances have been examined in this article, under simulated cluster sampling. The Monte Carlo study, however, has been confined to the simple goodness-of-fit test, $\pi = \pi_0$, on the vector of cell proportions, $\pi$. The results of this study should be useful to both statisticians and researchers in subject matter areas interested in the analysis of categorical data from complex sample surveys.
Thomas et al. (Mon,) studied this question.
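To make the quantities named in the abstract concrete, here is a minimal Python sketch of how the eigenvalues $\lambda_i$, their coefficient of variation $a$, and the adjusted statistics $X^2_c$ and $X^2_s$ could be computed for the simple goodness-of-fit test. It uses the commonly cited textbook form of the Rao-Scott corrections, which may differ in detail from the definitions in the paper. The function name `rao_scott_adjustments`, its arguments, and the convention that `V` is the covariance matrix of $\sqrt{n}$ times the first $k-1$ estimated proportions are assumptions made for illustration. The jackknifed $X_J$, the Wald-based $F_W$, and the $F$-based $FX^2_c$ are not shown.

```python
import numpy as np

def rao_scott_adjustments(p_hat, pi0, V, n):
    """Illustrative Rao-Scott style corrections for the test pi = pi0.

    p_hat : length-k estimated cell proportions
    pi0   : length-k hypothesized cell proportions
    V     : (k-1) x (k-1) design-based covariance matrix of
            sqrt(n) * p_hat[:k-1]  (assumed convention)
    n     : sample size
    """
    p_hat = np.asarray(p_hat, dtype=float)
    pi0 = np.asarray(pi0, dtype=float)
    V = np.asarray(V, dtype=float)
    k = pi0.size

    # Ordinary Pearson statistic, valid as-is only under multinomial sampling.
    x2 = n * np.sum((p_hat - pi0) ** 2 / pi0)

    # Multinomial covariance of sqrt(n) * p_hat[:k-1] under the null.
    P0 = np.diag(pi0[:-1]) - np.outer(pi0[:-1], pi0[:-1])

    # "Design effects matrix" D = P0^{-1} V and its eigenvalues lambda_i.
    D = np.linalg.solve(P0, V)
    lam = np.linalg.eigvals(D).real
    lam_bar = lam.mean()

    # Squared coefficient of variation of the eigenvalues.
    a2 = np.sum((lam - lam_bar) ** 2) / ((k - 1) * lam_bar ** 2)

    # First-order correction: divide by the mean eigenvalue.
    x2_c = x2 / lam_bar
    # Satterthwaite (second-order) correction with adjusted degrees of freedom.
    x2_s = x2_c / (1.0 + a2)
    df_s = (k - 1) / (1.0 + a2)

    return {"x2": x2, "lambda": lam, "a": np.sqrt(a2),
            "x2_c": x2_c, "x2_s": x2_s, "df_s": df_s}
```

In this sketch $X^2_c$ would be referred to a chi-squared distribution with $k-1$ degrees of freedom and $X^2_s$ to one with `df_s` degrees of freedom. When all eigenvalues are equal (constant DE), $a = 0$ and the two corrections coincide, which matches the abstract's remark that the first-order approach is adequate when $a$ is small.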