Suppose the mean of group A is significantly different from 0 and the mean of group B is not. Then the mean of group A is not necessarily different from the mean of group B. Stated so baldly, this much is well known to anyone with experience of statistical testing, but I wish to highlight its relevance to the article by Coulter and Shallo-Hoffmann.¹

These authors investigated performance on the developmental eye movement (DEM) test. Subjects with an abnormal DEM test showed a statistically significant decline in performance (an increase in the number of errors) between the first and second halves of the test; for control subjects, there was no significant change. Coulter and Shallo-Hoffmann go on to say that an oculomotor dysfunction should produce errors throughout the test, that fatigue should produce errors in both the abnormal and control groups, and thus that an attention problem may be present in the abnormal DEM group. This strategy for drawing conclusions seems reasonable, until it is pointed out that the decline in performance for the control group might be almost as great as for the abnormal group, just failing to reach statistical significance in one case and just reaching it in the other.

An alternative strategy is to compare the declines in the two groups. When this is done, the two sets of declines are found not to differ significantly (using the 0.05 level of significance). This is true whether using a parametric test or, following Coulter and Shallo-Hoffmann's lead, a nonparametric test.

For the reader who wishes to follow this up in the statistical literature, it may be helpful to restate the above using some jargon. Coulter and Shallo-Hoffmann had two factors in their experiment: part of the DEM test (two levels, first and second half) and status (two levels, control and abnormal subjects). What they did was to conduct two tests for simple effects (effect of half at status = control and effect of half at status = abnormal).
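The contrast between two separate tests of simple effects and a direct comparison of the declines can be made concrete with a small sketch. The per-subject declines below are invented purely for illustration and are not the data of Coulter and Shallo-Hoffmann; the variable and function names are likewise hypothetical. The sketch assumes a parametric approach: a one-sample t statistic on the declines within each group, and a pooled two-sample t statistic comparing the declines between groups, which in a two-by-two design of this kind corresponds to a test of the half-by-status interaction.

```python
import math
import statistics

# Hypothetical per-subject declines (errors in second half minus errors in
# first half). Invented for illustration only; NOT the published data.
abnormal_declines = [5, 1, 4, 7, -1, 3, 6, 2, 0, 4]
control_declines = [4, -2, 3, 6, -3, 1, 5, 0, -1, 4]

# Two-sided 5% critical values of Student's t, taken from standard tables.
T_CRIT_DF9 = 2.262   # one-sample test with n = 10 (df = 9)
T_CRIT_DF18 = 2.101  # pooled two-sample test with n1 = n2 = 10 (df = 18)


def one_sample_t(xs):
    """t statistic for H0: mean decline = 0 (a simple effect of half)."""
    standard_error = statistics.stdev(xs) / math.sqrt(len(xs))
    return statistics.mean(xs) / standard_error


def pooled_two_sample_t(xs, ys):
    """t statistic for H0: equal mean declines in the two groups."""
    n_x, n_y = len(xs), len(ys)
    pooled_variance = (
        (n_x - 1) * statistics.variance(xs)
        + (n_y - 1) * statistics.variance(ys)
    ) / (n_x + n_y - 2)
    standard_error = math.sqrt(pooled_variance * (1 / n_x + 1 / n_y))
    return (statistics.mean(xs) - statistics.mean(ys)) / standard_error


# Simple effects: is each group's mean decline different from zero?
t_abnormal = one_sample_t(abnormal_declines)
t_control = one_sample_t(control_declines)

# Comparison of declines: do the two groups decline by different amounts?
t_between = pooled_two_sample_t(abnormal_declines, control_declines)

print(f"abnormal vs 0:  t = {t_abnormal:.2f}, "
      f"significant = {abs(t_abnormal) > T_CRIT_DF9}")
print(f"control vs 0:   t = {t_control:.2f}, "
      f"significant = {abs(t_control) > T_CRIT_DF9}")
print(f"abnormal vs control declines: t = {t_between:.2f}, "
      f"significant = {abs(t_between) > T_CRIT_DF18}")
```

With these invented numbers, the decline is significant in the abnormal group and not in the control group, yet the two sets of declines do not differ significantly from each other. That is the pattern described above, reproduced on made-up data; it says nothing about the size of the effects in the actual study.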
What I have proposed is a test for interaction (whether the difference between half = first and half = second is the same for status = control as it is for status = abnormal). The experiment used a form of repeated measures design, with part being the within-subjects factor and status being the between-groups factor.

I am not going to say that “theories that live by statistics shall die by statistics,” that the test for interaction is the only correct way to approach these data, that this has come out nonsignificant, and, therefore, that there is no evidence for the ideas of Coulter and Shallo-Hoffmann. There is no denying that the interaction effect is estimated to be 4.9, give or take 2.7, in the direction of the research hypothesis. I do say that the strategy used by Coulter and Shallo-Hoffmann has resulted in an overstatement of their case.

T. P. Hutchinson