While it is generally recognised that subgroup analyses can produce spurious results, the extent of the problem is almost certainly underestimated. This is particularly true when subgroup-specific analyses are used. In addition, the increase in sample size required to identify differential subgroup effects may be substantial, and the commonly used 'rule of four' may not always be sufficient, especially when interactions are relatively subtle, as is often the case.

CONCLUSIONS: RECOMMENDATIONS FOR SUBGROUP ANALYSES AND THEIR INTERPRETATION
(1) Subgroup analyses should, as far as possible, be restricted to those proposed before data collection. Any subgroups chosen after this time should be clearly identified.
(2) Trials should ideally be powered with subgroup analyses in mind. However, for modest interactions, this may not be feasible.
(3) Subgroup-specific analyses are particularly unreliable and are affected by many factors. Subgroup analyses should always be based on formal tests of interaction, although even these should be interpreted with caution.
(4) The results from any subgroup analyses should not be over-interpreted. Unless there is strong supporting evidence, they are best viewed as a hypothesis-generating exercise. In particular, one should be wary of evidence suggesting that treatment is effective in one subgroup only.
(5) Any apparent lack of a differential effect should be regarded with caution unless the study was specifically powered with interactions in mind.

CONCLUSIONS: RECOMMENDATIONS FOR RESEARCH
(1) The implications of using confidence intervals rather than p-values could be explored.
(2) The same approach as in this study could be applied to contexts other than RCTs, such as observational studies and meta-analyses.
(3) The scenarios used in this study could be examined more comprehensively using other statistical methods, incorporating clustering effects, considering other types of outcome variable, and using other approaches, such as bootstrapping or Bayesian methods.
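The contrast between subgroup-specific analyses and formal tests of interaction can be illustrated with a small simulation. This is a hypothetical sketch, not the simulation design used by Brookes et al.: it assumes a normally distributed continuous outcome with unit variance, two equal-sized subgroups, a treatment effect that is identical in both subgroups (so any apparent differential effect is spurious), and large-sample z-tests in place of t-tests. The function names `effect_and_se` and `simulate` and all parameter values are illustrative assumptions.

```python
import math
import random
from statistics import NormalDist, fmean, variance

# Hypothetical illustration, not the Brookes et al. simulation design.
# Assumptions: normal outcome, unit variance, two equal-sized subgroups,
# the same true treatment effect in both (no interaction), z-tests.
STD_NORMAL = NormalDist()


def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic."""
    return 2.0 * (1.0 - STD_NORMAL.cdf(abs(z)))


def effect_and_se(treated, control):
    """Difference in means (treated - control) and its standard error."""
    diff = fmean(treated) - fmean(control)
    se = math.sqrt(variance(treated) / len(treated) + variance(control) / len(control))
    return diff, se


def simulate(n_trials=2000, n_per_arm=50, effect=0.3, alpha=0.05, seed=1):
    """Return (rate of discordant subgroup-specific tests, rate of significant interaction tests).

    n_per_arm is the number of participants per arm within each subgroup.
    """
    rng = random.Random(seed)
    discordant = 0
    interaction_hits = 0
    for _ in range(n_trials):
        estimates = []
        for _subgroup in range(2):
            # Same true effect in both subgroups: any "differential effect" is spurious.
            control = [rng.gauss(0.0, 1.0) for _ in range(n_per_arm)]
            treated = [rng.gauss(effect, 1.0) for _ in range(n_per_arm)]
            estimates.append(effect_and_se(treated, control))
        (d1, se1), (d2, se2) = estimates
        # Subgroup-specific approach: significant in one subgroup but not the other.
        if (two_sided_p(d1 / se1) < alpha) != (two_sided_p(d2 / se2) < alpha):
            discordant += 1
        # Formal test of interaction: compare the two subgroup effects directly.
        z_interaction = (d1 - d2) / math.sqrt(se1 ** 2 + se2 ** 2)
        if two_sided_p(z_interaction) < alpha:
            interaction_hits += 1
    return discordant / n_trials, interaction_hits / n_trials


if __name__ == "__main__":
    discordant_rate, interaction_rate = simulate()
    print(f"subgroup-specific tests discordant: {discordant_rate:.3f}")
    print(f"interaction test significant:       {interaction_rate:.3f}")
```

Under these assumptions the interaction test rejects in roughly the nominal 5% of trials, whereas discordant subgroup-specific results ("effective in one subgroup only") occur far more often, because they depend mainly on the power within each subgroup rather than on any true difference between subgroups.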
Brookes et al. studied this question.
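The 'rule of four' is commonly stated as: a trial needs to be about four times as large to detect a treatment-by-subgroup interaction of the same magnitude as the overall treatment effect. The sketch below shows where the factor comes from, assuming a continuous outcome with common variance, 1:1 randomisation within subgroups, and the usual normal-approximation sample size formula. With total size N and a subgroup holding a fraction p of participants, the overall effect has variance proportional to 4/N, while the difference between the two subgroup effects has variance proportional to 4/(N p (1 - p)). The function names `total_n_main_effect` and `interaction_multiplier` are hypothetical.

```python
from statistics import NormalDist


def total_n_main_effect(delta, sigma=1.0, alpha=0.05, power=0.80):
    """Total size (both arms) to detect an overall effect delta.

    Normal approximation: n_per_arm = 2 * ((z_{1-alpha/2} + z_{power}) * sigma / delta) ** 2.
    """
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    n_per_arm = 2 * (z * sigma / delta) ** 2
    return 2 * n_per_arm


def interaction_multiplier(ratio, subgroup_fraction=0.5):
    """Factor by which the trial must grow so that an interaction of size
    ratio * delta is detected with the same power as an overall effect delta.

    Var(interaction) / Var(overall effect) = 1 / (p * (1 - p)), which is 4 when p = 0.5.
    """
    p = subgroup_fraction
    return 1.0 / (p * (1.0 - p) * ratio ** 2)


if __name__ == "__main__":
    base = total_n_main_effect(delta=0.5)
    for ratio in (1.0, 0.5):
        # ratio = interaction size relative to the overall treatment effect
        print(ratio, round(base * interaction_multiplier(ratio)))
```

The factor of four applies only to an interaction as large as the overall effect with equal-sized subgroups. An interaction half that size needs sixteen times the sample size, and unequal subgroups (for example, p = 0.2) raise the multiplier further, which is consistent with the abstract's point that the rule may not be sufficient when interactions are subtle.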