Abstract This article discusses and showcases approaches to causal inference with text as data, focusing on the challenges and opportunities that arise when sociological constructs are embedded in language. We highlight that text embeds latent constructs and comes with a number of measurement challenges: variables of interest must be interpreted through coding, feature extraction, and modelling choices. We distinguish between three central designs: when text is the outcome to be explained (text as outcome), when text functions as the treatment whose effects are to be estimated (text as treatment), and when information extracted from text is used as a control variable (text as control). Drawing on examples from our own research, we illustrate how recent methodological advances allow researchers to learn latent outcomes or latent treatments from textual data while preserving the logic of causal identification. We use state-of-the-art techniques for drawing causal inferences with text as data and show how text can serve as a window into sociologically relevant constructs, while also underscoring the interpretive leeway inherent in computational modelling. Our analysis demonstrates that causal inference with text requires careful attention to theory, transparency about researchers’ choices, and sensitivity to the data-generating process. We conclude that text-as-data approaches hold promise for causal analysis and can contribute to sociological explanation.
Schwitter et al. studied this question.