What question did this study set out to answer?

This article explores methods for causal inference with text as data and the measurement challenges that arise when sociological constructs are embedded in language.

April 25, 2026 · Open Access

Text as Data and Causal Inference in Sociology

Key Points

Distinguishes between text as outcome, treatment, and control variables
Utilizes recent methodological advances for causal identification
Emphasizes the need for careful theoretical framing and transparency
Demonstrates the potential of text as data for exploring sociological constructs
Highlights the interpretive flexibility needed in computational modelling
Concludes that text data approaches can enhance causal analysis in sociology

Abstract

This article discusses and showcases approaches to causal inference with text as data, focusing on the challenges and opportunities that arise when sociological constructs are embedded in language. We highlight that text embeds latent constructs and comes with a number of measurement challenges: variables of interest must be interpreted through coding, feature extraction, and modelling choices. We distinguish between three central designs: when text is the outcome to be explained (text as outcome), when text functions as the treatment whose effects are to be estimated (text as treatment), and when information extracted from text is used as a control variable (text as control). Drawing on examples from our own research, we illustrate how recent methodological advances allow researchers to learn latent outcomes or latent treatments from textual data while preserving the logic of causal identification. We use state-of-the-art techniques for drawing causal inferences with text as data and show how text can serve as a window into sociologically relevant constructs, while also underscoring the interpretive leeway inherent in computational modelling. Our analysis demonstrates that causal inference with text requires careful attention to theory, transparency about researchers’ choices, and sensitivity to the data-generating process. We conclude that text-as-data approaches hold promise for causal analysis and can contribute to sociological explanation.
