Abstract

In recent years, digital behavioral data (DBD) have emerged as a powerful resource in social science research. Their ubiquity, granularity, complexity, and continuous collection provide new opportunities for examining social processes in great detail. However, because DBD are diverse in type and often constitute found data (that is, data not generated for research purposes), their potential for causal analysis is commonly underestimated. To address this issue, this paper outlines key considerations for developing a methodological framework for valid causal inference using DBD. The discussion focuses on how design limitations can be (i) ruled out a priori when generating designed DBD or (ii) compensated for through theoretical and temporal information, the specification of structural causal models, a posteriori design considerations, and the application of appropriate analytical tools, making found DBD fit for the purpose of causal effect estimation.
Leitgöb et al. studied this question.