March 3, 2026 · Open Access

Structural Causal Models for Synthetic Data Generation (Modèles Causaux Structurels pour la Génération de Données Synthétiques)

Key points

Causal Machine Learning methods improve predictive power and decision-making in various domains, including marketing.
The CausalProfiler generator creates synthetic causal datasets, enhancing empirical evaluations of Causal ML methods.
Deep Structural Causal Models are evaluated for their ability to address counterfactual questions from observational data.
Rigorous empirical evaluations with synthetic data can facilitate broader adoption of Causal ML methods.

Abstract

Causal Machine Learning (Causal ML) has the potential to revolutionize decision-making by combining the predictive power of machine learning algorithms with the theory of causal inference. However, these methods remain underutilized in real-world applications, because current empirical evaluations do not permit the assessment of their reliability and robustness, which undermines their practical utility. This thesis, motivated by marketing efficiency measurement applications, aims to contribute, across four projects, to the assessment of the usefulness of Causal ML methods through the empirical study of their behavior using synthetic experiments.

The first two projects examine the usefulness of certain Causal ML methods, selected with regard to the marketing applications mentioned above. The first project empirically analyzes the ADMG causal data augmentation method, considering different synthetic settings to help researchers and practitioners understand under which conditions prior knowledge can enhance the robustness of predictive models. The second project reviews methods for learning Deep Structural Causal Models by studying their ability to answer counterfactual questions using observational data within known causal structures. The analysis of the assumptions and guarantees of these methods reveals that the majority of the theoretical results are rooted in the intrinsic properties of the deep learning architectures. Moreover, our study highlights the critical need for standardized benchmarks that capture the complexities encountered in various real-world applications.

The first part of the thesis revealed numerous limitations in current evaluation practices of Causal ML methods. In the second part, we therefore focused on improving these practices in order to refine the assessment of the usefulness of Causal ML methods. In the third project, we argue that synthetic experiments are necessary to precisely assess and understand the capabilities of Causal ML methods. After highlighting the shortcomings of current practices, we propose a set of principles for conducting rigorous empirical evaluations with synthetic data. Finally, the last project builds directly upon the previous one by proposing a method to randomly generate synthetic causal datasets for evaluating Causal ML methods across the three levels of causal reasoning: observation, intervention, and counterfactual. Based on a set of explicit design choices about the class of causal models, queries, and data considered, our generator, named CausalProfiler, randomly samples sets of data, assumptions, and ground truths constituting the synthetic causal benchmarks. In this way, Causal ML methods can be evaluated on assumption-aware synthetic experiments with increased realism, diversity, and comparability. We demonstrate the utility of CausalProfiler by evaluating several state-of-the-art Deep Structural Causal Models under diverse conditions, both in and out of their identification regime, illustrating the types of analyses and insights our generator unlocks.

Through the four projects carried out, this thesis contributes to an improved understanding of the empirical utility of Causal ML methods by (a) characterizing the utility of some Causal ML methods and (b) empowering researchers and practitioners to perform this characterization across a wide range of methods through the development of best practices and the implementation of a generator of synthetic causal benchmarks. The adoption of our recommendations and generator will hopefully enable more rigorous and in-depth evaluations, thereby encouraging broader adoption and impactful use of Causal ML methods in real-world applications such as marketing efficiency measurement.
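To make the three levels of causal reasoning concrete, here is a minimal sketch on a toy structural causal model. It is not taken from the thesis and does not reflect the CausalProfiler API: the two-variable linear model, the coefficient 2.0, and every function name below are assumptions chosen purely for illustration. Because the model is fully specified, the true answer to each query is known, which is what lets synthetic data serve as ground truth.

```python
import random

# Hypothetical toy SCM (an illustrative assumption, not from the thesis):
#   X := U_x
#   Y := 2 * X + U_y
# with independent standard normal noise terms U_x and U_y.

def sample_noise(rng):
    """Draw the exogenous noise terms for one unit."""
    return {"u_x": rng.gauss(0.0, 1.0), "u_y": rng.gauss(0.0, 1.0)}

def run_scm(noise, do_x=None):
    """Evaluate the structural equations; do_x replaces the mechanism of X."""
    x = noise["u_x"] if do_x is None else do_x
    y = 2.0 * x + noise["u_y"]
    return {"x": x, "y": y}

def observe(rng, n):
    """Level 1 (observation): sample from the unmodified model."""
    return [run_scm(sample_noise(rng)) for _ in range(n)]

def intervene(rng, n, do_x):
    """Level 2 (intervention): sample after forcing X to the value do_x."""
    return [run_scm(sample_noise(rng), do_x=do_x) for _ in range(n)]

def counterfactual(observed, do_x):
    """Level 3 (counterfactual): abduction, action, prediction for one unit."""
    # Abduction: recover the noise values consistent with the observed unit.
    noise = {"u_x": observed["x"], "u_y": observed["y"] - 2.0 * observed["x"]}
    # Action and prediction: rerun the equations with X forced to do_x.
    return run_scm(noise, do_x=do_x)

rng = random.Random(0)
obs = observe(rng, 5000)
mean_y_do1 = sum(r["y"] for r in intervene(rng, 5000, do_x=1.0)) / 5000
cf = counterfactual({"x": 1.0, "y": 3.0}, do_x=0.0)

print(mean_y_do1)  # Monte Carlo estimate of E[Y | do(X=1)], whose true value is 2.0
print(cf)          # what Y would have been for this unit had X been 0
```

In this sketch the counterfactual is computable exactly only because the structural equations are known and invertible in the noise; methods that learn such models from observational data have to recover them under their own identification assumptions.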

Authors

Audrey Poinsot

Université Paris-Saclay
