August 24, 2024 · Open Access

Policy-Based Bayesian Active Causal Discovery with Deep Reinforcement Learning

Key Points

The proposed reinforcement learning-based method, RL-CBED, selects informative interventions for causal discovery while reducing the risk of local optimality that comes with myopic, step-by-step selection.
Active causal discovery is formulated as a partially observable Markov decision process (POMDP), and a dense, information gain-based reward function gives the policy fine-grained feedback so that it learns more quickly in complex environments.
The Q-function estimator can be learned using only trajectories sampled from the prior, which reduces the time cost of training; experiments on synthetic and real-world-inspired semi-synthetic datasets support the method's effectiveness.

Abstract

Causal discovery with observational and interventional data plays an important role in numerous fields. Because intervention experiments are costly and potentially risky, selecting informative interventions is critical in real-world settings. Several recent works use Bayesian active learning to select, at each optimization step, the intervention that maximizes the expected information gain about the underlying causal relationship. However, these methods still have two limitations: (1) Local optimality. When multiple intervention experiments are run, myopically selecting the optimal intervention at each step may lead to a local optimum. (2) Expensive time cost. Optimizing for the most informative intervention at each step is time-consuming and unsuitable for adaptive experiments with strict inference speed requirements. In this study, we propose a novel method called Reinforcement Learning-based Causal Bayesian Experimental Design (RL-CBED) to reduce the risk of local optimality and accelerate intervention selection. Specifically, we formulate the active causal discovery problem as a partially observable Markov decision process (POMDP). We design an information gain-based sparse reward function and then refine it into a dense reward function, which provides fine-grained feedback that helps the RL policy learn more quickly in complex environments. Moreover, we theoretically prove that the Q-function estimator can be learned using only trajectories sampled from the prior, which can significantly reduce the time cost of the training process and enables the real-world application of our method. Extensive experiments on both synthetic and real-world-inspired semi-synthetic datasets demonstrate the effectiveness of our proposed method.
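The relationship between a sparse and a dense information gain-based reward can be illustrated with a small sketch. The abstract does not give the reward definitions, so the code below rests on an assumption: the sparse reward is taken to be the total reduction in posterior entropy over candidate causal graphs at the end of an episode, and the dense reward is the entropy reduction after each intervention. The function names, the three candidate graphs, and the likelihood values are all hypothetical and serve only as an illustration, not as the RL-CBED implementation.

```python
import math


def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(q * math.log(q) for q in p if q > 0)


def bayes_update(prior, likelihoods):
    """Posterior over hypotheses, given each hypothesis's likelihood of one outcome."""
    unnorm = [p * l for p, l in zip(prior, likelihoods)]
    z = sum(unnorm)
    return [u / z for u in unnorm]


def rewards(prior, likelihood_seq):
    """Return (dense per-step rewards, sparse terminal reward).

    Assumption: both rewards are realized entropy reductions of the belief
    over candidate graphs; this is an illustrative choice only.
    """
    beliefs = [prior]
    for lik in likelihood_seq:
        beliefs.append(bayes_update(beliefs[-1], lik))
    # Dense: entropy reduction after every single intervention.
    dense = [entropy(b0) - entropy(b1) for b0, b1 in zip(beliefs, beliefs[1:])]
    # Sparse: one reward at the end of the episode.
    sparse = entropy(beliefs[0]) - entropy(beliefs[-1])
    return dense, sparse


# Hypothetical episode: three candidate graphs, a uniform prior, and made-up
# likelihoods of the outcomes observed after two interventions.
prior = [1 / 3, 1 / 3, 1 / 3]
likelihood_seq = [[0.9, 0.5, 0.1], [0.8, 0.2, 0.5]]
dense, sparse = rewards(prior, likelihood_seq)
```

Under this assumption the per-step rewards telescope, so their sum equals the terminal reward. The policy is therefore credited with the same total information gain but receives feedback after every intervention rather than only at the end of the episode.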
