What question did this study set out to answer?

This study asks how to correct the bias that arises when AI-generated variables are used as regressors, with a focus on the resulting causal estimates.

May 29, 2026

Debiasing ML- or AI-Generated Regressors in Partially Linear Models

Key Points

Developed new estimators that correct measurement-error bias in partially linear regression models.
Combined a small human-annotated subsample with a large AI-labeled data set.
Tested the approach with both traditional ML algorithms and LLM-based predictions.
Achieved unbiased and efficient estimation with the new estimators.
Demonstrated that correcting for measurement error yields more reliable causal conclusions.
Noted implications for AI fairness, since the approach can help correct biases from any source in AI predictions.

Abstract

Organizations increasingly use machine learning (ML) and artificial intelligence (AI), including large language models, to generate variables for regression models that inform business and policy decisions. For example, practitioners may use AI to predict review sentiment, ad aesthetics, or emotional expressions, and then estimate their causal effects on outcomes such as sales or engagement. However, because AI predictions are imperfect, directly using these AI-generated variables as regressors introduces measurement error that can systematically bias causal estimates, potentially leading to over- or underinvestment in business strategies. We develop new estimators that correct this bias in partially linear regression models, which are widely deployed in experimental systems at major platforms, including Tencent, Amazon AWS, and Microsoft. Our approach requires only a small human-annotated subsample alongside the large AI-labeled data set to achieve unbiased and efficient estimation. We demonstrate that our methods work with both traditional ML algorithms and LLM-based predictions. Our framework can be directly integrated into existing analytics and experimental systems, enabling practitioners to leverage the scalability of AI-generated data while maintaining reliable causal conclusions. This work also has implications for AI fairness, as our approach can help correct biases from any source in AI predictions.
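
To make the measurement-error problem concrete, the following is a minimal simulation sketch. It is not the estimator developed in the study: it swaps in regression calibration, a standard textbook correction, and it assumes a purely linear control term and classical (additive, independent) prediction error. The function names `ols` and `simulate`, the parameter values, and the data-generating process are all hypothetical choices made for illustration.

```python
import numpy as np


def ols(X, y):
    """Least-squares coefficients for y ~ X (X must already contain an intercept column)."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def simulate(n=20000, n_labeled=1000, beta=2.0, noise_sd=1.0, seed=0):
    """Compare a naive regression on AI labels with a regression-calibration fix.

    Assumed (hypothetical) setup: x is the true human-coded variable,
    x_ai is an AI prediction of x with additive independent error,
    and z is an observed control that enters the outcome linearly.
    """
    rng = np.random.default_rng(seed)
    z = rng.normal(size=n)
    x = 0.5 * z + rng.normal(size=n)                # true regressor
    x_ai = x + rng.normal(scale=noise_sd, size=n)   # imperfect AI-generated regressor
    y = beta * x + 1.0 * z + rng.normal(size=n)     # outcome
    ones = np.ones(n)

    # Naive approach: plug the AI-generated variable straight into the regression.
    naive = ols(np.column_stack([ones, x_ai, z]), y)[1]

    # Small human-annotated subsample: the true x is observed only for these rows.
    idx = rng.choice(n, size=n_labeled, replace=False)

    # Regression calibration: learn E[x | x_ai, z] on the labeled rows,
    # then replace x_ai with its calibrated value in the full sample.
    calib = ols(np.column_stack([ones[idx], x_ai[idx], z[idx]]), x[idx])
    x_cal = np.column_stack([ones, x_ai, z]) @ calib
    corrected = ols(np.column_stack([ones, x_cal, z]), y)[1]

    return naive, corrected


if __name__ == "__main__":
    naive, corrected = simulate()
    print(f"true beta = 2.0, naive = {naive:.3f}, corrected = {corrected:.3f}")
```

Under these assumed settings the prediction error has the same variance as the part of `x` not explained by `z`, so the naive coefficient should be attenuated to roughly half the true value, while the calibrated estimate should land close to it. Real AI predictions need not have classical error, which is one reason a purpose-built estimator such as the one described in the abstract may be needed.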
