This work introduces PROXIMA, a framework for evaluating the reliability, fragility, and decision risk of proxy metrics used in controlled online experiments such as A/B tests. Proxy metrics are widely used to enable rapid experimentation, yet they frequently fail to reflect long-term business or system outcomes, leading to biased decisions and hidden risk. Unlike prior work that assumes monotonic proxy validity, PROXIMA quantifies directional accuracy, sign-flip fragility, and downstream decision regret, enabling principled proxy selection under delayed outcomes. It models proxy reliability using simulation-based counterfactual analysis, sensitivity scoring, and long-horizon consistency checks, and it quantifies the alignment between short-term proxy signals and delayed ground-truth outcomes under distributional shifts and experimental noise. Empirical results demonstrate that PROXIMA can distinguish stable proxies from misleading ones and reduce decision risk in experimentation workflows. This work targets practitioners and researchers in machine learning systems, experimentation platforms, and applied causal inference. Source code and reproducibility artifacts are publicly available. Note: A provisional patent application covering core techniques described in this work has been filed. This disclosure is made in accordance with applicable patent and publication policies.
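To make the three quantities named in the abstract concrete, the following is a minimal Python sketch under stated assumptions, not PROXIMA's actual implementation. The abstract does not give definitions, so the ones used here are illustrative: directional accuracy is taken as the share of experiments where the proxy effect and the long-term effect have the same sign, sign-flip fragility as the probability that Gaussian noise flips the sign of a proxy estimate, and decision regret as the long-term value lost by shipping whenever the proxy effect is positive. All function names and the sample data are hypothetical.

```python
from math import erf, sqrt


def directional_accuracy(proxy, truth):
    """Fraction of experiments where proxy and long-term effects share a sign.

    Assumed definition for illustration; not taken from the source.
    """
    agree = sum(1 for p, t in zip(proxy, truth) if (p > 0) == (t > 0))
    return agree / len(proxy)


def sign_flip_probability(effect, std_err):
    """Probability that an estimate's sign differs from the sign of `effect`.

    Assumes the estimate is Gaussian with mean `effect` and standard
    deviation `std_err`, so the result is Phi(-|effect| / std_err).
    """
    z = abs(effect) / std_err
    return 0.5 * (1.0 - erf(z / sqrt(2.0)))


def mean_decision_regret(proxy, truth):
    """Average long-term value lost by shipping if and only if proxy > 0.

    The oracle ships only when the long-term effect is positive.
    """
    total = 0.0
    for p, t in zip(proxy, truth):
        oracle_value = max(t, 0.0)           # best achievable outcome
        achieved_value = t if p > 0 else 0.0  # outcome of the proxy-based call
        total += oracle_value - achieved_value
    return total / len(proxy)


# Hypothetical effect estimates for four experiments.
proxy_effects = [0.8, 0.1, -0.5, 0.3]
long_term_effects = [0.5, -0.2, -0.4, 0.6]
proxy_std_errs = [0.2, 0.2, 0.2, 0.2]

accuracy = directional_accuracy(proxy_effects, long_term_effects)
fragility = [sign_flip_probability(e, s)
             for e, s in zip(proxy_effects, proxy_std_errs)]
regret = mean_decision_regret(proxy_effects, long_term_effects)

print(f"directional accuracy: {accuracy:.2f}")
print("sign-flip probabilities:", [round(f, 4) for f in fragility])
print(f"mean decision regret: {regret:.3f}")
```

In this toy data the second experiment has a small positive proxy effect but a negative long-term effect, so it is the only one that lowers directional accuracy and contributes regret, and it is also the proxy estimate most likely to flip sign under noise.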
Avinash Amudala
Rochester Institute of Technology
Film Independent
synapsesocial.com/papers/69e07cc02f7e8953b7cbdec3 — DOI: https://doi.org/10.5281/zenodo.19562234