What question did this study set out to answer?

This study asks how geographic heterogeneity in resistance prevalence affects the accuracy and fairness of antimicrobial resistance (AMR) prediction models.

January 18, 2026 | Open Access

Geographic Prevalence Heterogeneity Creates Unavoidable Fairness-Accuracy Trade-offs in Antimicrobial Resistance Prediction: A Multi-Method Analysis of 77,548 Bacterial Isolates

Key Points

Examined how geographic heterogeneity in resistance prevalence affects the accuracy and fairness of antimicrobial resistance (AMR) predictions.
Conducted a two-cohort study using the BV-BRC database, totaling 77,548 bacterial isolates.
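Quantified regional resistance prevalence in a Primary Cohort of 39,859 E. coli isolates from 132 countries, and simulated the effect of applying a single classification threshold across regions with different base rates.
Tested, in a Genomic Validation Cohort of 37,689 E. coli isolates with fluoroquinolone resistance gene annotations, whether a model using genomic predictors could avoid the threshold problem.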
Ciprofloxacin resistance prevalence varied from 16.8% in North America to 44.1% in Asia, showing a significant 27.3 percentage point gap.
Simulation indicated that at a 30% classification threshold, 44.3% of resistant isolates in below-threshold regions were missed.
The genomic validation model yielded a sensitivity disparity of 19.6 percentage points at uniform thresholds.
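
As a quick arithmetic check on the prevalence gap above, the following sketch converts the two reported prevalences into an unadjusted odds ratio. It uses only the two published percentages; the function names are illustrative, and the study's own estimate (OR 3.90) was presumably computed from isolate counts, so small rounding differences are expected.

```python
def odds(p: float) -> float:
    """Convert a probability (prevalence) into odds."""
    return p / (1.0 - p)


def odds_ratio(p_high: float, p_low: float) -> float:
    """Unadjusted odds ratio between two prevalences."""
    return odds(p_high) / odds(p_low)


# Reported ciprofloxacin resistance prevalences.
p_asia = 0.441
p_north_america = 0.168

# Absolute gap in percentage points.
gap_pp = (p_asia - p_north_america) * 100

# Odds ratio recomputed from the rounded prevalences only.
or_estimate = odds_ratio(p_asia, p_north_america)

print(f"gap: {gap_pp:.1f} percentage points")
print(f"odds ratio from rounded prevalences: {or_estimate:.2f}")
```

The recomputed value lands close to the reported OR of 3.90, which suggests the odds ratio compares Asia against North America.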

Abstract

Background: Machine learning models for antimicrobial resistance (AMR) prediction are trained predominantly on data from high-income countries, yet resistance prevalence varies dramatically across geographic regions. While algorithmic fairness frameworks have matured around race, sex, and age, geography has been examined in only 2.4% of medical AI fairness studies. We investigated whether geographic heterogeneity creates fundamental barriers to algorithmic fairness, a phenomenon we term the "Calibration Paradox."

Methods: We conducted a two-cohort study (total N=77,548 isolates) using the BV-BRC database. The Primary Cohort (n=39,859 E. coli isolates from 132 countries) quantified regional resistance prevalence and demonstrated through simulation the mathematical consequences of applying any single classification threshold to populations with heterogeneous base rates. The Genomic Validation Cohort (n=37,689 E. coli isolates with fluoroquinolone resistance gene annotations) tested whether models using actual genomic predictors could avoid the threshold problem.

Results: Ciprofloxacin resistance prevalence ranged from 16.8% in North America to 44.1% in Asia, a 27.3 percentage point gap (OR 3.90; 95% CI: 3.67-4.15; p<0.001). Simulation analysis demonstrated that when a well-calibrated model outputs regional prevalence as predictions, any single global threshold partitions regions into discrete classification groups. At a 30% threshold, 44.3% of resistant isolates from below-threshold regions would be missed. Genomic validation confirmed that a model trained on genomic features alone still produced regionally varying prediction scores, resulting in 19.6 percentage point sensitivity disparities at uniform thresholds.

Conclusions: Geographic prevalence heterogeneity creates unavoidable fairness-accuracy trade-offs (the Calibration Paradox) for any globally deployed AMR prediction model. No single threshold can achieve equitable performance across regions with different base rates. These findings demonstrate the need for region-specific models, mandatory geographic stratification in model evaluation, and recognition of geography as a protected attribute in medical AI fairness frameworks.
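
The threshold effect described in the Results can be illustrated with a minimal sketch. It assumes the idealized setup from the abstract, in which a calibrated model assigns every isolate its region's prevalence as the predicted score. The two prevalences are the reported ones; the isolate counts, the `Region` class, and the function names are hypothetical and chosen only for illustration. This is not the authors' simulation code.

```python
from dataclasses import dataclass


@dataclass
class Region:
    name: str
    n_isolates: int    # hypothetical count, not taken from the study
    prevalence: float  # fraction of isolates that are resistant


def regional_performance(regions, threshold):
    """Per-region sensitivity/specificity when score == regional prevalence.

    Every isolate in a region gets the same score, so the whole region is
    either flagged resistant (score >= threshold) or not flagged at all.
    """
    results = {}
    for r in regions:
        flagged = r.prevalence >= threshold
        results[r.name] = {
            "flagged": flagged,
            "sensitivity": 1.0 if flagged else 0.0,
            "specificity": 0.0 if flagged else 1.0,
        }
    return results


def missed_resistant_fraction(regions, threshold):
    """Share of all resistant isolates that sit in below-threshold regions."""
    total = sum(r.n_isolates * r.prevalence for r in regions)
    missed = sum(
        r.n_isolates * r.prevalence for r in regions if r.prevalence < threshold
    )
    return missed / total


# Reported prevalences with hypothetical, equal sample sizes.
regions = [
    Region("North America", 1000, 0.168),
    Region("Asia", 1000, 0.441),
]

performance = regional_performance(regions, threshold=0.30)
missed = missed_resistant_fraction(regions, threshold=0.30)

for name, stats in performance.items():
    print(name, stats)
print(f"resistant isolates missed: {missed:.1%}")
```

Under these assumptions, a 30% threshold gives sensitivity 0 in North America and 1 in Asia, so the disparity comes from the base rates alone. The missed fraction printed here depends on the hypothetical counts and will not reproduce the study's 44.3%, which reflects the real regional sample sizes.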

Authors

Hayden Luke Farquhar
