Abstract

Solid-liquid interfaces challenge atomistic simulation by combining chemically heterogeneous bonding, long-range electrostatics, and rapidly evolving solvation structures. Universal machine-learning interatomic potentials (uMLIPs) promise near-first-principles accuracy across diverse chemical systems, yet their reliability in complex interfacial environments remains largely untested. Here, we benchmark two widely used uMLIPs, MACE and UMA, for simulations of Au(100)/water interfaces with and without solvated NaOH. While finetuning the models on ab initio molecular dynamics (AIMD) trajectories improved structural accuracy for bulk-like regions (e.g., gold-gold and water-water interactions), it induced significant, localized failures at the interface, particularly when sodium atoms were involved. To diagnose these failures, we introduce a post-training, atom-resolved uncertainty framework based on quantile regression of latent MLIP embeddings, enabling direct identification of chemically specific regions where model predictions become unreliable. Our results reveal that improvements in global accuracy can obscure chemically meaningful, localized errors. The combination of finetuning with atom-resolved uncertainty analysis provides a practical path toward reliable deployment of uMLIPs for interfacial, electrochemical, and heterogeneous systems.
Bilbrey et al. studied this question.
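To make the idea of quantile regression on per-atom embeddings concrete, the following is a minimal sketch and not the authors' implementation. It assumes a linear quantile model trained by subgradient descent on the pinball loss, and it replaces real MLIP embeddings and per-atom errors with synthetic two-dimensional data in which the error magnitude grows with the first feature. All function names (`pinball_loss`, `fit_linear_quantile`, `make_synthetic_atoms`, `predict`) are hypothetical and chosen for illustration only.

```python
import random


def pinball_loss(y, pred, q):
    """Quantile (pinball) loss for one target/prediction pair."""
    diff = y - pred
    return q * diff if diff >= 0 else (q - 1.0) * diff


def make_synthetic_atoms(n, seed=0):
    """Synthetic stand-in for (embedding, per-atom error) pairs.

    Assumption: the error scale grows with feature 0, mimicking atoms whose
    environments are poorly represented. Feature 1 is uninformative.
    """
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        x0, x1 = rng.random(), rng.random()
        X.append([x0, x1])
        y.append((0.1 + x0) * abs(rng.gauss(0.0, 1.0)))
    return X, y


def predict(X, w, b):
    """Linear prediction w . x + b for each row of X."""
    return [sum(wj * xj for wj, xj in zip(w, xi)) + b for xi in X]


def fit_linear_quantile(X, y, q, lr=0.1, epochs=800):
    """Fit the q-th conditional quantile of y given X.

    Uses full-batch subgradient descent on the mean pinball loss.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi, pred in zip(X, y, predict(X, w, b)):
            # Subgradient of the pinball loss with respect to the prediction.
            g = -q if yi > pred else (1.0 - q)
            for j in range(d):
                grad_w[j] += g * xi[j]
            grad_b += g
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


if __name__ == "__main__":
    X, y = make_synthetic_atoms(300, seed=0)
    w, b = fit_linear_quantile(X, y, q=0.9)
    bounds = predict(X, w, b)
    coverage = sum(yi <= bi for yi, bi in zip(y, bounds)) / len(y)
    print(f"empirical coverage of the 0.9-quantile bound: {coverage:.2f}")
```

In this sketch the fitted bound acts as a per-atom uncertainty estimate: atoms whose synthetic embedding lies in the high-error region receive a larger predicted error quantile, so they can be flagged individually even when the average error over all atoms is small. In a real workflow the inputs would be latent embeddings extracted from the trained potential and the targets would be per-atom errors against reference data, and a nonlinear regressor could take the place of the linear model.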