This work introduces Merge-Reasoner, a theoretical and practical framework for combining large language models (LLMs) in a principled way. Traditional merging methods often fail when several distinct inputs lead to the same output (a many-to-one, or non-injective, structure), which causes systematic errors in reasoning and diagnosis. Our framework addresses this by treating merging as a set-valued problem. We prove five core theorems that establish when thresholding, top-k selection, and consensus voting are optimal, and we introduce merge degree as a rigorous measure of uncertainty. These results provide formal guarantees for when and how LLM merging can succeed. Beyond theory, Merge-Reasoner includes practical engineering guidelines for GPU optimization, consensus-based error reduction, calibration, and diversity injection, so the approach is deployable in real-world systems as well as mathematically sound. Applications include medical diagnosis, abductive reasoning, multi-hop question answering, and any other domain where multiple causes can lead to the same effect. By combining formal proofs with practical algorithms, this work lays the foundation for reliable model merging and offers researchers and practitioners a roadmap for building ensembles that are provably robust and efficient.
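To make the three selection rules named above concrete, the following is a minimal sketch of how thresholding, top-k selection, and consensus voting can each produce a set of candidate answers rather than a single one. It is an illustration under stated assumptions, not the Merge-Reasoner algorithm itself: the function names (`threshold_set`, `top_k_set`, `consensus_set`), the parameters `tau`, `k`, and `min_votes`, and the toy diagnosis scores are all hypothetical, and the sketch does not attempt to compute merge degree, which the abstract does not define.

```python
from collections import Counter


def threshold_set(scores, tau):
    """Keep every candidate whose score is at least tau."""
    return {label for label, s in scores.items() if s >= tau}


def top_k_set(scores, k):
    """Keep the k highest-scoring candidates (ties broken by label)."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return {label for label, _ in ranked[:k]}


def consensus_set(candidate_sets, min_votes):
    """Keep candidates proposed by at least min_votes of the models."""
    votes = Counter(label for s in candidate_sets for label in s)
    return {label for label, n in votes.items() if n >= min_votes}


# Hypothetical scores from three models for the possible causes of one
# observed symptom (a many-to-one setting: several causes, same effect).
model_scores = [
    {"flu": 0.50, "cold": 0.30, "allergy": 0.20},
    {"flu": 0.40, "cold": 0.45, "allergy": 0.15},
    {"flu": 0.55, "cold": 0.10, "allergy": 0.35},
]

# Each model proposes a set of causes; consensus voting merges the sets.
per_model = [threshold_set(s, tau=0.30) for s in model_scores]
merged = consensus_set(per_model, min_votes=2)
print(sorted(merged))
```

In this toy setting each model returns the causes it scores at or above `tau`, and the merged answer keeps only the causes that at least two models agree on, so a cause proposed by a single model is dropped instead of being forced into a single-answer prediction.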
Milad Shaddelan studied this question.