This technical note analyzes the limitations of integrated evaluation metrics commonly used in machine learning. By isolating a minimal condition under which metric aggregation collapses distinct internal model representations into identical scores, it shows how discriminative information may be lost despite apparent performance agreement. The analysis is purely diagnostic and model-agnostic, focusing on the inferential limits imposed by metric integration rather than on specific architectures, training procedures, or optimization strategies.
Danilo Tavella (Sat,) studied this question.
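This is a minimal sketch of the collapse the note describes. It is our own illustration, not the note's construction. Accuracy stands in for the integrated metric because it averages per-example correctness and discards which examples were correct. The labels, predictions, and the helper names `accuracy` and `error_set` are hypothetical. The two classifiers fail on disjoint examples, yet they receive identical scores.

```python
# Hypothetical illustration: accuracy as an integrated (aggregated) metric
# that maps two behaviorally distinct models to the same score.
from typing import List, Sequence, Set


def accuracy(preds: Sequence[int], labels: Sequence[int]) -> float:
    """Fraction of positions where the prediction equals the label."""
    if len(preds) != len(labels):
        raise ValueError("preds and labels must have the same length")
    correct = sum(int(p == y) for p, y in zip(preds, labels))
    return correct / len(labels)


def error_set(preds: Sequence[int], labels: Sequence[int]) -> Set[int]:
    """Indices of the examples a model gets wrong (the per-example detail
    that the aggregate score throws away)."""
    return {i for i, (p, y) in enumerate(zip(preds, labels)) if p != y}


# Hypothetical data: six binary examples and two hypothetical models.
labels: List[int] = [0, 1, 0, 1, 1, 0]
model_a: List[int] = [0, 1, 0, 1, 0, 1]  # wrong on examples 4 and 5
model_b: List[int] = [1, 0, 0, 1, 1, 0]  # wrong on examples 0 and 1

if __name__ == "__main__":
    # Both models score 4/6, so the metric alone cannot tell them apart.
    print("accuracy A:", accuracy(model_a, labels))
    print("accuracy B:", accuracy(model_b, labels))
    # The error sets are disjoint: the models fail in entirely different places.
    print("errors A:", sorted(error_set(model_a, labels)))
    print("errors B:", sorted(error_set(model_b, labels)))
```

Comparing the error sets, or per-example agreement between the two models, recovers the distinction that the single aggregated score erases. This mirrors the diagnostic point of the note in its simplest form.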