This paper develops theory, algorithms, and evaluation methodology for calibrating structured outputs produced by syntactic and semantic parsers (UD, AMR, SDP). It extends scalar calibration techniques (temperature scaling, Brier score) to transport-plan and graph-structured predictions, proposes parametric and post-hoc calibration maps for structured plans, and provides empirical protocols and diagnostics for measuring calibration in downstream extraction tasks. The manuscript situates the contributions within recent advances in calibration and uncertainty quantification for deep models and graph neural networks, and supplies pseudocode, theoretical bounds linking structured calibration to downstream risk, and a reproducible experimental plan.
Usman Zafar (Tue,) studied this question.