Abstract

Current neural decompilers (e.g., HELIOS, LLM4Decompile) treat binary-to-source translation primarily as a text generation problem, which can yield outputs that are syntactically plausible but behaviorally incorrect or non-compilable. This paper introduces a training paradigm for axiomatic decompilation that incorporates execution-derived signals and a description-length bias inspired by Minimum Description Length (MDL) and algorithmic information theory. By framing reverse engineering as a search for a compact explanation of observed behavior, the model is encouraged to ignore “compiler noise” and prefer simpler reconstructions that remain consistent with the observed execution traces.
Leon Calvin II has long studied this question.
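The core preference in the abstract is to keep only reconstructions that agree with execution traces and, among those, favor the one with the shortest description. The sketch below illustrates that criterion as a selection step. It is not the paper's training procedure, which applies the bias during learning. Several details are assumptions made for illustration: the `Candidate` type, the `(input, output)` trace format, the helper names, and the use of zlib-compressed source length as a stand-in for description length.

```python
import zlib
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# Hypothetical trace format: (input, output) pairs observed by running the binary.
Trace = Tuple[int, int]


@dataclass
class Candidate:
    """A hypothetical decompiler output: source text plus an executable stand-in."""
    source: str
    fn: Callable[[int], int]


def description_length(source: str) -> int:
    """Crude proxy for description length: bits in the zlib-compressed source text."""
    return 8 * len(zlib.compress(source.encode("utf-8"), 9))


def is_consistent(cand: Candidate, traces: List[Trace]) -> bool:
    """True if the candidate reproduces every observed input/output pair."""
    return all(cand.fn(x) == y for x, y in traces)


def select_mdl(candidates: List[Candidate], traces: List[Trace]) -> Optional[Candidate]:
    """Among trace-consistent candidates, return the one with the shortest description."""
    consistent = [c for c in candidates if is_consistent(c, traces)]
    if not consistent:
        return None
    return min(consistent, key=lambda c: description_length(c.source))


# Toy example: three reconstructions of a function that doubles its argument.
noisy = Candidate(
    "int f(int x) { int t0 = x; int t1 = t0 + t0; int t2 = t1; return t2; }",
    lambda x: x + x,  # behaviorally correct, but carries compiler-style temporaries
)
clean = Candidate("int f(int x) { return 2 * x; }", lambda x: 2 * x)
wrong = Candidate("int f(int x) { return x * x; }", lambda x: x * x)  # fails on x = 1

traces: List[Trace] = [(0, 0), (1, 2), (3, 6), (-4, -8)]

best = select_mdl([noisy, clean, wrong], traces)
print(best.source if best else "no consistent candidate")
```

Here `wrong` is rejected because it disagrees with the traces. `noisy` and `clean` are behaviorally equivalent, so the compressed-length proxy decides between them and prefers `clean`. Compressed size is only a rough stand-in for description length. A real system might instead measure length in tokens or AST nodes under a learned model.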