Large language models write fluent prose yet still struggle with verifiable, compositional reasoning in advanced mathematics. We address this gap with a compact, cognitively grounded protocol that mirrors how mathematicians think. Our framework instantiates seven human-inspired dimensions (including concept formation, dualization, negative knowledge, and transfer) via meta-prompts drawn from active research problems rather than toy exercises, and audits full solution traces for faithfulness and invariant control. Under identical conditions, we benchmark four state-of-the-art systems and observe a global breaking degree of more than ninety percent on stress tests. Error forensics reveal systematic failures in lemma synthesis, long-horizon planning, premise selection, and counterexample search. Based on these findings, we suggest systematically integrating the protocol to enhance concrete levers (rationale SFT, process supervision with process reward models, and stepwise preference learning) that directly target step-level correctness. We further outline an Artificial Mathematical Intelligence (AMI) agenda to model concept creation and proof discovery along these lines. Together, the protocol and interventions chart a reproducible path toward the systematic design of genuinely creative mathematical reasoning in LLMs and related AI-based systems.
Danny Arlen de Jesus Gomez-Ramirez (Tue,) studied this question.