What question did this study set out to answer?

The research explores the limits of collaborative efforts using large language models in resolving the Collatz conjecture.

June 7, 2026Open Access

Cross-AI, Lean-Verified Mathematics: A Case Study on the Collatz Conjecture

Read Full Paperexternally

Key Points

The research explores the limits of collaborative efforts using large language models in resolving the Collatz conjecture.
Conducted a methodology retrospective on the Collatz conjecture across five versions.
Utilized multiple AI models for idea generation, validation, and Lean verification.
Developed a verified artifact using Lean 4 and Mathlib without axioms or admits.
Generated approximately 141,000 lines of AI-authored source code with 302 Lean theorems.
Established barriers to progress in the Collatz conjecture via formal proof checking using Lean.
Documented analytic routes with clear negative results, outlining the limitations of the approach.

Abstract

Cross-AI, Lean-Verified Mathematics: A Case Study on the Collatz Conjecture (version 5 — methodology retrospective and honest close). This is the fifth and closing version of a personal research program on the Collatz conjecture. Its primary subject is the program's explicitly declared secondary goal: testing how far iterative, cross-checked collaboration with large language model (LLM) assistants can carry a non-standard attempt on an open mathematical problem. The mathematics is reported honestly as a closed (but not abandoned) chapter; the methodology is brought to the foreground. We do not claim a proof of the Collatz conjecture, nor progress toward one. v5 reports a verified artifact, a candid account of what cross-AI mathematical collaboration can and cannot do, and a clearly delimited open frontier. The v1→v5 arc. From an ambitious v1 (a proposed spectral route to a resolution), through the Lean-verified shadowing core (v2), a conditional reduction and phantom-set taxonomy (v3), and a consolidation that mapped two distinct barriers and showed why the current analytic routes stop there (v4), to v5, where the methodology becomes the subject and the program is closed honestly. The verified artifact (Lean 4 + Mathlib, sorry-/admit-/axiom-free): an exact congruential shadowing lemma, a no-infinite-shadowing corollary, an expanding-cycle exclusion, finite Collatz–Wielandt spectral-radius certificates (single node ≤ 97/2000; deterministic (K,b) bound < 3/4 at K0 = 16), and an elementary descent bridge. By the numbers (all AI-produced; human-typed repository lines: 0). Over roughly five weeks: ≈ 141,000 lines — AI-authored source ≈ 61,800 (Lean proofs 7,805 + Python 54,009), generator-emitted Lean certificates 43,513, and prose ≈ 35,673; 302 AI-authored Lean theorems/lemmas; zero sorry; 84 commits. The author contributed direction, conceptual framing, decisions, editing-by-instruction, publication choices, and external actions. Operational AI workflow. The project was not a simple one-model interaction. Google Gemini was used for broad, deliberately free-form speculative idea generation under the author's direction. Claude Code Opus 4.7/4.8 was used for validation, refinement, pruning of superficial ideas, proof planning, comparison against external critiques, and paper drafting. OpenAI Codex was used as the repository-operational agent: continuation of selected ideas, Python scripts, Lean code, generated certificates, builds, TODO files, Markdown notes, patching, and compilation support. The aider-desk interface connected to DeepSeek v4 Flash API was used as an external-opinion channel; its critiques were fed back to Claude Code for adversarial comparison before returning to Codex for implementation, revision, or rejection. Lean as arbiter. Cross-AI agreement was useful for finding and pruning ideas, but it was not treated as mathematical evidence. Formal claims entered the verified core only when Lean 4 + Mathlib built without sorry, admit, or axiom. For the author, Lean was the only reliable way to know that the AI-produced formal core had survived contact with a proof checker. Subscription and cost disclosure. During the project the author had active, general-purpose AI subscriptions or credits: Claude at roughly EUR 20/month; Gemini at roughly EUR 20/month; ChatGPT/OpenAI at roughly EUR 20/month initially and later roughly EUR 100/month as usage increased; and about EUR 50 of DeepSeek API credit. These figures are not an attributable project budget: the same subscriptions were simultaneously used for unrelated programming, websites, iOS/Android/React app work, image generation for the author's employer, generic tasks, VBA scripts, and other work. Future instrumentation. The project did not log AI usage with enough precision to reconstruct every model handoff, session, turn count, token count, or task category. Future AI-assisted studies by the author will instrument these variables from the start: model/version, interface, wall-clock session time, approximate token volume, artifact produced, verification result, and whether the result survived cross-AI or Lean review. Relation to Chang (2026). The closest external comparator is Edward Y. Chang's Exploring Collatz Dynamics with Human–LLM Collaboration (arXiv:2603.11066). Chang's work is broader, surveying many Collatz attack surfaces and independently locating the same distributional-to-pointwise obstruction. This project is narrower but machine-checked: its main contribution is a reproducible Lean 4 core plus an honest negative stopping rule for the present approach. Honest negative results. The report documents analytic routes that hit structural barriers; a cross-AI episode in which the assistants generated, then dismantled, four "revolutionary" proposals; and concrete failure modes such as label pattern-matching, hallucinated lemma names, and Lean 3-vs-4 syntax errors. The methodological finding we consider most interesting is that a cross-AI loop, when one side is anchored to concrete obstructions, can recognize and prove the ceiling of its own approach. The remaining mathematical frontier. Collatz splits into two hard halves: no divergent orbits (distributional-to-pointwise barrier) and no nontrivial cycles (Diophantine barrier). Within this program's formalism, the missing pointwise rigidity for the first half is essentially equivalent to the conjectural content itself. The cycle half remains open as future work; its structural layer is the most realistic target for further Lean formalization. Code and library. Full project: github.com/PieroBorgatta/Collatz. The verified Lean library is also published as a standalone Lake package (Reservoir-indexable): github.com/PieroBorgatta/CollatzShadowing. Versions. Concept DOI (always resolves to the latest): 10.5281/zenodo.20021537; previous: v4, v3, v2, v1. License: Creative Commons Attribution 4.0 International (CC BY 4.0).

Connected Papers

Building similarity graph...

Analyzing shared references across papers

Discussion

Authors

Piero Borgatta

Actions

References and Citations

Connected Papers

Building similarity graph...

Analyzing shared references across papers

Cross-AI, Lean-Verified Mathematics: A Case Study on the Collatz Conjecture

Key Points

Abstract

Citation Network

Connected Papers

Discussion

Authors

Actions

References and Citations

Citation Network

Connected Papers

Discussion

Cite this study

Also consider

Also consider