What question did this study set out to answer?

The aim is to create a mathematical framework for understanding user operations on frozen-weight large language models (LLMs).

June 26, 2026Open Access

SAFE-AI: A Mathematical Framework for User-Applied Operations on Frozen-Weight LLMs

Key Points

The aim is to create a mathematical framework for understanding user operations on frozen-weight large language models (LLMs).
Developed a mathematical account of user-applied operations in LLM sessions.
Utilized concepts from stochastic processes, information geometry, and governance frameworks.
Formulated a research program with defined problems, success criteria, and community involvement.
Validated user-applied operations provide information beyond LLM's capabilities.
Defined governance structures that community members can control and assess.
Presented five extension problems to guide future research and collaboration.

Abstract

SAFE-AI: A Mathematical Framework for User-Applied Operations on Frozen-Weight LLMs develops a mathematical account of what users do when they apply structured discipline to sessions with frozen-weight large language models — and of what governance over that channel can and cannot achieve. Its organizing claim is that an LLM session is an asymmetric-information channel: the model's filtration is strictly contained in the user's, so user-applied operations — the brackets, verified anchors, and engagement modalities of the SAFE-AI practice — carry knowledge the model cannot reconstruct on its own. The framework gives these operations four mathematical lives across four Parts: as jumps in a stochastic process, as directions in an information geometry, as refinements of a σ-algebra, and as the objects of community governance — building toward a risk decomposition in which representational adequacy is defined relative to a community's own knowledge corpus. The mathematics is presented honestly as work in progress: load-bearing structural assumptions are named and stated in conservative, evidence-bounded forms; operations and claims carry falsifiers; conjectures are labeled as conjectures; and a consolidated claim register records what is proved, what is assumed, and what is open. The toolkit spans Markov-chain and piecewise-deterministic-process theory, information geometry, optimal transport, mean-field interacting-particle systems, and stochastic filtering, connected to the CARE Principles, OCAP, Te Mana Raraunga, and the GIDA framework for Indigenous data sovereignty. Part I — Inference-Time Dynamics. Models the session as a Markov process on context states with a unique, prompt-independent stationary distribution, and derives the inference-time risk identity: a contextual Bayes floor plus an exploitation gap. Absent user-applied operations, the conditional expectation drifts geometrically toward the stationary-averaged mean of the model's prior; the operations enter the process as jumps injecting information not derivable from the model's own filtration. Establishes recurrence for the piecewise-deterministic limits of the three traversal protocols and convergence guarantees for the workhorse operations — Progressive Enhancement (monotone Bayes-floor reduction) and Iterative Refinement (Wasserstein-contractive variance stabilization). A filtering formulation casts the channel as a pure-jump signal observed through announced, predictable interventions: observable times, hidden marks. Part II — Pullback Fisher Sensitivity. The geometry of when a user-applied operation actually moves the model. The context-space pullback of the output Fisher–Rao metric has an outlier-and-bulk spectral structure shown to be unconditional given a concentrated predictive distribution — pinned to the sorted output probabilities themselves — with the parameter-space, singular-learning-theory picture retained as a complementary account whose assumptions are stated in their conservative, trained-weight-evidence-bounded form. Supplies the Rayleigh-quotient sensitivity diagnostic with its estimator regimes, an empty-calorie criterion for operations that are fluent but ineffective, and a Pinsker bridge linking the Fisher and optimal-transport readings of operational efficacy. Part III — σ-Algebra Refinement and the Aleatoric Floor. Recasts session evolution as progressive refinement of the practitioner's σ-algebra and proves the additive bias / context-variance / aleatoric-floor decomposition, with a cross-term-vanishing lemma securing the classical split under path-dependent conditioning. Each user-applied operation is typed by the error term it moves — bias, context variance, or the empirical noise estimate — with rate-quantified variance contraction and a long-tail result in achievability/converse form under a named tail-separation assumption: for communities underrepresented in the pre-training prior, bias closure requires augmentation from the community's own corpus, and pretraining-adjacent retrieval cannot close it at any corpus size. The Part engages the 2022–2026 uncertainty-disentanglement debate directly, arguing that the σ-algebra-relative floor — "irreducible" always carries its conditioning information — is the formalization the critique literature calls for. Part IV — Governance Through the Asymmetric-Information Channel. Specializes the framework to community-conditional governance: the authorized target distribution and the admissible class of user-applied operations are the objects a community controls. Defines community-conditional adequacy in dual form — a diagnostic average and a verdict geometric mean that zeroes on any structurally unrepresented authorized direction — alongside a provenance hierarchy and a six-rung escalation ladder on which refusal is formally characterized, as infeasibility of adequacy over the highest authorized operation class, rather than procedurally asserted. A mean-field model of the human–LLM dyad population frames cultural collapse and sovereign solidarity as competing dynamics, and the data-processing inequality supplies the mechanism: community-authorized injection is precisely the exogenous information a closed feedback loop forbids itself — sovereignty as the negation of the DPI hypothesis. Community refusal is treated throughout as a success condition of the framework, not a failure. The Research Program. The monograph closes with five named problems extending its Open Invitation to research partners, each carrying a status, the enabling literature, what is missing, a success criterion with its falsifier, and a collaboration profile: uniform-in-time propagation of chaos for governed jump populations; singularity-aware (local-learning-coefficient) operationalization of the framework's spectral assumptions; a closed-loop information-decay theorem for nonlinear measure flows; the minimum authorized reference-set size for stable adequacy verdicts; and a community-specific representation-loss rate whose constraint set is the governance specification itself, with community consultation a binding precondition of the research. These are offered as formulated problems, with the falsifier discipline extended to the program itself. The framework formalizes a deployed practice: its operational development is published separately as a practitioner volume (Berardi, 2026, ISBN 978-1-966752-16-5), and the direction of derivation — from practice toward its mathematics — is recorded in the monograph's provenance section.

Read Full Paperexternally

Mark Helpful

Bookmark

Relay

View Full Paper