Inverse Physics and Generative Representations in Structural Biology

Current structural biology operates under a “Big Data” paradigm: the Protein Data Bank exceeds 200,000 structures, totaling terabytes of coordinate files. Yet this representation captures noise alongside signal, treating biological matter as static collections of atoms rather than as outputs of generative processes. We propose that biological structures, and matter more generally, are deterministic outputs of low-entropy generative seeds. If this is true, the inverse problem becomes tractable: given empirical, possibly noisy coordinates, can we recover the underlying generative parameters? We demonstrate an “Inverse Physics” framework that addresses this problem by integrating geometric reconstruction (RMSD minimization) with topological constraints derived from persistent homology. The topological loss function forces the algorithm to preserve fundamental connectivity (holes, tunnels, and voids) before fitting atomic positions. This effectively acts as an infinite-resolution denoiser, discarding measurement noise while retaining structural truth. We validate this framework on two systems: (1) α-helical protein motifs, where we recover Pauling–Corey parameters (radius r = 2.27 Å, pitch = 5.40 Å) from noisy coordinates with RMSD = 0.15 Å, which is finer than typical X-ray resolution, while achieving 28:1 compression; and (2) genus-2 topological manifolds (double torus), where standard algorithms collapse the structure to a sphere but our persistent homology constraints preserve both holes, with parameter recovery error below 0.5%. These results suggest a paradigm shift from descriptive biology (storing coordinates) to generative biology (storing executable seeds). We discuss implications for semantic structural search, distributed biomanufacturing, and the fundamental nature of biological information.
Andrés Sebastián Pirolo (Fri,) studied this question.
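
To make the inverse problem concrete, the following is a minimal sketch of recovering two helix parameters (radius and pitch) from noisy coordinates. It is an illustration under strong assumptions and not the framework described above: the helix axis is assumed to be known and aligned with z, points are assumed to be ordered along the chain with the standard α-helical spacing of 100° per residue, and no topological (persistent homology) term is included. The function names make_helix and fit_helix are hypothetical.

```python
import math
import random


def make_helix(n, radius, pitch, step_deg=100.0, noise=0.0, seed=0):
    """Generate n points on a helix around the z axis, with optional Gaussian noise."""
    rng = random.Random(seed)
    step = math.radians(step_deg)  # angular advance per point (assumed 100 degrees)
    points = []
    for i in range(n):
        t = i * step
        points.append((
            radius * math.cos(t) + rng.gauss(0.0, noise),
            radius * math.sin(t) + rng.gauss(0.0, noise),
            pitch * t / (2.0 * math.pi) + rng.gauss(0.0, noise),
        ))
    return points


def fit_helix(points):
    """Estimate (radius, pitch), assuming the helix axis is the z axis."""
    n = len(points)
    # Radius: mean distance of the points from the axis.
    radius = sum(math.hypot(x, y) for x, y, _ in points) / n

    # Unwrap the polar angle so it increases continuously along the chain.
    angles, prev, offset = [], None, 0.0
    for x, y, _ in points:
        a = math.atan2(y, x)
        if prev is not None:
            if a - prev < -math.pi:
                offset += 2.0 * math.pi
            elif a - prev > math.pi:
                offset -= 2.0 * math.pi
        prev = a
        angles.append(a + offset)

    # Pitch: least-squares slope of z against the unwrapped angle, times 2*pi.
    zs = [p[2] for p in points]
    mean_a = sum(angles) / n
    mean_z = sum(zs) / n
    cov = sum((a - mean_a) * (z - mean_z) for a, z in zip(angles, zs))
    var = sum((a - mean_a) ** 2 for a in angles)
    return radius, 2.0 * math.pi * cov / var


if __name__ == "__main__":
    noisy = make_helix(200, radius=2.27, pitch=5.40, noise=0.1)
    r_hat, p_hat = fit_helix(noisy)
    print(f"radius ~ {r_hat:.3f} A, pitch ~ {p_hat:.3f} A")
```

In this toy setting, several hundred noisy coordinates reduce to two numbers plus the fixed angular step. A realistic pipeline would also have to estimate the axis position and orientation, and would need the topological constraints that the abstract describes.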