We introduce Bisagra Geometric Attention, a sparse attention mechanism derived from a single mathematical property of ordered sequences: the central element of any sequence of odd length N is its hinge (bisagra), and this property nests concentrically, bisagra(N) ⊃ bisagra(N−2) ⊃ bisagra(N−4), forming a causal chain that is knowable without observing future tokens. Translated into attention weights as accumulated sums, w(i,j) = Σₖ k/(k+1) · 1[|i−j| ≤ k], where 1[·] is the indicator function, this produces attention with O(T·K·D) complexity, where K = 17 is constant; a smooth distance decay that activates all K neighbors, with attention entropy 2.69; and zero transcendental operations. When implemented jointly at the GPU hardware level (INT8 SIMD dp4a kernels on Ampere/Ada/Hopper architectures) and at the model architecture level (sparse attention substituting for softmax), the efficiency gains compound multiplicatively rather than additively: 4× from INT8 throughput, about 60× from the T² → T·K operation reduction at T = 1024 (1024/17 ≈ 60.2), and 4× from eliminating exp(·) entirely, yielding ~964× theoretical and 64–128× measured real-world throughput improvement. Empirical validation on calibrated financial time series (Sharpe 1.457 across three markets, walk-forward methodology), blockchain difficulty adjustment (5× robustness to timing attacks), and vector graphics (3.85× effective VRAM expansion) demonstrates that the principle generalizes across positionally structured systems. The mechanism is interpretable by construction: the attention pattern can be precomputed for each sequence length and audited without a forward pass. The paper also includes an epistemological discussion of Aristotle's wheel paradox as a category error resolved by the bisagra framework, demonstrating that the principle captures a structural property of positionally organized data rather than an attention-specific optimization.
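To make the weight definition concrete, the following is a minimal Python sketch of the hinge nesting and of the accumulated-sum weights. It is an illustration under stated assumptions, not the paper's implementation: the abstract does not give the summation bounds or a normalization, so the sketch assumes k runs from 1 to K, uses a symmetric window |i−j| ≤ K, and divides each row by its sum. The function names (`hinge`, `nested_hinges`, `distance_weight`, `attention_row`) are hypothetical.

```python
def hinge(n):
    """Index of the central element of a sequence of odd length n."""
    if n % 2 == 0:
        raise ValueError("the hinge is defined only for odd lengths")
    return n // 2


def nested_hinges(n):
    """Absolute hinge position of the length-n sequence and of every
    concentric sub-sequence obtained by dropping one element from each end."""
    positions = []
    offset = 0
    while n >= 1:
        positions.append(offset + hinge(n))
        n -= 2       # remove one element from each end
        offset += 1  # the sub-sequence starts one position later
    return positions


def distance_weight(d, K=17):
    """Unnormalized weight for |i - j| = d.

    Assumption: k runs over 1..K (bounds are not stated in the abstract).
    Each term k/(k+1) counts only when the indicator |i - j| <= k holds.
    """
    return sum(k / (k + 1) for k in range(1, K + 1) if d <= k)


def attention_row(i, T, K=17):
    """Normalized weights of query position i over keys j with |i - j| <= K.

    Assumptions: symmetric window and row-sum normalization. A causal
    variant would keep only keys with j <= i.
    """
    lo, hi = max(0, i - K), min(T - 1, i + K)
    raw = {j: distance_weight(abs(i - j), K) for j in range(lo, hi + 1)}
    total = sum(raw.values())
    return {j: w / total for j, w in raw.items()}


if __name__ == "__main__":
    print(nested_hinges(7))            # every nested hinge is position 3
    row = attention_row(50, T=1024)
    print(len(row), sum(row.values()))
```

Under these assumptions each row touches at most 2K+1 keys regardless of T, the weights decrease monotonically with distance, and no call to exp is needed, which is the behavior the abstract describes.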
Renny Rainerd León Sanchez studied this question.