What question did this study set out to answer?

This research aims to validate substrate identification reproducibility using a two-lane canonicalisation framework across multiple institutions.

May 16, 2026 | Open Access

The Stationary Sea (Part 2: The Long and Winding Road): Substrate Identification Reproducibility, Two-Lane Canonicalisation, and the Multi-Institution Empirical Validation of the Hundreds-Not-Thousands Counter-Prior

Key Points

Conducted two pre-registered variance attribution pilots: pilot v0.1 across four institutions and pilot v0.2 across six institutions.
Measured bridge rate, fragmentation factors, and cumulative cross-cycle coverage across multiple cycles.
Implemented a taxonomic framework to define agents and establish population-scale counts.
Pilot v0.1 achieved a bridge rate of 56.50% across 13 cycles with fragmentation factors between 1.331 and 1.744.
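Pilot v0.2 showed a deterministic-mode, within-institution coefficient of variation on bridge rate of approximately 4.2%, below the lower bound of three external reference bands.
Cgov change against the April baseline ran in different directions across the four pilot v0.1 institutions (flat, down 10.54%, down 4.03%, up 25.12%); the paper treats this cross-institutional heterogeneity as its headline finding.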
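
For readers unfamiliar with the statistic named in the key points, the following is a minimal Python sketch of a coefficient of variation. The bridge-rate values are hypothetical placeholders, not data from the study, and the use of the sample standard deviation is an assumption, since this summary does not say which form the paper uses.

```python
import statistics


def coefficient_of_variation(values):
    """Return the sample standard deviation divided by the mean, in percent."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    return 100.0 * statistics.stdev(values) / mean


# Hypothetical bridge rates (percent) for four cycles of one institution.
# These numbers are illustrative only and do not come from the paper.
hypothetical_bridge_rates = [60.0, 62.0, 64.0, 66.0]

cv_percent = coefficient_of_variation(hypothetical_bridge_rates)
print(f"CV = {cv_percent:.2f}%")
```

A lower value means the cycles of one institution agree more closely with each other relative to their mean.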

Abstract

This paper reports the multi-institution empirical validation of substrate identification reproducibility for the Meridian Autonomy substrate documented in the companion publication (Collins, 2026e). The validation is built on the two-lane canonicalisation framework and the four-component variance decomposition introduced in Annex 1a (Collins, 2026i). Three threads of evidence are reported, two architectural commitments are articulated, and one forward-looking instrument is proposed. The empirical thread comprises two pre-registered variance attribution pilots: pilot v0.1 fired 11 May 2026 across four institutions and 13 cycles, in which the pre-registered raw agent-name Jaccard prediction of 0.90 falsified at observed mean 0.162 on the six pinned European G-SIB C pairs, the pre-registered halt at Jaccard below 0.30 was honoured, and three substrate-level instruments (bridge rate, fragmentation collapse, cumulative cross-cycle coverage) were brought into scope as the correct measurement layer; pilot v0.2 fired 13 to 14 May 2026 across six institutions and four regulatory regimes under deterministic-mode controls with 24 wet-canonicalised cycles. The taxonomic thread establishes the four-level entity taxonomy and the Meridian Autonomy agent definition that together permit population-scale agent counts to be defended against the missing-agents critique and against the natural confusion with use cases or model artefacts. The cross-platform thread documents the Hostinger-to-Hetzner cutover of 7 May 2026 as a methodology event characterised under the four-component variance decomposition. The empirical findings: pilot v0.1 produced combined bridge rate 56.50 percent across 13 cycles, fragmentation factor collapse to the 1.331 to 1.744 band, and cumulative cross-cycle coverage of the April canonical baseline from 72.68 percent at European G-SIB C to 80.38 percent at the UK G-SIB.
Pilot v0.2 produced deterministic-mode within-institution coefficient of variation on bridge rate at approximately 4.2 percent (cohort mean of five of six pilot institutions), below the lower bound of three external reference bands: FDA bioanalytical guidance, HELM scenario-level standard deviations, and Belz et al. NLP reproducibility literature. The first architectural commitment is the hundreds-not-thousands counter-prior. The substrate's institution-level agent count distribution clusters in the 100 to 300 range for large regulated financial institutions, with no institution crossing 1,000 agents across nine months of substrate operation. The second architectural commitment is the cross-institutional cgov heterogeneity finding: across the four institutions of pilot v0.1, cgov change against the April baseline ran in four directions (European G-SIB C flat, European G-SIB D down 10.54 percent, UK G-SIB down 4.03 percent, North American G-SIB B up 25.12 percent). The North American G-SIB B upward offset is composition-driven, resolved against the dropped-anchors view. The cross-institutional heterogeneity is the actual headline finding. The forward-looking instrument is the proposal that canonicalisation metrics may operate as an external instrument for disclosure coherence. Pilot v0.2's six-institution panel under deterministic-mode controls resolves the pilot v0.1 single-institution signal into a clean two-cluster partition. Three institutions cluster at high bridge rate and low fragmentation (Swiss G-SIB 68.93 percent bridge rate, the UK G-SIB 65.86 percent, Australasian G-SIB 62.06 percent, with fragmentation factor means below 1.47), and three at low bridge rate and high fragmentation (European G-SIB D 55.45 percent, European G-SIB C 51.81 percent, North American G-SIB B 48.71 percent, with fragmentation factor means above 1.59). No cycle of any institution crosses the cluster boundary across the 24 wet cycles.
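
The two-cluster reading of the six reported bridge-rate means can be checked with a short Python sketch. It uses only the institution-level means quoted in the abstract and splits the ranked values at the largest gap. That splitting rule is an assumption made here for illustration and is not claimed to be the paper's clustering method; the per-cycle values behind the "no cycle crosses the boundary" statement are not reproduced on this page.

```python
# Institution-level mean bridge rates (percent) as quoted in the abstract.
reported_mean_bridge_rates = {
    "Swiss G-SIB": 68.93,
    "UK G-SIB": 65.86,
    "Australasian G-SIB": 62.06,
    "European G-SIB D": 55.45,
    "European G-SIB C": 51.81,
    "North American G-SIB B": 48.71,
}


def split_at_largest_gap(values_by_name):
    """Rank values high to low and cut the ranking at its widest gap.

    This is a simple one-dimensional heuristic chosen for illustration,
    not the method used in the paper.
    """
    ranked = sorted(values_by_name.items(), key=lambda kv: kv[1], reverse=True)
    gaps = [ranked[i][1] - ranked[i + 1][1] for i in range(len(ranked) - 1)]
    cut = max(range(len(gaps)), key=gaps.__getitem__)
    high = [name for name, _ in ranked[: cut + 1]]
    low = [name for name, _ in ranked[cut + 1:]]
    return high, low, gaps[cut]


high_cluster, low_cluster, widest_gap = split_at_largest_gap(reported_mean_bridge_rates)
print("high bridge rate:", high_cluster)
print("low bridge rate:", low_cluster)
print(f"widest gap between adjacent means: {widest_gap:.2f} points")
```

On these six means the widest gap falls between the Australasian G-SIB and European G-SIB D, which matches the three-and-three grouping described in the abstract.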
The working hypothesis is that institutions in the high-bridge low-fragmentation cluster have more standardised and more cross-document coherent public AI governance disclosure than institutions in the low-bridge high-fragmentation cluster. The hypothesis is consistent with the cluster pattern at n equals six but not settled at population scale across the 543-institution substrate. The paper reports it as an empirical observation worth further study at population scale, articulates the mechanism, and defers operationalisation explicitly. Appendix A inventories the institutional-grade infrastructure underpinning the substrate, including indicative mapping to DORA, the EU AI Act, and ISO 27001.
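
The raw agent-name Jaccard comparison and the pre-registered halt rule mentioned in the abstract can be illustrated with a minimal Python sketch. The 0.30 threshold comes from the abstract; the agent names, the function name, and the empty-set convention are hypothetical and are not taken from the paper.

```python
def jaccard(names_a, names_b):
    """Jaccard similarity of two collections of names: |A & B| / |A | B|."""
    a, b = set(names_a), set(names_b)
    if not a and not b:
        return 1.0  # convention assumed here for two empty sets
    return len(a & b) / len(a | b)


HALT_THRESHOLD = 0.30  # halt level quoted in the abstract

# Hypothetical raw agent names extracted in two cycles for one institution.
cycle_a = {"credit scoring assistant", "fraud monitor", "kyc screening agent", "chatbot"}
cycle_b = {"fraud monitoring", "chatbot", "kyc screening", "trade surveillance"}

score = jaccard(cycle_a, cycle_b)
halt = score < HALT_THRESHOLD
print(f"raw-name Jaccard = {score:.3f}, halt = {halt}")
```

In this made-up example only one of seven distinct strings matches exactly, so the score falls below the halt level even though several names plausibly describe the same thing. Exact string matching is sensitive to naming variation in this way, which is one possible reason to measure at a canonicalised layer instead.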

Cite This Study

William J. Collins studied this question.

synapsesocial.com/papers/6a080b27a487c87a6a40d485 https://doi.org/10.5281/zenodo.20185351
