Procurement of training data for AI systems in regulated industries (financial services, healthcare, legal) currently lacks an independent quality measurement that satisfies model-risk audit requirements such as SR 11-7, EU AI Act Article 10, FDA 21 CFR 11.10(e), and HHS §1557. We introduce LQS v3.1, a 19-dimension quality standard for tabular, text, and image datasets that addresses three documented weaknesses of existing single-model quality scores: (1) reference-model bias, via a 7-oracle consensus across 5 algorithm families with cross-validated agreement reporting (Cohen's and Fleiss' κ); (2) brittleness of metadata-derived task inference, via a data-driven task detection layer with explicit ambiguity flagging; and (3) over-confidence of point estimates, via Wilson binomial intervals on rate-based dimensions, pooled-fold standard deviation on oracle-derived dimensions, and bootstrap-derived intervals on the composite. We also add inductive split-conformal prediction (Vovk 2005, Romano 2019), which produces 90% prediction intervals on downstream macro-F1 with provable coverage guarantees (marginal, under exchangeability of calibration and test data), and a graded benchmark-contamination dimension covering 40+ public evaluation suites (MMLU, HumanEval, GSM8K, SQuAD, etc.). Every score is bound to a canonical-JSON-serialized payload and signed with an Ed25519 keypair, producing a cryptographically verifiable certificate that can be audited offline against a published public key. We provide a reference implementation, a public verification API, and an SDK with no-auth verification helpers. The full LQS v3.1 specification is presented as a candidate reference methodology for ongoing standards work in IEEE P2841, NIST AI RMF, and ISO/IEC JTC 1 SC 42.
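To make two of the uncertainty mechanisms named above concrete, the following is a minimal sketch of a Wilson score interval for a rate-based dimension and a split-conformal interval for a predicted macro-F1. It is not the LQS reference implementation: the function names `wilson_interval` and `split_conformal_interval`, the 95% default for the Wilson interval, and the use of absolute residuals as the nonconformity score are all assumptions made for illustration.

```python
import math


def wilson_interval(successes, n, z=1.959963984540054):
    """Wilson score interval for a binomial proportion.

    The default z corresponds to a two-sided ~95% interval; this level is an
    assumption for the example, not a value taken from the LQS specification.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = successes / n
    denom = 1.0 + z * z / n
    center = (p_hat + z * z / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return max(0.0, center - half), min(1.0, center + half)


def split_conformal_interval(cal_true, cal_pred, new_pred, alpha=0.10):
    """Split-conformal prediction interval around a point prediction.

    cal_true / cal_pred: observed and predicted values (e.g. macro-F1) on a
    held-out calibration set that was not used to fit the predictor.
    Nonconformity score: absolute residual (an assumed, common choice).
    """
    scores = sorted(abs(y - y_hat) for y, y_hat in zip(cal_true, cal_pred))
    n = len(scores)
    # Finite-sample corrected rank: ceil((n + 1) * (1 - alpha)).
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points for a finite interval at this alpha.
        return -math.inf, math.inf
    q = scores[k - 1]
    return new_pred - q, new_pred + q


if __name__ == "__main__":
    # Hypothetical numbers, for illustration only.
    print(wilson_interval(successes=8, n=10))
    cal_pred = [0.5] * 10
    cal_true = [0.5 + 0.01 * i for i in range(1, 11)]
    print(split_conformal_interval(cal_true, cal_pred, new_pred=0.7))
```

With `alpha=0.10` the second function targets the 90% level mentioned in the abstract. Because macro-F1 lies in [0, 1], a caller could clip the returned bounds to that range without reducing coverage.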
Alex Adrion (Tue,) studied this question.