What question did this study set out to answer?

To evaluate how accurately decisions can be reconstructed from autonomous-vehicle disengagement records when the same events are presented either as baseline narratives or as structured Decision Event Records (DER).

March 8, 2026 · Open Access

Narrative Cue Preservation and Decision Reconstruction: A Pilot Benchmark on Autonomous Vehicle Disengagement Records

Key Points

Aimed to evaluate how accurately decisions can be reconstructed from autonomous-vehicle disengagement records in two formats: narrative and structured.
Conducted a pilot experiment with 90 disengagement cases from a larger corpus.
Compared baseline narrative records with structured decision event records (DER).
Employed large language models to reconstruct five decision stages from the records.
Measured exact-agreement accuracy against manually constructed gold labels.
Baseline narrative reconstruction achieved an accuracy of 0.9517.
DER structured reconstruction achieved an accuracy of 0.9426, a difference of 0.0091 in favor of the baseline narratives.
Pilot results suggest that over-structured templates may dilute the linguistic cues needed for decision reconstruction.
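
The exact-agreement measure named in the points above can be illustrated with a minimal sketch. It is not the study's scoring code: the function name, the label strings, and the choice to score each stage label separately (rather than each whole case) are assumptions made for illustration only, and the example data is invented.

```python
from typing import Dict, List

# The five decision stages named in the record.
STAGES = ["D1", "D2", "D3", "D4", "D5"]


def exact_agreement_accuracy(
    predicted: List[Dict[str, str]],
    gold: List[Dict[str, str]],
) -> float:
    """Fraction of stage labels where the prediction equals the gold label.

    Assumption: scoring is per stage label, with plain string equality.
    A missing predicted stage counts as a mismatch.
    """
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must cover the same cases")

    matches = 0
    total = 0
    for pred_case, gold_case in zip(predicted, gold):
        for stage in STAGES:
            total += 1
            if pred_case.get(stage) == gold_case[stage]:
                matches += 1
    return matches / total if total else 0.0


# Hypothetical labels for two cases (not taken from the corpus).
gold_labels = [
    {"D1": "pedestrian", "D2": "lidar", "D3": "takeover",
     "D4": "resumed", "D5": "perception"},
    {"D1": "lane_closure", "D2": "camera", "D3": "takeover",
     "D4": "resumed", "D5": "planning"},
]
model_labels = [
    {"D1": "pedestrian", "D2": "lidar", "D3": "takeover",
     "D4": "resumed", "D5": "perception"},
    {"D1": "lane_closure", "D2": "camera", "D3": "takeover",
     "D4": "resumed", "D5": "perception"},  # D5 disagrees with gold
]

accuracy = exact_agreement_accuracy(model_labels, gold_labels)
print(f"exact-agreement accuracy: {accuracy:.4f}")
```

Running the same function once on reconstructions from baseline narratives and once on reconstructions from DER records, against the same gold labels, would yield the two figures being compared.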

Abstract

This technical note reports a small pilot benchmark examining decision reconstruction from autonomous-vehicle disengagement records. The experiment compares two representations of the same event records:

1. Baseline narrative records
2. Structured Decision Event Records (DER)

A sample of 90 cases was drawn from a larger AV disengagement corpus. Large language models were asked to reconstruct five decision stages:

D1 Trigger
D2 Evidence
D3 Action
D4 Recovery
D5 Root cause

Exact-agreement accuracy was measured against manually constructed gold labels.

Pilot result:

Baseline narrative reconstruction accuracy: 0.9517
DER structured reconstruction accuracy: 0.9426

The difference is small but indicates that over-structured templates may dilute linguistic cues required for decision reconstruction. This record is released as a pilot empirical observation and does not claim generalizable performance results. The purpose of this release is to document the benchmark design and provide a minimal empirical reference point for future work on decision transparency interfaces.

Within the broader research programme, DER (Decision-Event Record) should be understood as a methodological instrumentation line derived from the TPT / TBT transparency architecture. This record is a scoped empirical note and does not restate the full TPT / TBT theory layer. Further experiments will explore DER variants designed to preserve narrative cues while exposing decision structure.

Indexed in the project Root Index (https://doi.org/10.5281/zenodo.17992916).

Author: Hon Bor So
Note: The author also publishes under the name Sing So (Chöndrel Dorje).
ORCID: 0009-0008-2768-7494
