We introduce Semantic State Abstraction Interfaces (SSAI): a methodological template for mapping sparse unstructured text into K auditable, named coordinates with neutral defaults on no-news days, designed to separate representation hypotheses from optimisation variance in sequential decision systems. Our contribution is the framework and its evaluation protocol, not a claim that SSAI outperforms denser alternatives.

We instantiate SSAI with K=4 axes (sentiment, risk, confidence, volatility forecast) on a US-equity panel (30 NASDAQ-100 names, FNSPID news, 2019–2023 test), and evaluate it across three parallel estimators: direct factor portfolios (SFP/SRF/SCW), supervised ridge forecasters, and RL agents (DP-PPO, SAC). All three share the same fixed φ so that signal and optimiser effects can be read off separately.

What the experiments show. The four-factor SFP reaches 307.2% CR / Sharpe 1.067 over 2019–2023. However, this apparent advantage over buy-and-hold (243.6%) does not survive its own controls: SFP underperforms stratum-matched [...] B high-conviction semantic tilts and lexical baselines (VADER, TF–IDF/SVD) recover price-only Sharpe. The RL block is a diagnostic: DP-PPO trails buy-and-hold; SAC with identical SSAI observations improves Sharpe (1.128 vs. 1.032), isolating algorithm dependence rather than representation advantage. Seed-mean Sharpe differences across 21 DP-PPO seeds are non-significant (Wilcoxon p≈0.25–0.31).

Contribution framing. We present SSAI as an interpretability-performance frontier instrument and a cautionary diagnostic: it characterises the cost of maintaining auditable axes (126 pp CR vs. PC1 over five years), shows that RL performance is dominated by algorithm choice rather than representation, and provides a reusable template for separating semantic signal from optimisation noise in other sparse-text decision settings.
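To make the interface idea concrete, the following is a minimal sketch of a fixed map φ from one day's text-derived scores to K=4 named coordinates, with a neutral default on no-news days. The four axis names are taken from the abstract; the function names `phi` and `build_panel`, the snake_case axis keys, the input format, and the choice of 0.0 as the neutral value on every axis are illustrative assumptions, not the specification used in the experiments.

```python
from typing import Dict, Iterable, List, Mapping, Optional, Tuple

# Axis names follow the abstract (K = 4). The neutral value 0.0 per axis is an
# assumption made for illustration only.
AXES: Tuple[str, ...] = ("sentiment", "risk", "confidence", "volatility_forecast")
NEUTRAL: Dict[str, float] = {axis: 0.0 for axis in AXES}


def phi(scores: Optional[Mapping[str, float]]) -> Dict[str, float]:
    """Map one day's scores to the K named coordinates (hypothetical helper).

    `scores` is None or empty on a no-news day, in which case every axis
    takes its neutral default. Axes missing from `scores` also fall back to
    neutral, and keys outside AXES are ignored so the state stays auditable.
    """
    if not scores:
        return dict(NEUTRAL)
    return {axis: float(scores.get(axis, NEUTRAL[axis])) for axis in AXES}


def build_panel(
    dates: Iterable[str],
    news_by_date: Mapping[str, Mapping[str, float]],
) -> List[Tuple[str, Dict[str, float]]]:
    """Apply the same fixed phi to every date, including days with no news."""
    return [(d, phi(news_by_date.get(d))) for d in dates]


if __name__ == "__main__":
    news = {"2019-01-02": {"sentiment": 0.4, "risk": -0.1}}
    for date, state in build_panel(["2019-01-02", "2019-01-03"], news):
        print(date, state)
```

Because every downstream estimator in such a setup would consume the output of the same `phi`, differences between estimators can be attributed to the estimator rather than to the representation, which is the separation the framework aims for.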
Yerra et al. (Tue,) studied this question.