We compare classical and modern regression models for next-day cryptocurrency forecasting on 14 USD-denominated coins across three liquidity tiers from 2018 through 2025, and we use the resulting panel to formally test three pre-specified hypotheses. The features are a strictly past-only 28-element set; the evaluation uses expanding-window walk-forward cross-validation with nested hyperparameter tuning, stationary block-bootstrap 95% confidence intervals, and pairwise Diebold–Mariano (DM) tests. Methodologically, we derive a bias-variance bound that turns the ‘no model beats the mean’ observation from a null finding into a predicted outcome under weak-form market efficiency. Empirically, (H1) the threshold–effect interaction is not supported (slope −1.7 × 10⁻⁴, 95% CI [−4.8 × 10⁻⁴, +1.4 × 10⁻⁴], p = 0.25). (H2) Statistical loss minimisation is decoupled from risk-adjusted economic outcome: the cluster-bootstrapped 95% CI for the Spearman rank correlation between the within-ticker MAE rank and the within-ticker post-cost Sharpe rank is [−0.39, +0.10] overall, lies *strictly below zero* on the mid-cap (CI [−0.71, −0.04]) and long-tail (CI [−0.26, −0.09]) tiers, and decisively rejects perfect alignment (ρ = +1) on every tier. None of the seven (ticker, model) pairs with annualised Sharpe ≥ 0.5 has a hit rate significantly different from 0.5; high-Sharpe outcomes reflect return skew, not directional skill, as formally predicted by a closed-form Sharpe–MSE decoupling proposition that we derive in Section 3.6 under non-zero return skewness. (H3) Lo–MacKinlay variance ratio tests show that top-tier coins are indistinguishable from a random walk (|z| ≤ 1.5 at q ∈ {2, 5, 10}), while the mid-cap and long-tail tiers reject the random-walk null at q = 2 (z = −2.36, z = −2.60). The findings extend across two robustness layers. An AR(1)-GARCH(1,1) baseline produces R² ≈ −0.005 on every tier and is indistinguishable from Lasso, supporting the bias-variance bound; Giacomini–White conditional predictive ability tests reject equal predictive ability between Lasso and tree-based models on every coin in every tier, complicating naive DM interpretations; and a forward-walking 2026-Q1 holdout (83 daily observations per coin, entirely outside the training window) confirms that H1 is even more decisively null on unseen data and that the H3 efficiency conclusion holds. Together, these results give a formally tested EMH-style picture for daily crypto: no model meaningfully forecasts log-returns; statistical accuracy and trading P&L are decoupled; and weak-form efficiency is approximately satisfied in the most liquid coins and in the convergence across the cross-section.
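For readers unfamiliar with the variance ratio statistic behind H3, the following is a minimal sketch of the Lo–MacKinlay test with overlapping q-period returns. It assumes the homoskedastic form of the asymptotic variance; the abstract does not say whether the paper uses this form or the heteroskedasticity-robust one, and the function name `variance_ratio_z` is a hypothetical helper, not the authors' code.

```python
import numpy as np

def variance_ratio_z(returns, q):
    """Lo-MacKinlay variance ratio VR(q) and its z-statistic.

    Assumption: homoskedastic asymptotic variance. Under a random walk,
    VR(q) is close to 1 and z is approximately standard normal.
    """
    r = np.asarray(returns, dtype=float)
    T = r.size
    mu = r.mean()

    # Unbiased one-period variance of the log-returns.
    var_1 = np.sum((r - mu) ** 2) / (T - 1)

    # Overlapping q-period returns, built from cumulative sums.
    csum = np.concatenate(([0.0], np.cumsum(r)))
    r_q = csum[q:] - csum[:-q]  # length T - q + 1

    # Bias-corrected q-period variance, already scaled per period.
    m = q * (T - q + 1) * (1.0 - q / T)
    var_q = np.sum((r_q - q * mu) ** 2) / m

    vr = var_q / var_1
    # Asymptotic variance of VR(q) under the i.i.d. null.
    avar = 2.0 * (2 * q - 1) * (q - 1) / (3.0 * q * T)
    z = (vr - 1.0) / np.sqrt(avar)
    return vr, z

if __name__ == "__main__":
    # Simulated i.i.d. returns stand in for a coin's daily log-returns.
    rng = np.random.default_rng(0)
    simulated = rng.standard_normal(2000) * 0.03
    for q in (2, 5, 10):
        vr, z = variance_ratio_z(simulated, q)
        print(f"q={q}: VR={vr:.3f}, z={z:.2f}")
```

A negative z at q = 2, as reported for the mid-cap and long-tail tiers, corresponds to VR(2) < 1, that is, negative first-order autocorrelation in daily returns.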
Vasileva et al. (Tue,) studied this question.