This working paper studies whether deep learning models improve out-of-sample volatility forecasts in the cryptocurrency market relative to strong econometric benchmarks. Using Bitcoin as the primary case, we construct realized volatility measures and evaluate multi-step-ahead forecasts (e.g., one week ahead) under a rolling/expanding evaluation scheme. We compare a deep learning baseline (LSTM) against standard volatility forecasting models, including GARCH(1,1) and the HAR/HAR-X family. To test the incremental predictive value of nonlinear market information, we augment the feature set with commonly used technical and market indicators (e.g., MACD and liquidity/volume-related proxies) and conduct ablation-style comparisons. Forecast accuracy is assessed with several loss functions (e.g., RMSE, MAE, and QLIKE) so that conclusions do not hinge on a single metric. Across the evaluated settings, the econometric benchmark (HAR/HAR-X) is generally more stable and competitive out of sample, while the LSTM does not deliver robust improvements when trained on the same information set. These results highlight the importance of strong baselines, careful evaluation design, and the limits of model complexity in volatility prediction for this setting.

Status: Working paper (not peer-reviewed).

Keywords: volatility forecasting; cryptocurrency; realized volatility; HAR; GARCH; LSTM; time series; out-of-sample evaluation.
HAO YUN (Sun,) studied this question.
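
For readers unfamiliar with the HAR benchmark and the QLIKE loss named in the abstract, the following is a minimal sketch under stated assumptions, not the paper's implementation. It assumes a 7-day "week" and a 30-day "month" (crypto markets trade every day), takes the target to be the average realized volatility over the next `horizon` days, and floors forecasts at a small positive value so QLIKE stays defined. The function names, window lengths, and the synthetic series are all hypothetical; none of the numbers it produces are results from the paper.

```python
import numpy as np

def har_design(rv, horizon=7, week=7, month=30):
    """Build HAR regressors (const, daily, weekly, monthly) and the
    average-RV target over the next `horizon` days (assumed target)."""
    rv = np.asarray(rv, dtype=float)
    rows, targets = [], []
    # t is the forecast origin: needs `month` days of history and
    # `horizon` days of future data for the target.
    for t in range(month - 1, len(rv) - horizon):
        daily = rv[t]
        weekly = rv[t - week + 1 : t + 1].mean()
        monthly = rv[t - month + 1 : t + 1].mean()
        rows.append([1.0, daily, weekly, monthly])
        targets.append(rv[t + 1 : t + 1 + horizon].mean())
    return np.array(rows), np.array(targets)

def fit_ols(X, y):
    """Ordinary least squares coefficients."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

def qlike(actual, forecast):
    """QLIKE loss: mean of a/f - log(a/f) - 1; zero when forecast == actual."""
    ratio = np.asarray(actual, dtype=float) / np.asarray(forecast, dtype=float)
    return float(np.mean(ratio - np.log(ratio) - 1.0))

def rmse(actual, forecast):
    diff = np.asarray(actual, dtype=float) - np.asarray(forecast, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))

def expanding_forecasts(X, y, horizon, min_train=100, floor=1e-12):
    """Expanding-window out-of-sample forecasts without look-ahead."""
    actuals, preds = [], []
    for i in range(min_train + horizon, len(y)):
        # Only rows whose targets are fully observed at origin i are used:
        # row j's target ends `horizon` days after its origin, so j <= i - horizon.
        train_end = i - horizon + 1
        beta = fit_ols(X[:train_end], y[:train_end])
        # Floor the forecast (assumption) so the QLIKE ratio stays positive.
        preds.append(max(float(X[i] @ beta), floor))
        actuals.append(y[i])
    return np.array(actuals), np.array(preds)

if __name__ == "__main__":
    # Synthetic, persistent log-volatility series for illustration only.
    rng = np.random.default_rng(0)
    log_rv = np.zeros(600)
    for t in range(1, len(log_rv)):
        log_rv[t] = 0.95 * log_rv[t - 1] + 0.2 * rng.standard_normal()
    rv = 1e-3 * np.exp(log_rv)

    X, y = har_design(rv, horizon=7)
    actual, pred = expanding_forecasts(X, y, horizon=7)
    print("RMSE :", rmse(actual, pred))
    print("QLIKE:", qlike(actual, pred))
```

Reporting RMSE alongside QLIKE reflects the abstract's point about metric dependence: squared-error losses are dominated by high-volatility days, while QLIKE depends only on the ratio of realized to forecast volatility and penalizes under-prediction more heavily than over-prediction.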