June 21, 2026 · Open Access

DP-HPO: Approximate Dynamic Programming for Neural Network Hyperparameter Optimisation with Evaluation Caching

Key Points

Aim: to optimise neural network hyperparameters efficiently using a dynamic programming framework.
Formulated hyperparameter optimisation as a finite-horizon Markov Decision Process (MDP).
Implemented approximate dynamic programming and evaluation caching to reduce redundant model training.
Benchmarked against eight baselines across four datasets with 25 independent seeds, using Wilcoxon signed-rank tests with Bonferroni correction.
DP-HPO reduced model evaluations by 90.7%, requiring only 10 evaluations for a 4-dimensional MLP space compared to 108 for exhaustive grid search.
Achieved performance within 0.5% of exhaustive grid search on all four datasets.
Derived an optimality gap bound for the case where independence among hyperparameter dimensions is violated.
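The evaluation-caching idea behind the 10 versus 108 count can be illustrated with a minimal sketch. It assumes that DP-HPO's stage-by-stage commitment behaves like a sequential sweep over dimensions, where each dimension is scored with the others held fixed, and that a dictionary keyed by the full configuration serves as the cache. This is our illustration, not the authors' code. `CachedObjective`, `dp_sweep`, `toy_accuracy`, the hyperparameter names, and the 3 × 3 × 3 × 4 grid are all hypothetical. That grid is simply one choice whose full size is 108 and for which this sweep needs Σnᵢ − (d − 1) = 13 − 3 = 10 distinct evaluations.

```python
import math
from itertools import product
from typing import Callable, Dict, List, Sequence, Tuple

Config = Tuple[float, ...]


class CachedObjective:
    """Memoises an expensive objective (e.g. train + validate) by configuration."""

    def __init__(self, fn: Callable[[Config], float]) -> None:
        self.fn = fn
        self.cache: Dict[Config, float] = {}

    def __call__(self, config: Config) -> float:
        if config not in self.cache:  # train a model only for unseen configurations
            self.cache[config] = self.fn(config)
        return self.cache[config]

    @property
    def evaluations(self) -> int:
        return len(self.cache)  # number of distinct models actually trained


def dp_sweep(space: Sequence[Sequence[float]], objective: CachedObjective) -> Tuple[Config, float]:
    """Commit one hyperparameter dimension per stage (finite horizon d = len(space)).

    Exact only if the objective is additive across dimensions (independence).
    """
    config: List[float] = [values[0] for values in space]  # assumed starting point
    for i, values in enumerate(space):
        # Stage i: score every value of dimension i with all other dimensions fixed.
        scores = {v: objective(tuple(config[:i] + [v] + config[i + 1:])) for v in values}
        config[i] = max(scores, key=scores.get)  # commit the best value for dimension i
    best = tuple(config)
    return best, objective(best)  # cache hit: best was scored in the final stage


# Hypothetical 3 x 3 x 3 x 4 grid: learning rate, batch size, dropout, hidden layers.
SPACE = [[1e-4, 1e-3, 1e-2], [32, 64, 128], [0.0, 0.2, 0.5], [1, 2, 3, 4]]


def toy_accuracy(config: Config) -> float:
    """Stand-in for training an MLP; additive, so the independence assumption holds."""
    lr, batch, dropout, layers = config
    return (0.9 - 0.01 * abs(math.log10(lr) + 3) - abs(batch - 64) / 6400
            - 0.05 * abs(dropout - 0.2) - 0.01 * abs(layers - 3))


objective = CachedObjective(toy_accuracy)
best, score = dp_sweep(SPACE, objective)
grid_size = math.prod(len(values) for values in SPACE)
grid_best = max(product(*SPACE), key=toy_accuracy)  # exhaustive search: 108 calls
print(best, objective.evaluations, grid_size)  # → (0.001, 64, 0.2, 3) 10 108
```

Every stage after the first re-encounters the configuration committed in the previous stage, so the cache saves exactly one training run per stage. With a non-additive objective the sweep can miss the joint optimum, which is the situation the paper's optimality gap bound addresses.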

Abstract

Neural network performance is highly sensitive to hyperparameter settings, yet exhaustive search over the configuration space is computationally prohibitive. We present DP-HPO, a framework that formulates hyperparameter optimisation (HPO) as a finite-horizon Markov Decision Process (MDP) and solves it via approximate dynamic programming (ADP). Under conditional independence of the hyperparameter dimensions, which is empirically satisfied on standard MLP search spaces, DP-HPO implements exact dynamic programming, committing each dimension optimally via Bellman's backward induction. When independence is violated, we derive the optimality gap bound $f^* - f_{\text{DP-HPO}} \le (d-1)\,\varepsilon$, where $\varepsilon$ is the maximum pairwise interaction strength and $d$ is the number of hyperparameter dimensions (Theorem 1). An evaluation cache eliminates redundant model training, yielding exactly 10 evaluations for a standard 4-dimensional MLP space versus 108 for exhaustive grid search, a 90.7% reduction. We benchmark DP-HPO against eight baselines (Grid Search, Random Search at two budgets, Bayesian Optimisation, Optuna/TPE, Hyperband, BOHB, and SMAC) across four datasets with 25 independent seeds, using Wilcoxon signed-rank tests with Bonferroni correction. Results show that DP-HPO achieves performance within 0.5% of exhaustive grid search on all four datasets while requiring 90.7% fewer model evaluations.

Index Terms: Hyperparameter optimisation, dynamic programming, Markov decision process, neural network, approximate dynamic programming, evaluation caching, Bayesian optimisation.
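The statistical protocol in the abstract, paired Wilcoxon signed-rank tests over 25 seeds with a Bonferroni correction, can be sketched in dependency-free Python. This is a minimal sketch that uses the large-sample normal approximation (with tie correction, without continuity correction) rather than the exact null distribution that libraries such as SciPy may use for small samples. The function names, the family size m = 8 (one comparison per baseline on a given dataset), and the synthetic scores are assumptions. None of them are the paper's code or data.

```python
import math
from typing import Dict, Sequence


def wilcoxon_signed_rank_p(x: Sequence[float], y: Sequence[float]) -> float:
    """Two-sided Wilcoxon signed-rank p-value via the normal approximation.

    Zero differences are dropped; tied |differences| get average ranks and
    a variance correction. No continuity correction is applied.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        return 1.0
    # Rank absolute differences, averaging ranks within tie groups.
    order = sorted(range(n), key=lambda k: abs(diffs[k]))
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)  # sum of positive ranks
    mean = n * (n + 1) / 4
    var = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48  # always > 0 for n >= 1
    z = (w_plus - mean) / math.sqrt(var)
    return math.erfc(abs(z) / math.sqrt(2))  # two-sided: 2 * (1 - Phi(|z|))


def bonferroni_significant(p_values: Dict[str, float], alpha: float = 0.05) -> Dict[str, bool]:
    """Flag comparisons whose p-value beats alpha / m, with m = number of tests."""
    threshold = alpha / len(p_values)
    return {name: p < threshold for name, p in p_values.items()}


# Synthetic paired scores for 25 seeds (illustrative only, not results from the paper).
dp_hpo = [0.900 + 0.001 * s for s in range(25)]
baseline = [score - 0.002 for score in dp_hpo]
p = wilcoxon_signed_rank_p(dp_hpo, baseline)
print(p, p < 0.05 / 8)  # assumed family of m = 8 baseline comparisons
```

If the correction is applied per dataset over the eight baselines, the per-test threshold at α = 0.05 is 0.05/8 = 0.00625. The abstract does not state which family size is used, so m here is an assumption.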
