We introduce the first reinforcement learning (RL) based dynamic algorithm configuration (DAC) system for MaxSAT local search. A PPO controller observes the state of the NuWLS solver every 1,000 variable flips and adjusts four clause-weighting parameters in real time. On generated partial MaxSAT benchmarks (3 seeds × 18 test instances), the learned policy reduces cost by 19.0% relative to random control (Wilcoxon p = 2.4 × 10⁻⁵) and by 10.4% relative to the best hand-tuned static configuration (p = 0.007). The policy discovers an explore-then-exploit noise schedule without explicit curriculum design. Under zero-shot transfer to 10× larger instances, the improvement remains significant (p = 0.004). We identify five structural insights about DAC for local search, including exploration-parameter dominance, scale-dependent feature importance, and the non-transferability of solver-specific policies. All code, benchmarks, and experimental results are included.
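The control loop described above (observe solver state at a fixed flip interval, then adjust parameters) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: `ToySolver` is a hypothetical stand-in with made-up dynamics and is not NuWLS, the single `noise` parameter stands in for the four clause-weighting parameters (which this section does not name), and `schedule_policy` is a hand-written explore-then-exploit rule rather than a trained PPO policy. Only the 1,000-flip control interval is taken from the text.

```python
import random
from dataclasses import dataclass, field
from typing import Callable, Dict, List

CONTROL_INTERVAL = 1_000  # flips between controller decisions (from the text)


@dataclass
class ToySolver:
    """Hypothetical stand-in for a local-search solver; NOT NuWLS."""
    cost: float = 1000.0
    best_cost: float = 1000.0
    flips: int = 0
    # Hypothetical parameter set; the real system adjusts four parameters.
    params: Dict[str, float] = field(default_factory=lambda: {"noise": 0.5})

    def run(self, n_flips: int, rng: random.Random) -> None:
        """Advance the toy search by n_flips steps (made-up dynamics)."""
        for _ in range(n_flips):
            self.cost = max(0.0, self.cost + rng.gauss(-0.01, self.params["noise"]))
            self.best_cost = min(self.best_cost, self.cost)
            self.flips += 1

    def observe(self) -> Dict[str, float]:
        """Return the state features the controller sees."""
        return {
            "cost": self.cost,
            "best_cost": self.best_cost,
            "flips": float(self.flips),
        }


Policy = Callable[[Dict[str, float]], Dict[str, float]]


def schedule_policy(obs: Dict[str, float]) -> Dict[str, float]:
    """Hand-written explore-then-exploit rule (a stand-in for a learned policy)."""
    return {"noise": 0.5 if obs["flips"] < 2_000 else 0.05}


def run_dac(solver: ToySolver, policy: Policy, total_flips: int,
            seed: int = 0) -> List[Dict[str, float]]:
    """Alternate between a controller decision and CONTROL_INTERVAL solver flips."""
    rng = random.Random(seed)
    log: List[Dict[str, float]] = []
    while solver.flips < total_flips:
        solver.params.update(policy(solver.observe()))  # controller acts
        log.append(dict(solver.params))                 # record the chosen setting
        solver.run(min(CONTROL_INTERVAL, total_flips - solver.flips), rng)
    return log


if __name__ == "__main__":
    solver = ToySolver()
    decisions = run_dac(solver, schedule_policy, total_flips=5_000)
    print(len(decisions), "decisions; best cost:", solver.best_cost)
```

In a learned setting, `schedule_policy` would be replaced by the trained controller's action function, with the observation and action spaces defined by the actual solver features and parameters.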
Alex Li (Mon,) studied this question.