Sequential decision-making under imperfect information is naturally modeled as an extensive-form game, where the Nash equilibrium is the predominant solution concept for two-player zero-sum settings. Counterfactual regret minimization (CFR) is a widely used framework for computing such equilibria: it iteratively reduces regret through regret matching so that the average strategy approaches a Nash equilibrium. However, CFR's convergence speed remains a practical bottleneck. In this work, we refine and reformulate an advantage-based exponential weighting scheme, Exponential CFR (ExpCFR), which accelerates convergence by assigning greater weight to highly profitable actions during regret accumulation. Building on this heuristic, we further introduce Pob-CFR, a framework that integrates population-based evolutionary training with CFR. Pob-CFR maintains a diverse population of heterogeneous CFR variants, periodically ranks them by exploitability, and replaces underperforming individuals with the elite to synchronize strategy exploration. Systematic evaluations across five benchmark games show that both methods accelerate convergence in the early-to-middle stages of training relative to standard CFR baselines. Within the evaluated benchmarks, the advantage of the population-based architecture appears more pronounced in games of greater strategic complexity.
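To make the regret-matching step that CFR builds on concrete, the Python below is a minimal sketch under simplifying assumptions: it runs self-play on a single-decision zero-sum matrix game (a hypothetical 2×2 example, not one of the five benchmarks) and reports exploitability, the metric Pob-CFR uses to rank individuals. It deliberately omits the game tree, counterfactual reach weighting, and the ExpCFR exponential weighting, whose exact form is not given here; the names `regret_matching`, `exploitability`, and `self_play` are illustrative only.

```python
import numpy as np

# Hypothetical zero-sum matrix game used only for illustration: entries are the
# row player's payoffs, and the column player receives their negation.
# Its unique Nash equilibrium is row = (3/7, 4/7), column = (2/7, 5/7).
A = np.array([[3.0, -1.0],
              [-2.0, 1.0]])


def regret_matching(cum_regret):
    """Play each action in proportion to its positive cumulative regret (uniform if none)."""
    positive = np.maximum(cum_regret, 0.0)
    total = positive.sum()
    if total > 0.0:
        return positive / total
    return np.full(cum_regret.shape, 1.0 / cum_regret.size)


def exploitability(x, y, payoff):
    """Sum of both players' best-response gains against the profile (x, y); 0 at a Nash equilibrium."""
    best_row = np.max(payoff @ y)  # best value the row player can get against y
    best_col = np.min(x @ payoff)  # row payoff after the column player best-responds to x
    return float(best_row - best_col)


def self_play(payoff, iterations):
    """Run simultaneous regret matching for both players; return their average strategies."""
    n_rows, n_cols = payoff.shape
    regret_row, regret_col = np.zeros(n_rows), np.zeros(n_cols)
    sum_row, sum_col = np.zeros(n_rows), np.zeros(n_cols)
    for _ in range(iterations):
        x = regret_matching(regret_row)
        y = regret_matching(regret_col)
        u_row = payoff @ y       # row player's utility for each pure action
        u_col = -(x @ payoff)    # column player's utility for each pure action (zero-sum)
        # Instantaneous regret: how much better each pure action would have done.
        regret_row += u_row - x @ u_row
        regret_col += u_col - y @ u_col
        # The average (not the current) strategy is what approaches equilibrium.
        sum_row += x
        sum_col += y
    return sum_row / iterations, sum_col / iterations


if __name__ == "__main__":
    x_bar, y_bar = self_play(A, 10_000)
    print("average row strategy:", x_bar)
    print("average column strategy:", y_bar)
    print("exploitability:", exploitability(x_bar, y_bar, A))
```

In full CFR, the same regret-matching update runs at every information set, with utilities weighted by counterfactual reach probabilities; a weighting scheme such as ExpCFR would modify how the instantaneous regrets are accumulated before `regret_matching` is applied.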
Zhang et al. studied this question.