What question did this study set out to answer?

The study asks how to achieve last-iterate convergence to minimax policies in zero-sum matrix games with bandit feedback.

April 20, 2026 · Open Access

Optimal last-iterate convergence in matrix games with bandit feedback using the log-barrier

Key Points

Targeted last-iterate convergence to minimax policies in zero-sum matrix games with bandit feedback.
Used log-barrier regularization for policy learning.
Used a dual-focused analysis to derive the convergence rate.
Achieved O-tilde(t^{-1/4}) last-iterate convergence with high probability.
This rate matches the Omega(t^{-1/4}) lower bound on the exploitability gap proved by Fiegel et al. (2025) for uncoupled players.
Extended the approach to extensive-form games, with a bound of the same rate.
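
To make the log-barrier idea concrete, here is a minimal sketch of one generic online mirror descent step on the probability simplex with the regularizer -sum(log x_i), fed by an importance-weighted bandit loss estimate. It is an illustration under stated assumptions, not the algorithm from the paper: the function names, the step size `eta`, the bisection solver, and the loss estimator are all hypothetical choices made for this example.

```python
def importance_weighted_loss(n, action, observed_loss, x):
    """Bandit estimate (assumed here): only the played action gets a nonzero entry."""
    est = [0.0] * n
    est[action] = observed_loss / x[action]
    return est


def log_barrier_omd_step(x, loss_est, eta, iters=100):
    """One mirror-descent step on the simplex with regularizer -sum(log x_i).

    The first-order condition gives 1/x_new[i] = 1/x[i] + eta*loss_est[i] + mu,
    where the scalar mu is chosen so that x_new sums to one.
    """
    n = len(x)
    c = [1.0 / x[i] + eta * loss_est[i] for i in range(n)]
    lo = -min(c)   # total mass grows without bound as mu approaches lo from above
    hi = lo + n    # here every entry is at most 1/n, so total mass is at most 1
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        mass = sum(1.0 / (ci + mid) for ci in c)
        if mass > 1.0:
            lo = mid
        else:
            hi = mid
    x_new = [1.0 / (ci + hi) for ci in c]
    total = sum(x_new)  # remove the residual bisection error
    return [v / total for v in x_new]


x = [0.25, 0.25, 0.5]
# A zero loss estimate should leave the policy where it is.
unchanged = log_barrier_omd_step(x, [0.0, 0.0, 0.0], eta=0.1)
# Playing action 0 and observing loss 1 should lower its probability.
est = importance_weighted_loss(3, 0, 1.0, x)
updated = log_barrier_omd_step(x, est, eta=0.1)
```

Because the update acts on 1/x_i, no probability reaches zero in a single step, which keeps the importance weights 1/x[action] finite.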

Abstract

We study the problem of learning minimax policies in zero-sum matrix games. Fiegel et al. (2025) recently showed that achieving last-iterate convergence in this setting is harder when the players are uncoupled, by proving a lower bound of Omega(t^{-1/4}) on the exploitability gap. Some online mirror descent algorithms have been proposed in the literature for this problem, but none has truly attained this rate yet. We show that the use of a log-barrier regularization, along with a dual-focused analysis, yields O-tilde(t^{-1/4}) convergence with high probability. We additionally extend our idea to the setting of extensive-form games, proving a bound with the same rate.
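
The exploitability gap named in the abstract can be illustrated with a short sketch. It assumes the common convention that the row player maximizes x^T A y and the column player minimizes it; the summary does not state the paper's exact convention, and the function name and the matching-pennies example are hypothetical.

```python
def exploitability_gap(A, x, y):
    """Best-response gain available against the policy pair (x, y).

    Computed as max_i (A y)_i - min_j (x^T A)_j, assuming the row player
    maximizes x^T A y. The gap is zero exactly at a minimax pair.
    """
    n_rows, n_cols = len(A), len(A[0])
    # Payoff of each pure row action against the column policy y.
    row_payoffs = [sum(A[i][j] * y[j] for j in range(n_cols)) for i in range(n_rows)]
    # Payoff conceded by the row policy x to each pure column action.
    col_payoffs = [sum(x[i] * A[i][j] for i in range(n_rows)) for j in range(n_cols)]
    return max(row_payoffs) - min(col_payoffs)


# Matching pennies: the uniform policies form the minimax pair.
A = [[1.0, -1.0], [-1.0, 1.0]]
uniform = [0.5, 0.5]
gap_at_equilibrium = exploitability_gap(A, uniform, uniform)
gap_at_pure = exploitability_gap(A, [1.0, 0.0], [1.0, 0.0])
```

A last-iterate guarantee bounds this quantity for the policies actually played at round t, rather than for their average over rounds.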
