What question did this study set out to answer?

The study aims to determine optimal strategies in zero-sum stochastic games under different risk attitudes.

January 26, 2026Open Access

Zero-sum discounted stochastic games with random rewards

Key Points

The study aims to determine optimal strategies in zero-sum stochastic games under different risk attitudes.
Defined chance-constrained optimization problems for risk-seeking and risk-averse players.
Utilized Riemann gradient sampling algorithms to compute optimal payoffs and costs.
Reformulated risk-averse problems into discrete minimax and second-order cone programming frameworks.
Conducted numerical experiments on randomly generated game instances.
Established equivalence between risk-seeking and constrained optimization in zero-sum games.
Demonstrated that linear programming can compute optimal strategies effectively.
Validated convergence properties of the proposed linearization method for risk-averse problems.

Abstract

Abstract We consider a two-person zero-sum discounted stochastic game with random rewards and known transition probabilities. The players have opposite objectives and are interested in optimizing the expected discounted reward which they can obtain with a given confidence level when both the players play the worst possible move against each other. We model such a game problem by defining the chance-constrained optimization problem for each player. In this framework, risk attitudes are determined by confidence levels, where values of 0.5 or below correspond to risk-seeking behavior and values above 0.5 correspond to risk-averse behavior. When the reward vector follows a multivariate elliptically symmetric distribution, the game is equivalent to a minimax formulation. We consider the game with risk-seeking and risk-averse players separately. We show that the risk-seeking problem is equivalent to a constrained optimization of a parameterized zero-sum stochastic game and the optimal payoff of player 1 and optimal cost of player 2 can be computed using Riemann gradient sampling algorithms. Later we use the solution of the constrained optimization problem of each player to compute its optimal strategy by solving a linear programming problem. We reformulate the risk-averse problem as a discrete minimax problem. We propose an algorithm based on a linearization method and discuss its convergence properties. Alternatively, we reformulate the risk-averse problem as a second-order cone programming problem with bilinear constraints. The numerical experiments on randomly generated instances are performed to illustrate our theoretical results.

Read Full Paperexternally

Mark Helpful

Bookmark

Relay

View Full Paper