Abstract We consider a two-person zero-sum discounted stochastic game with random rewards and known transition probabilities. The players have opposite objectives and are interested in optimizing the expected discounted reward which they can obtain with a given confidence level when both the players play the worst possible move against each other. We model such a game problem by defining the chance-constrained optimization problem for each player. In this framework, risk attitudes are determined by confidence levels, where values of 0.5 or below correspond to risk-seeking behavior and values above 0.5 correspond to risk-averse behavior. When the reward vector follows a multivariate elliptically symmetric distribution, the game is equivalent to a minimax formulation. We consider the game with risk-seeking and risk-averse players separately. We show that the risk-seeking problem is equivalent to a constrained optimization of a parameterized zero-sum stochastic game and the optimal payoff of player 1 and optimal cost of player 2 can be computed using Riemann gradient sampling algorithms. Later we use the solution of the constrained optimization problem of each player to compute its optimal strategy by solving a linear programming problem. We reformulate the risk-averse problem as a discrete minimax problem. We propose an algorithm based on a linearization method and discuss its convergence properties. Alternatively, we reformulate the risk-averse problem as a second-order cone programming problem with bilinear constraints. The numerical experiments on randomly generated instances are performed to illustrate our theoretical results.
Osmani et al. (Sat,) studied this question.