What does this research mean for the field?

Semi-definite programming approaches can be used to compute optimal policies and value functions for quantum Markov decision processes (q-MDPs) with both open-loop and classical-state-preserving closed-loop policies. Novelty: methodological. Consensus alignment: neutral.

What question did this study set out to answer?

The research aims to develop semi-definite programming methods for determining optimal policies in quantum Markov decision processes.

February 19, 2026 (Open Access)

Quantum Markov Decision Processes: Dynamic and Semi-Definite Programs for Optimal Solutions

Key Points

Establishes duality between dynamic programming and semi-definite programming for q-MDPs with open-loop policies.
Computes an approximately optimal value function for open-loop policies.
Formulates the computation of optimal stationary open-loop policies as a bi-linear program.
Establishes dynamic and semi-definite programming formulations for classical-state-preserving closed-loop policies.
Poses the computation of optimal stationary classical-state-preserving closed-loop policies as a bi-linear program, as in the open-loop case.
Shows that the optimal value function is linear for both policy types.
Proves that a stationary optimal policy exists within the open-loop class and within the classical-state-preserving closed-loop class.
Provides methods for computing the optimal value function for both policy classes (approximately, in the open-loop case).
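For readers more familiar with classical MDPs, the duality between dynamic programming and a convex program can be illustrated in the classical setting, where the convex program is a linear program rather than a semi-definite one. The sketch below is only an analog and is not the q-MDP formulation of the paper: the two-state, two-action model, the names P, c, beta, and both helper functions are assumptions made up for illustration. It runs value iteration and then checks that the resulting value function satisfies the linear-program constraints V(x) <= c(x, a) + beta * sum_y P(y | x, a) V(y), with equality at a minimizing action in every state.

```python
import numpy as np

def q_values(V, P, c, beta):
    """Q[x, a] = c[x, a] + beta * sum_y P[a, x, y] * V[y]."""
    return c + beta * np.stack([P[a] @ V for a in range(P.shape[0])], axis=1)

def value_iteration(P, c, beta, iters=1000):
    """Dynamic programming for a discounted-cost classical MDP (illustrative)."""
    V = np.zeros(c.shape[0])
    for _ in range(iters):
        V = q_values(V, P, c, beta).min(axis=1)
    policy = q_values(V, P, c, beta).argmin(axis=1)
    return V, policy

# Hypothetical toy model: P[a] is the transition matrix under action a,
# c[x, a] is the one-stage cost, beta is the discount factor.
P = np.array([[[0.9, 0.1], [0.4, 0.6]],
              [[0.2, 0.8], [0.7, 0.3]]])
c = np.array([[1.0, 2.0],
              [0.5, 3.0]])
beta = 0.9

V, policy = value_iteration(P, c, beta)

# Slack of the linear-program constraints: nonnegative means feasible,
# and a zero entry in each row means the constraint is tight there.
slack = q_values(V, P, c, beta) - V[:, None]
print("value function:", V)
print("stationary policy:", policy)
print("constraint slack:\n", slack)
```

The slack matrix is nonnegative and has a zero in each row, which is the classical counterpart of the statement that the dynamic-programming solution is feasible, and in fact optimal, for the associated convex program.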

Abstract

In this paper, building on the formulation of quantum Markov decision processes (q-MDPs) presented in our previous work (N. Saldi, S. Sanjari, and S. Yüksel, Quantum Markov Decision Processes: General Theory, Approximations, and Classes of Policies, SIAM Journal on Control and Optimization, 2024), our focus shifts to the development of semi-definite programming approaches for optimal policies and value functions of both open-loop and classical-state-preserving closed-loop policies. First, by using the duality between the dynamic programming and the semi-definite programming formulations of any q-MDP with open-loop policies, we establish that the optimal value function is linear and that there exists a stationary optimal policy among open-loop policies. Then, using these results, we establish a method for computing an approximately optimal value function and formulate the computation of an optimal stationary open-loop policy as a bi-linear program. Next, we turn our attention to classical-state-preserving closed-loop policies. Dynamic programming and semi-definite programming formulations for classical-state-preserving closed-loop policies are established, where the duality of these two formulations similarly enables us to prove that the optimal value function is linear and that there exists an optimal stationary classical-state-preserving closed-loop policy. Then, similar to the open-loop case, we establish a method for computing the optimal value function and pose the computation of optimal stationary classical-state-preserving closed-loop policies as a bi-linear program.
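To make the notion of a linear value function concrete, here is a minimal sketch, assuming a single qubit whose state evolves under one fixed quantum channel (standing in for a stationary policy composed with the system dynamics) and a discounted cost Tr(C rho) at each step. None of these modeling choices comes from the paper: the amplitude-damping channel, the cost operator, the discount factor, and all function names are hypothetical. Under these assumptions the discounted cost equals Tr(M rho) for a fixed Hermitian operator M, obtained by iterating M <- C + beta * N^dagger(M) in the Heisenberg picture, so the value is linear in the state.

```python
import numpy as np

def channel(rho, kraus):
    """Schrodinger picture: rho -> sum_k K rho K^dagger."""
    return sum(K @ rho @ K.conj().T for K in kraus)

def adjoint_channel(M, kraus):
    """Heisenberg picture: M -> sum_k K^dagger M K."""
    return sum(K.conj().T @ M @ K for K in kraus)

def value_operator(cost, kraus, beta, iters=500):
    """Iterate M <- cost + beta * N^dagger(M); then V(rho) = Tr(M rho)."""
    M = np.zeros_like(cost)
    for _ in range(iters):
        M = cost + beta * adjoint_channel(M, kraus)
    return M

def simulated_value(rho, cost, kraus, beta, horizon=500):
    """Truncated discounted cost, computed by evolving the state step by step."""
    total, discount = 0.0, 1.0
    for _ in range(horizon):
        total += discount * np.trace(cost @ rho).real
        rho = channel(rho, kraus)
        discount *= beta
    return total

# Hypothetical example: amplitude damping with decay probability gamma,
# and a cost that penalizes population in the excited state.
gamma, beta = 0.3, 0.9
K0 = np.array([[1, 0], [0, np.sqrt(1 - gamma)]], dtype=complex)
K1 = np.array([[0, np.sqrt(gamma)], [0, 0]], dtype=complex)
kraus = [K0, K1]
cost = np.array([[0, 0], [0, 1]], dtype=complex)

M = value_operator(cost, kraus, beta)

rho_a = np.array([[0.2, 0.1], [0.1, 0.8]], dtype=complex)
rho_b = np.array([[0.7, 0.0], [0.0, 0.3]], dtype=complex)

def value(rho):
    return np.trace(M @ rho).real

print("Tr(M rho_a):", value(rho_a))
print("simulated  :", simulated_value(rho_a, cost, kraus, beta))
print("mixture    :", value(0.5 * rho_a + 0.5 * rho_b),
      "vs", 0.5 * value(rho_a) + 0.5 * value(rho_b))
```

This only evaluates one fixed stationary channel; it does not optimize over policies, which is where the semi-definite and bi-linear programs described in the abstract come in.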
