July 28, 2017

Reinforcement Learning to Rank with Markov Decision Process

Key Points

Key points are not available for this paper at this time.

Abstract

One of the central issues in learning to rank for information retrieval is to develop algorithms that construct ranking models by directly optimizing evaluation measures such as normalized discounted cumulative gain~(ND CG). Existing methods usually focus on optimizing a specific evaluation measure calculated at a fixed position, e.g., NDCG calculated at a fixed position K. In information retrieval the evaluation measures, including the widely used NDCG and email protected, are usually designed to evaluate the document ranking at all of the ranking positions, which provide much richer information than only measuring the document ranking at a single position. Thus, it is interesting to ask if we can devise an algorithm that has the ability of leveraging the measures calculated at all of the ranking postilions, for learning a better ranking model. In this paper, we propose a novel learning to rank model on the basis of Markov decision process (MDP), referred to as MDPRank. In the learning phase of MDPRank, the construction of a document ranking is considered as a sequential decision making, each corresponds to an action of selecting a document for the corresponding position. The policy gradient algorithm of REINFORCE is adopted to train the model parameters. The evaluation measures calculated at every ranking positions are utilized as the immediate rewards to the corresponding actions, which guide the learning algorithm to adjust the model parameters so that the measure is optimized. Experimental results on LETOR benchmark datasets showed that MDPRank can outperform the state-of-the-art baselines.

Bookmark

Cite This Study

Wei et al. (Fri,) studied this question.

synapsesocial.com/papers/6a10d3265e6663f9d2647f43 https://doi.org/https://doi.org/10.1145/3077136.3080685

Bookmark