Los puntos clave no están disponibles para este artículo en este momento.
In this work, we propose a novel method for training neural networks to perform singledocument extractive summarization without heuristically-generated extractive labels. We call our approach BANDITSUM as it treats extractive summarization as a contextual bandit (CB) problem, where the model receives a document to summarize (the context), and chooses a sequence of sentences to include in the summary (the action). A policy gradient reinforcement learning algorithm is used to train the model to select sequences of sentences that maximize ROUGE score. We perform a series of experiments demonstrating that BANDITSUM is able to achieve ROUGE scores that are better than or comparable to the state-of-the-art for extractive summarization, and converges using significantly fewer update steps than competing approaches. In addition, we show empirically that BANDIT-SUM performs significantly better than competing approaches when good summary sentences appear late in the source document.
Dong et al. (Mon,) studied this question.