A key limitation in current datasets for multi-hop reasoning is that the steps required to answer a question are mentioned in it explicitly. In this work, we introduce StrategyQA, a question answering (QA) benchmark where the required reasoning steps are implicit in the question and should be inferred using a strategy. A fundamental challenge in this setup is how to elicit such creative questions from crowdsourcing workers while covering a broad range of potential strategies. We propose a data collection procedure that combines term-based priming to inspire annotators, careful control over the annotator population, and adversarial filtering to eliminate reasoning shortcuts. Moreover, we annotate each question with (1) a decomposition into reasoning steps for answering it, and (2) Wikipedia paragraphs that contain the answers to each step. Overall, StrategyQA includes 2,780 examples, each consisting of a strategy question, its decomposition, and evidence paragraphs. Analysis shows that questions in StrategyQA are short, topic-diverse, and cover a wide range of strategies. Empirically, we show that humans perform well (87%) on this task, while our best baseline reaches an accuracy of ~66%.
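To make the shape of an annotated example concrete, here is a minimal sketch of one record as a Python dataclass. The field names (question, answer, decomposition, evidence) and the helper referenced_steps are hypothetical and are not the official release schema. The question and its steps are illustrative. The sketch assumes that a step can refer to an earlier step's answer with a "#k" marker, and it only shows how the final step combines answers that the question never states explicitly.

```python
import re
from dataclasses import dataclass, field
from typing import List


@dataclass
class StrategyExample:
    # Hypothetical field names; not the official StrategyQA file format.
    question: str
    answer: bool                      # strategy questions have yes/no answers
    decomposition: List[str]          # ordered steps; "#k" refers to the answer of step k
    evidence: List[List[str]] = field(default_factory=list)  # placeholder: paragraph ids per step


def referenced_steps(step: str) -> List[int]:
    """Return the 1-based indices of earlier steps that this step depends on."""
    return [int(m) for m in re.findall(r"#(\d+)", step)]


# Illustrative example: neither the time period nor the comparison is stated in the question.
example = StrategyExample(
    question="Did Aristotle use a laptop?",
    answer=False,
    decomposition=[
        "When did Aristotle live?",
        "When was the laptop invented?",
        "Is #2 before #1?",
    ],
)

for i, step in enumerate(example.decomposition, start=1):
    print(i, step, referenced_steps(step))
```

The first two steps retrieve facts independently. Only the last step refers back to them (it depends on steps 2 and 1), which is the implicit strategy the question requires.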
Geva et al. studied this question.