June 6, 2022 (Open Access)

A Neural Corpus Indexer for Document Retrieval

Abstract

Current state-of-the-art document retrieval solutions mainly follow an index-retrieve paradigm, in which the index is hard to optimize directly for the final retrieval target. In this paper, we aim to show that an end-to-end deep neural network that unifies the training and indexing stages can significantly improve recall over traditional methods. To this end, we propose the Neural Corpus Indexer (NCI), a sequence-to-sequence network that directly generates the identifiers of relevant documents for a given query. To optimize the recall of NCI, we design a prefix-aware weight-adaptive decoder architecture and leverage tailored techniques including query generation, semantic document identifiers, and consistency-based regularization. Empirical studies demonstrate the superiority of NCI on two commonly used academic benchmarks: compared to the best baseline method, it achieves relative improvements of +21.4% in Recall@1 on the NQ320k dataset and +16.8% in R-Precision on the TriviaQA dataset.
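
The two metrics named in the abstract, Recall@1 and R-Precision, have standard definitions in information retrieval, and a "relative" improvement is the gain divided by the baseline score. The following is a minimal sketch of those definitions, assuming each query comes with a ranked list of predicted document identifiers and a set of relevant identifiers. The function names and the toy identifiers are illustrative assumptions, not code or data from the paper.

```python
from typing import Sequence, Set


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the relevant documents that appear in the top-k results."""
    if not relevant:
        return 0.0
    hits = len(set(ranked[:k]) & relevant)
    return hits / len(relevant)


def r_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Precision over the top-R results, where R is the number of relevant documents."""
    r = len(relevant)
    if r == 0:
        return 0.0
    hits = len(set(ranked[:r]) & relevant)
    return hits / r


def relative_gain(new: float, baseline: float) -> float:
    """Relative improvement of `new` over `baseline` (0.2 means +20%)."""
    return (new - baseline) / baseline


if __name__ == "__main__":
    # Toy example with hypothetical document identifiers.
    ranked = ["d3", "d1", "d7", "d2"]
    relevant = {"d1", "d2"}
    print(recall_at_k(ranked, relevant, k=1))  # 0.0: the top result is not relevant
    print(recall_at_k(ranked, relevant, k=2))  # 0.5: one of two relevant documents found
    print(r_precision(ranked, relevant))       # 0.5: one hit in the top R = 2 results
```

When every query has exactly one relevant document, Recall@1 reduces to the share of queries whose top-ranked identifier is the correct one.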

Authors

Yujing Wang

Harbin University of Science and Technology

Yingyan Hou

Chinese Academy of Sciences

Haonan Wang

Qingdao University
