What question did this study set out to answer?

The study evaluates a multi-stage hybrid retrieval framework for scientific literature that combines lexical matching, dense semantic retrieval, hybrid fusion, and cross-encoder re-ranking, and analyzes how these components interact in claim-based search.

May 16, 2026 | Open Access

A Multi-Stage Hybrid Retrieval Framework for the Scientific Literature with Cross-Encoder Re-Ranking

Key Points

Conducted experiments using the SciFact benchmark to compare multi-stage retrieval components.
Analyzed performance metrics including NDCG, MAP, Recall, and MRR across different configurations.
Performed cross-domain evaluations on SciFact, PubMedQA, and SciDocs to assess whether the relative ranking of retrieval paradigms holds across datasets of varying difficulty.
Achieved NDCG@10 of 0.523 and MAP@10 of 0.479 with the hybrid configuration (SciNCL + BM25 + Cross-Encoder).
Identified that lexical pseudo-relevance feedback introduces query drift, negatively impacting claim-focused retrieval.
Found that the RRF dilution effect intensifies on harder retrieval tasks, which supports the case for integrated multi-stage pipelines.
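
For readers unfamiliar with the metrics named above, the following is a minimal per-query sketch of MRR@k, Recall@k, and NDCG@k. It assumes binary relevance labels (a document is either relevant or not); the function names are illustrative and are not taken from the study, whose evaluation tooling is not described here. Reported scores such as NDCG@10 would be the mean of these per-query values over all queries.

```python
import math

def mrr_at_k(ranked, relevant, k=10):
    """Reciprocal rank of the first relevant document within the top k."""
    for rank, doc_id in enumerate(ranked[:k], start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0

def recall_at_k(ranked, relevant, k=10):
    """Fraction of the relevant documents that appear in the top k."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & set(relevant)) / len(relevant)

def ndcg_at_k(ranked, relevant, k=10):
    """NDCG with binary gains: DCG of the ranking divided by the ideal DCG."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, doc_id in enumerate(ranked[:k], start=1)
        if doc_id in relevant
    )
    # The ideal ranking places all relevant documents at the top.
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg else 0.0

# Hypothetical query: one relevant document, retrieved at rank 2.
ranked = ["a", "b", "c"]
relevant = {"b"}
scores = (
    mrr_at_k(ranked, relevant),
    recall_at_k(ranked, relevant),
    ndcg_at_k(ranked, relevant),
)
```

In this toy case the relevant document sits at rank 2, so MRR is 0.5, Recall is 1.0, and NDCG is 1/log2(3), roughly 0.63.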

Abstract

Effective scientific literature retrieval requires moving beyond surface-level term matching toward structured semantic reasoning. This paper presents a controlled empirical study of multi-stage retrieval for scientific literature, integrating lexical matching, dense semantic modeling, hybrid fusion, and cross-encoder re-ranking within a unified evaluation framework. The study is designed to analyze the interactions, trade-offs, and failure modes of these components in claim-based scientific search. Experiments on the SciFact benchmark demonstrate that dense models capture semantic similarity but remain insufficient when used in isolation. Hybrid fusion broadens the candidate pool but does not consistently outperform the best standalone dense retriever, as RRF-based fusion can dilute strong dense rankings when lexical and semantic signals diverge. Cross-encoder re-ranking proves to be the primary driver of final performance gains, with the best configuration, Hybrid (SciNCL + BM25) + Cross-Encoder, reaching NDCG@10 of 0.523, MAP@10 of 0.479, Recall@10 of 0.642, and MRR@10 of 0.497. Ablation analysis shows that lexical pseudo-relevance feedback (RM3) introduces query drift in claim-focused retrieval, and that passage-level max pooling weakens effectiveness by fragmenting document-level evidence. Cross-domain evaluation on SciFact, PubMedQA, and SciDocs demonstrates that the relative ranking of retrieval paradigms remains stable across datasets with varying difficulty levels, while also revealing that the RRF dilution effect intensifies on harder retrieval tasks. These findings suggest that effective scientific retrieval benefits from integrated multi-stage pipelines, and that understanding component-level interactions is essential for designing robust retrieval systems.
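
The dilution effect described in the abstract can be illustrated with a small sketch of Reciprocal Rank Fusion (RRF). This is an assumption-laden toy example, not the study's implementation: the smoothing constant k = 60 is a common default in the RRF literature and is not stated in the source, and the document ids and rankings are invented for illustration.

```python
from collections import defaultdict

def rrf_fuse(rankings, k=60):
    """Fuse ranked lists of document ids with Reciprocal Rank Fusion.

    Each document scores sum(1 / (k + rank)) over the lists that contain it.
    k=60 is an assumed default, not a value reported by the study.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))

# Hypothetical rankings: the dense retriever puts the relevant document d1
# first, while the lexical retriever does not retrieve d1 at all.
dense = ["d1", "d2", "d3", "d4"]
bm25 = ["d3", "d4", "d2", "d5"]

fused = rrf_fuse([dense, bm25])
position_of_d1 = fused.index("d1") + 1
```

Here d1 falls from rank 1 in the dense list to rank 4 after fusion, because documents that appear in both lists accumulate two score contributions while d1 receives only one. This is the kind of divergence between lexical and semantic signals under which fusion can dilute a strong dense ranking, and it is one reason a subsequent re-ranking stage can recover lost precision.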