What type of study is this?

This is a quantitative study (also classified as an experimental study).

October 20, 2025 (Open Access)

AceSearcher: Bootstrapping Reasoning and Search for LLMs via Reinforced Self-Play

Key Points

Across three reasoning-intensive tasks and 10 datasets, AceSearcher outperforms state-of-the-art baselines, with an average exact match improvement of 7.6%.
A single LLM is trained through cooperative self-play to alternate between two roles: a decomposer that breaks down complex queries and a solver that answers them from retrieved contexts.
On document-level finance reasoning tasks, AceSearcher-32B matches DeepSeek-V3's performance with less than 5% of its parameters.
At smaller scales (1.5B and 8B), AceSearcher often surpasses existing search-augmented LLMs with up to 9x more parameters.
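
For readers unfamiliar with the exact match metric cited above, here is a minimal sketch of how it is commonly computed in question answering. The normalization steps (lowercasing, stripping punctuation and English articles) follow a widespread convention and are an assumption here; the paper summary does not state which normalization the authors use, and all function names are illustrative.

```python
import re
import string
from typing import List


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and articles, collapse whitespace.

    Assumed normalization; the source does not specify one.
    """
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, gold_answers: List[str]) -> float:
    """Return 1.0 if the prediction equals any gold answer after normalization."""
    pred = normalize_answer(prediction)
    return float(any(pred == normalize_answer(gold) for gold in gold_answers))


def average_exact_match(predictions: List[str], references: List[List[str]]) -> float:
    """Mean exact match over a dataset, expressed as a percentage."""
    scores = [exact_match(p, refs) for p, refs in zip(predictions, references)]
    return 100.0 * sum(scores) / len(scores)
```

Under this definition, a prediction such as "The Eiffel Tower." counts as a match for the gold answer "Eiffel Tower", while any other wording mismatch scores zero.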

Abstract

Search-augmented LLMs often struggle with complex reasoning tasks due to ineffective multi-hop retrieval and limited reasoning ability. We propose AceSearcher, a cooperative self-play framework that trains a single large language model (LLM) to alternate between two roles: a decomposer that breaks down complex queries and a solver that integrates retrieved contexts for answer generation. AceSearcher couples supervised fine-tuning on a diverse mixture of search, reasoning, and decomposition tasks with reinforcement fine-tuning optimized for final answer accuracy, eliminating the need for intermediate annotations. Extensive experiments on three reasoning-intensive tasks across 10 datasets show that AceSearcher outperforms state-of-the-art baselines, achieving an average exact match improvement of 7.6%. Remarkably, on document-level finance reasoning tasks, AceSearcher-32B matches the performance of the DeepSeek-V3 model using less than 5% of its parameters. Even at smaller scales (1.5B and 8B), AceSearcher often surpasses existing search-augmented LLMs with up to 9x more parameters, highlighting its exceptional efficiency and effectiveness in tackling complex reasoning tasks. Our code will be published at https://github.com/ritaranx/AceSearcher and https://huggingface.co/AceSearcher.
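
The two-role design described in the abstract can be illustrated with a short inference-time sketch. This is not the authors' implementation: the prompt formats, the `LLM` and `Retriever` callables, and the choice to answer sub-questions one after another before a final pass over the original question are all assumptions made for illustration. Training (supervised and reinforcement fine-tuning) is not shown.

```python
from typing import Callable, List, Tuple

# Hypothetical interfaces: one model serves both roles, selected by the prompt.
LLM = Callable[[str], str]              # prompt -> generated text
Retriever = Callable[[str], List[str]]  # query -> retrieved passages


def decompose(llm: LLM, question: str) -> List[str]:
    """Decomposer role: ask the model for sub-questions, one per line."""
    prompt = f"[decomposer] Break this question into sub-questions:\n{question}"
    return [line.strip() for line in llm(prompt).splitlines() if line.strip()]


def solve(llm: LLM, question: str, passages: List[str],
          history: List[Tuple[str, str]]) -> str:
    """Solver role: answer from retrieved passages and earlier sub-answers."""
    context = "\n".join(passages)
    solved = "\n".join(f"Q: {q} A: {a}" for q, a in history)
    prompt = (f"[solver] Context:\n{context}\n"
              f"Previous steps:\n{solved}\n"
              f"Question: {question}\nAnswer:")
    return llm(prompt).strip()


def answer(llm: LLM, retriever: Retriever, question: str) -> str:
    """Alternate roles: decompose once, solve each sub-question, then the original."""
    history: List[Tuple[str, str]] = []
    for sub_question in decompose(llm, question):
        passages = retriever(sub_question)
        history.append((sub_question, solve(llm, sub_question, passages, history)))
    # Final pass: answer the original question using all intermediate answers.
    return solve(llm, question, retriever(question), history)
```

Because the abstract states that reinforcement fine-tuning is optimized for final answer accuracy, only the value returned by `answer` would need a label in such a setup; the intermediate sub-questions and sub-answers require no annotation.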


Authors

Ran Xu

Yuchen Zhuang

Zihan Dong
