Prior art search is the task of identifying earlier patents or publications relevant to a given patent application, and it is a critical step in the patent examination process. While large language models (LLMs) have demonstrated remarkable capabilities in solving complex problems across a wide range of domains, their application to patent retrieval remains comparatively underexplored. This thesis investigates the effectiveness of LLMs as rerankers in the patent domain. We propose a two-stage cross-lingual patent retrieval framework, evaluated on the CLEF-IP 2011 benchmark, which comprises approximately 1.7 million patents. The first stage employs a long-context embedding model to retrieve a set of candidate patents; the second stage then applies a reranker to refine the ordering of these candidates. We first conduct a comprehensive evaluation of recent embedding models on the benchmark dataset and show that our domain-adapted model, patQwen3-Embedding-4B, outperforms the established patent-specific baselines. We then explore different reranking paradigms and observe that listwise reranking with general-purpose LLMs further improves ranking quality over the first-stage retriever. Our findings reveal that both the prompt construction and the reranking algorithm have a substantial influence on performance, with the best results achieved by the tournament and multipass rerankers, which address some of the inherent limitations of the simple approach. Overall, this work demonstrates the potential of neural retrieval pipelines for automated patent search, contributes novel insights into the application of LLM-based rerankers, and lays a foundation for future research in this direction.
Mahmoud Al-Murish studied this question.
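The two-stage framework described in the abstract (embedding-based candidate retrieval followed by reranking) can be illustrated with a minimal sketch. This is not the thesis implementation: the function names `retrieve`, `rerank`, and `search`, the toy vectors, and the pluggable `reorder` callable (standing in for an LLM-based listwise reranker) are all assumptions made for illustration, and the tournament and multipass strategies are not reproduced here.

```python
import math
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec: Vector, doc_vecs: Dict[str, Vector], k: int) -> List[str]:
    """Stage 1: return the IDs of the k documents most similar to the query."""
    ranked = sorted(
        doc_vecs,
        key=lambda doc_id: cosine(query_vec, doc_vecs[doc_id]),
        reverse=True,
    )
    return ranked[:k]


def rerank(
    query: str,
    candidates: List[str],
    reorder: Callable[[str, List[str]], List[str]],
) -> List[str]:
    """Stage 2: reorder candidates with a pluggable (hypothetical) reranker.

    The output is always a permutation of `candidates`: unknown or duplicate
    IDs proposed by the reranker are dropped, and any candidate it omitted is
    appended in its first-stage order.
    """
    allowed = set(candidates)
    seen = set()
    result: List[str] = []
    for doc_id in reorder(query, list(candidates)):
        if doc_id in allowed and doc_id not in seen:
            seen.add(doc_id)
            result.append(doc_id)
    result.extend(doc_id for doc_id in candidates if doc_id not in seen)
    return result


def search(
    query: str,
    query_vec: Vector,
    doc_vecs: Dict[str, Vector],
    reorder: Callable[[str, List[str]], List[str]],
    k: int = 100,
) -> List[str]:
    """Run both stages: retrieve k candidates, then rerank them."""
    return rerank(query, retrieve(query_vec, doc_vecs, k), reorder)


# Toy data: hypothetical document IDs and 2-dimensional embeddings.
toy_docs = {"A": [1.0, 0.0], "B": [0.9, 0.1], "C": [0.0, 1.0]}


def toy_reorder(query: str, ids: List[str]) -> List[str]:
    """Stand-in reranker that simply reverses the candidate order."""
    return list(reversed(ids))


final_ranking = search("toy query", [1.0, 0.0], toy_docs, toy_reorder, k=2)
```

In a realistic setting, `reorder` would wrap a prompt to an LLM that returns an ordering of the candidate list. The guard in `rerank` reflects one possible design choice: because a generative model may omit or invent identifiers, the second stage is restricted to permuting the first-stage candidates.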