What question did this study set out to answer?

This research aims to develop a zero-shot framework that ranks large language models (LLMs) by their suitability for a given prompt, without executing any candidate model.

February 21, 2026 · Open Access

A zero-shot neural learning to rank framework for ranking large language models

Key Points

Developed a zero-shot ranking framework without executing candidate models.
Integrated prompt-aware, cluster-aware, and LLM metadata-aware embeddings in a neural architecture.
Used data from the TREC Million LLM Track, covering 14,950 prompts and 1,130 LLMs.
Achieved an nDCG@10 of 0.3451 and an MRR of 0.2550, a 38% improvement over single-feature baselines.
Showed that ranking effectiveness varies with prompt type, length, and search intent.
Showed that fusing heterogeneous features enhances zero-shot LLM selection, reducing computational costs.
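
For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how nDCG@10 and MRR are commonly computed from a ranked list of relevance grades. It assumes linear gain and a log2 rank discount; the summary does not state which nDCG variant or relevance grading the paper uses, and the function names are illustrative only.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the first k relevance grades."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(ranked_relevances, k=10):
    """DCG of the predicted order divided by DCG of the ideal order."""
    ideal = sorted(ranked_relevances, reverse=True)
    ideal_dcg = dcg_at_k(ideal, k)
    if ideal_dcg == 0:
        return 0.0  # no relevant item at all: define the score as 0
    return dcg_at_k(ranked_relevances, k) / ideal_dcg


def reciprocal_rank(ranked_relevances):
    """1 / rank of the first relevant item, or 0.0 if none is relevant."""
    for rank, rel in enumerate(ranked_relevances, start=1):
        if rel > 0:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(rankings):
    """Average reciprocal rank over a collection of ranked lists."""
    return sum(reciprocal_rank(r) for r in rankings) / len(rankings)


# Toy data (not from the paper): relevance of each ranked LLM for two prompts.
toy_rankings = [[0, 1, 0, 1], [1, 0, 0, 0]]
print([ndcg_at_k(r, k=10) for r in toy_rankings])
print(mean_reciprocal_rank(toy_rankings))
```

Both metrics reward placing suitable models near the top of the list; MRR looks only at the first relevant model, while nDCG@10 credits every relevant model in the top ten.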

Abstract

Large Language Models (LLMs) differ widely in their performance across tasks, making efficient model selection essential for reliable and cost-effective deployment. This paper proposes a zero-shot LLM ranking framework that predicts the most suitable model for a given prompt without executing any candidate models. Using data from the TREC Million LLM Track, which includes 14,950 prompts evaluated across 1,130 LLMs, the framework integrates prompt-aware, cluster-aware, and LLM metadata-aware embeddings within an end-to-end neural architecture. The proposed model achieved an nDCG@10 of 0.3451 and an MRR of 0.2550, representing a 38% improvement over single-feature baselines. Analysis across 2,990 test prompts showed that ranking effectiveness varies with prompt type, length, and search intent. The results demonstrate that fusing heterogeneous features enables accurate zero-shot LLM selection while significantly reducing computational cost. This work provides a scalable and energy-efficient alternative to brute-force evaluation and establishes a foundation for adaptive, prompt-aware routing in multi-LLM systems.
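
The idea of scoring candidates from fused features, without running them, can be illustrated with a small sketch. This is not the paper's architecture, which the abstract does not detail: the vector sizes, the one-hidden-layer scorer, the random weights, and the names `make_mlp`, `score`, and `rank_llms` are all assumptions made for illustration. A real system would learn the weights from training prompts.

```python
import random


def relu(x):
    return x if x > 0.0 else 0.0


def make_mlp(in_dim, hidden_dim, seed=0):
    """Randomly initialised weights; a trained model would learn these."""
    rng = random.Random(seed)
    w1 = [[rng.uniform(-1, 1) for _ in range(in_dim)] for _ in range(hidden_dim)]
    w2 = [rng.uniform(-1, 1) for _ in range(hidden_dim)]
    return w1, w2


def score(features, mlp):
    """One hidden ReLU layer followed by a scalar suitability score."""
    w1, w2 = mlp
    hidden = [relu(sum(w * f for w, f in zip(row, features))) for row in w1]
    return sum(w * h for w, h in zip(w2, hidden))


def rank_llms(prompt_vec, cluster_vec, llm_vecs, mlp):
    """Score each candidate from its features alone; no LLM is called."""
    scores = {
        llm_id: score(prompt_vec + cluster_vec + meta_vec, mlp)  # fuse by concatenation
        for llm_id, meta_vec in llm_vecs.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical toy inputs, not taken from the TREC data.
prompt_vec = [0.2, 0.7]    # prompt embedding
cluster_vec = [1.0, 0.0]   # prompt-cluster feature
llm_vecs = {               # per-LLM metadata embeddings
    "llm_a": [0.1, 0.9],
    "llm_b": [0.8, 0.3],
    "llm_c": [0.5, 0.5],
}
mlp = make_mlp(in_dim=6, hidden_dim=4)
ranking = rank_llms(prompt_vec, cluster_vec, llm_vecs, mlp)
print(ranking)
```

The non-linear hidden layer matters here: with a purely linear scorer over concatenated features, the prompt terms would add the same constant to every candidate and the ranking would not depend on the prompt at all.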
