What question did this study set out to answer?

The study evaluates how accurately the Hornblower sign is described in the scientific literature, Google search results, and large language model outputs, and how often it is conflated with the Patte test.

March 7, 2026 | Open Access

Descriptive Heterogeneity of the Hornblower Sign Across Scientific Literature, Search Engines, and Large Language Models

Key Points

The study evaluates the accuracy of Hornblower sign descriptions and whether they correctly distinguish the sign from the Patte test.
Primary publications referring to the original description of the Hornblower sign by Arthuis were analyzed.
The first 35 Google search results were systematically reviewed.
Responses from five large language models (LLMs) were evaluated: ChatGPT 5.1, Grok 4.1, Gemini 3 Pro, Perplexity, and DeepSeek-R1.
Fourteen original publications were included, with a mean accuracy score of 2.07 on the 0–3 scale.
Of these publications, 50% described the Hornblower sign correctly, while 43% described only the Patte test.
Among 34 evaluable Google search results, the mean accuracy score was 1.17, with 77% scoring ≤ 1 point.
The five LLMs achieved a mean accuracy score of 1.8, with substantial variability and overall incomplete conceptual accuracy.

Abstract

Background/Objectives: Digitalization of medical knowledge has improved access to information but also increased the spread of imprecise content. Repeated exposure to incorrect descriptions may lead to their normalization over time. This is particularly evident for the Hornblower sign, which is frequently conflated with the Patte test in the literature, online sources, and large language model outputs. This study systematically evaluates these descriptions and quantifies related inaccuracies.

Methods: A three-step approach was applied to answer the question “What is the Hornblower sign?”. First, primary publications referring to the original description by Arthuis were analyzed. Second, the first 35 Google search results were systematically reviewed. Third, responses from five widely used LLMs (ChatGPT 5.1, Grok 4.1, Gemini 3 Pro, Perplexity, and DeepSeek-R1) were evaluated. All descriptions were assessed using a standardized 4-point scoring system (0–3 points) capturing content accuracy and correct differentiation between the Hornblower sign and the Patte test.

Results: Fourteen original publications were included, yielding a mean score of 2.07. Correct descriptions were found in 50%, while 43% described only the Patte test. Among 34 evaluable Google search results, the mean score was 1.17, with 77% scoring ≤ 1 point. The five LLMs achieved a mean score of 1.8, demonstrating substantial variability and overall incomplete conceptual accuracy.

Conclusions: Descriptions of the Hornblower sign show substantial heterogeneity and frequent inaccuracies across the scientific literature, online sources, and LLM outputs. Conflation with the Patte test undermines diagnostic reliability and limits study comparability. Critical source appraisal and adherence to original test descriptions are essential to maintain clinical and scientific rigor.
