What question did this study set out to answer?

To evaluate how different prompt engineering techniques affect large language models in making decisions about hypertension treatment.

April 17, 2026 | Open Access

The effects of multitype prompt engineering for large language models in hypertension treatment decisions

Key Points

Conducted a two-stage validation study with 300 de-identified simulated hypertension cases.
Evaluated performance of ChatGPT-4.1 and DeepSeek-V3 with various prompt types.
Measured accuracy and inappropriate regimen rates across different hospital types.
ChatGPT-4.1 with Guidance-Self-Consistency prompting achieved 91.3% accuracy, nearing expert-level competency.
Zero-shot prompting yielded the lowest accuracy, 62.7% with DeepSeek-V3.
Optimal LLM assistance raised physicians' average accuracy at every hospital level: community 73.4% to 82.5%, county 84.0% to 87.9%, teaching 91.5% to 92.0%.
The worst LLM configurations pushed physician performance below baseline, raising inappropriate regimen rates from 26.6% to 35.2%.

Abstract

The effects of different prompt engineering techniques on the performance of large language models (LLMs) in hypertension decision-making are not yet fully understood. We evaluated the impact of these techniques on LLM performance in hypertension treatment decision-making. We conducted a two-stage validation study using 300 de-identified simulated hypertension cases based on real-world clinical scenarios. ChatGPT-4.1 with Guidance-Self-Consistency achieved the best performance (91.3% accuracy), nearing expert-level competency, while zero-shot prompting yielded the worst results (62.7% with DeepSeek-V3). Optimal LLM assistance consistently enhanced physicians’ average accuracy across all levels (community hospital: 73.4% to 82.5%; county hospital: 84.0% to 87.9%; teaching hospital: 91.5% to 92.0%) and reduced inappropriate regimen rates. The worst LLM configurations decreased physician performance below baseline, increasing inappropriate regimen rates from 26.6% to 35.2% across all levels. Effectively designed prompt strategies enable LLMs to provide reliable hypertension treatment recommendations, thereby supporting physicians’ clinical decisions. This study has been trial-registered (ChiCTR2500099307, March 21, 2025).
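The abstract does not describe how Guidance-Self-Consistency is implemented or how accuracy was scored. As a rough illustration only, the sketch below assumes the generic self-consistency idea (sample several answers for the same case and keep the most frequent one) and a plain exact-match accuracy. The names `self_consistent_answer`, `sample`, `accuracy`, and the regimen labels are hypothetical and do not come from the paper.

```python
from collections import Counter
from typing import Callable, Sequence


def self_consistent_answer(
    sample: Callable[[str], str], case: str, n_samples: int = 5
) -> str:
    """Call the (hypothetical) model sampler n_samples times and majority-vote.

    `sample` stands in for any function that returns one recommended
    regimen label per call; it is an assumption, not the study's API.
    Ties resolve to the answer that was seen first.
    """
    if n_samples < 1:
        raise ValueError("n_samples must be at least 1")
    votes = Counter(sample(case) for _ in range(n_samples))
    return votes.most_common(1)[0][0]


def accuracy(predictions: Sequence[str], references: Sequence[str]) -> float:
    """Fraction of cases where the prediction exactly matches the reference."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have equal length")
    if not references:
        raise ValueError("at least one case is required")
    correct = sum(p == r for p, r in zip(predictions, references))
    return correct / len(references)
```

In this sketch a single divergent sample is outvoted by the others, which is the usual motivation for self-consistency. The study's actual prompts, sampling settings, and scoring rules may differ.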

Cite This Study

Li et al. studied this question.

synapsesocial.com/papers/69e1cd6f5cdc762e9d856f5c https://doi.org/10.1038/s41746-026-02645-y
