Accurate prediction of cardiovascular disease (CVD) risk enables earlier prevention and better clinical decisions. Conventional models such as the Framingham Risk Score (FRS) and Atherosclerotic Cardiovascular Disease (ASCVD) equations may generalize poorly across diverse populations and to incomplete electronic health records (EHRs). In this paper, we present a prompting-based approach that uses few-shot in-context learning to guide large language models (LLMs) in estimating 10-year CVD risk without retraining, offering a data-efficient and privacy-conscious alternative to fine-tuned medical LLM pipelines. Using 352 de-identified MIMIC-III/IV records, we evaluate GPT-4.1, GPT-4o, and Qwen3-4B against FRS and ASCVD outputs under zero-shot and few-shot prompting, with random versus similarity-based exemplar selection, and with or without chain-of-thought reasoning. Few-shot prompting substantially improves calculator alignment for GPT-4.1 and GPT-4o, whereas Qwen3-4B shows weaker gains. With 40 exemplars and reasoning enabled, GPT-4.1 achieves an AUPRC of 0.951, a mean absolute error of about 7, a root mean squared error of about 9, and an F1-score of 0.85, while GPT-4o performs comparably. Within the white-cohort similarity analysis, five similarity-selected exemplars match or outperform 20 randomly selected exemplars across error and discrimination metrics, showing that exemplar quality can outweigh quantity under tight context budgets. Overall, these findings indicate that few-shot prompting can closely approximate validated clinical calculators in data-limited settings and can be adapted across institutions and patient populations through exemplar selection rather than retraining. However, clinical utility remains bounded by the strengths and weaknesses of the underlying calculators, and we do not evaluate prediction of observed cardiovascular events.
Berhe et al. (Wed,) studied this question.
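To make the contrast between random and similarity-based exemplar selection concrete, the following is a minimal sketch of how a similarity-selected few-shot prompt might be assembled. It is an illustration under stated assumptions, not the authors' pipeline: the abstract does not specify the similarity measure, feature encoding, or prompt wording, so the cosine similarity, the `features`/`text`/`risk` fields, the helper names, and the toy records below are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def select_exemplars(query_features, pool, k=5):
    """Return the k pool records most similar to the query (assumed metric: cosine)."""
    ranked = sorted(
        pool,
        key=lambda ex: cosine_similarity(query_features, ex["features"]),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query_text, exemplars):
    """Assemble a few-shot prompt: labeled exemplars first, then the unlabeled query."""
    parts = ["Estimate the 10-year CVD risk (percent) for the final patient."]
    for ex in exemplars:
        parts.append(f"Patient: {ex['text']}\nRisk: {ex['risk']}")
    parts.append(f"Patient: {query_text}\nRisk:")
    return "\n\n".join(parts)


# Toy, made-up records (hypothetical pre-scaled features; not from MIMIC).
pool = [
    {"features": [1.0, 0.0], "text": "record A", "risk": 5.0},
    {"features": [0.0, 1.0], "text": "record B", "risk": 20.0},
    {"features": [0.9, 0.1], "text": "record C", "risk": 7.0},
]
chosen = select_exemplars([1.0, 0.05], pool, k=2)
prompt = build_prompt("query record", chosen)
```

In this sketch the exemplar labels would come from the reference calculator (FRS or ASCVD), and swapping `select_exemplars` for a random draw from `pool` gives the random-selection baseline that the abstract compares against.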