Objective: Large language models (LLMs) have shown exceptional performance in natural language processing, yet their utility for structured clinical data analysis remains relatively underexplored. This pilot study investigates whether LLM-generated embeddings can preserve the structural integrity of clinical datasets and enhance predictive modeling, particularly in resource-constrained settings.
Methods: We applied dimensionality reduction techniques, namely Principal Component Analysis (PCA) and t-distributed Stochastic Neighbor Embedding (t-SNE), together with k-means clustering to compare original data structures with those derived from LLM embeddings. Evaluation metrics included cosine similarity, area under the curve (AUC), and R², applied across 100 synthetic datasets and two real-world clinical datasets: the UCI medical database and endocarditis patient records. We assessed multiple LLM architectures, including BERT, RoBERTa, Llama 2, and E5-small, focusing on predictive accuracy and computational efficiency.
Results: LLM embeddings closely mirrored original data structures, with BERT achieving a cosine similarity of 0.95 on linear datasets and Llama 2 (30B) reaching 0.85 on quadratic datasets, albeit at higher computational cost. Predictive performance improved significantly across the board as the subject variable ratio (SVR) increased. Three groups were identified: similar performance, assisted better, and assisted significantly better. These groups differed according to the equation used to generate the synthetic data.
Discussion: These findings highlight the potential of LLMs to enhance structured data analysis by identifying optimal conditions, such as SVR thresholds, for their practical use. The trade-off between computational cost and performance across different LLM architectures is also emphasized, suggesting the need for context-specific model selection.
Conclusion: LLMs can be effectively leveraged to repurpose existing clinical datasets for individualized clinical questions, such as optimizing surgical timing for patients with infective endocarditis and embolic stroke. This approach advances precision medicine and supports data-driven clinical decision-making.
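The structure comparison described in the Methods can be illustrated with a small sketch. This is not the authors' pipeline: the abstract does not say how cosine similarity was computed, so the code below assumes one plausible reading, in which both the original table and its embedding are projected with PCA and the cosine similarity is taken between their pairwise-distance vectors. The helper names (`pca_project`, `pairwise_distances`, `structure_similarity`) are hypothetical, and a random linear map plus noise stands in for real LLM embeddings.

```python
import numpy as np

def pca_project(X, n_components=2):
    """Project the rows of X onto its top principal components (via SVD)."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:n_components].T

def pairwise_distances(X):
    """Return Euclidean distances between all distinct row pairs as a flat vector."""
    diffs = X[:, None, :] - X[None, :, :]
    d = np.sqrt((diffs ** 2).sum(axis=-1))
    upper = np.triu_indices(len(X), k=1)
    return d[upper]

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def structure_similarity(original, embedded, n_components=2):
    """Assumed metric: cosine similarity of pairwise distances after PCA."""
    d_orig = pairwise_distances(pca_project(original, n_components))
    d_emb = pairwise_distances(pca_project(embedded, n_components))
    return cosine_similarity(d_orig, d_emb)

rng = np.random.default_rng(0)
n_subjects, n_variables = 200, 5

# Synthetic structured data: rows are subjects, columns are variables.
X = rng.normal(size=(n_subjects, n_variables))

# Placeholder for LLM embeddings (assumption): a random linear map into
# 64 dimensions with a little noise. A real study would embed each record
# with a model such as BERT instead.
W = rng.normal(size=(n_variables, 64))
E = X @ W + 0.1 * rng.normal(size=(n_subjects, 64))

score = structure_similarity(X, E)
print(f"structure similarity: {score:.3f}")
```

Because distances are non-negative, this score always lies between 0 and 1 and tends to be high even for loosely related structures, so it is best read relative to a baseline such as a shuffled copy of the data.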
Abbas S. Ali
Subi Gandhi
Syed H. Jafri
Frontiers in Artificial Intelligence
SHILAP Revista de lepidopterología
University of Central Florida
West Virginia University
Tarleton State University
Ali et al. studied this question.
www.synapsesocial.com/papers/69b257fc96eeacc4fcec7270 — DOI: https://doi.org/10.3389/frai.2026.1737530