Ensuring that artificial intelligence can support clinical report generation without compromising patient privacy remains a critical challenge in healthcare AI. To address this issue, we propose PrivLLM-Guard, a novel framework for differentially private large language models (LLMs) designed for real-time confidential medical text generation and summarization. While LLMs have demonstrated substantial potential in automating clinical documentation, the extreme sensitivity of healthcare data demands rigorous, formally grounded privacy guarantees. The framework addresses this need by combining (ε,δ)-differential privacy techniques with three key innovations: (i) an adaptive noise calibration system that dynamically adjusts Gaussian noise parameters based on input sensitivity and real-time privacy risk scores, (ii) a hierarchical privacy budget allocation mechanism that assigns differentiated protection levels to medical data categories (e.g., patient identifiers, diagnoses, demographics) using Rényi Differential Privacy (RDP) accounting for tight composition bounds during long-sequence generation, and (iii) an integrated real-time privacy auditing module that continuously monitors information leakage probabilities and triggers adaptive mitigation responses. The framework integrates bidirectional transformer encoders with autoregressive decoders, further enhanced by privacy-aware attention mechanisms with calibrated noise injection (Eq. 5) and gradient perturbation via differentially private SGD with adaptive clipping (Eqs. 3–4). Extensive experiments on three large-scale medical datasets (MIMIC-III with 2.1 M documents, i2b2 with 858 K records, and a proprietary hospital dataset with 1.5 M records) under strict privacy constraints (ε = 0.1, δ = 10⁻⁶) demonstrate BLEU-4 scores of 89.7% for generation and ROUGE-L scores of 92.3% for summarization, representing a 16.8% improvement over the best baseline.
The model processes 512-token sequences in real time with an average latency of 245 ms, a throughput of 19.3 requests per second, and memory usage of just 4.2 GB. Formal computational complexity analysis shows O(n²·d + n·κ) inference cost, where n is sequence length, d is model dimension, and κ represents per-token noise calibration overhead. Compared to state-of-the-art privacy-preserving LLMs, PrivLLM-Guard improves the utility–privacy trade-off by 15.8%, reduces membership inference risk by 65.9%, and lowers computational overhead by 23.4%, establishing a more secure, efficient, and trustworthy foundation for clinical AI deployment.
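To make the two standard building blocks named above concrete, the following is a minimal sketch, not the PrivLLM-Guard implementation. It shows (a) the classic Gaussian-mechanism noise scale for a given (ε, δ), valid for 0 < ε < 1, and (b) one differentially private SGD gradient step with per-example clipping. The function names (`gaussian_sigma`, `clip`, `dp_sgd_gradient`) are hypothetical, and the clipping norm is fixed here as a simplifying assumption; the adaptive clipping of Eqs. 3–4, the adaptive noise calibration, and the RDP accounting are not reproduced.

```python
import math
import random


def gaussian_sigma(epsilon, delta, sensitivity=1.0):
    """Classic Gaussian-mechanism noise scale (assumes 0 < epsilon < 1)."""
    if not (0.0 < epsilon < 1.0) or not (0.0 < delta < 1.0):
        raise ValueError("this bound assumes 0 < epsilon < 1 and 0 < delta < 1")
    return sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / epsilon


def clip(grad, clip_norm):
    """Rescale a gradient so its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, clip_norm / norm) if norm > 0.0 else 1.0
    return [g * scale for g in grad]


def dp_sgd_gradient(per_example_grads, clip_norm, noise_multiplier, rng):
    """One DP-SGD step: clip each example, sum, add Gaussian noise, average.

    Assumption: a fixed clip_norm (the paper uses adaptive clipping).
    """
    clipped = [clip(g, clip_norm) for g in per_example_grads]
    dim = len(clipped[0])
    summed = [sum(g[i] for g in clipped) for i in range(dim)]
    # Noise standard deviation scales with the per-example sensitivity.
    std = noise_multiplier * clip_norm
    noisy = [s + rng.gauss(0.0, std) for s in summed]
    return [v / len(per_example_grads) for v in noisy]


if __name__ == "__main__":
    # Noise scale for the privacy setting quoted in the abstract.
    print(round(gaussian_sigma(0.1, 1e-6), 2))
    rng = random.Random(0)
    grads = [[3.0, 4.0], [0.0, 0.5]]
    print(dp_sgd_gradient(grads, clip_norm=1.0, noise_multiplier=1.1, rng=rng))
```

For a unit-sensitivity query at ε = 0.1 and δ = 10⁻⁶, this bound gives a noise standard deviation of roughly 53, which illustrates how demanding the stated privacy setting is. In training, the overall (ε, δ) would come from composing many noisy steps under an accountant such as RDP, which this sketch omits.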
Ans D. Alghamdi (Thu,) studied this question.