Real-time tumor tracking (RTTT) is a key component of proton therapy, enabling accurate tumor localization for effective, risk-mitigated dose delivery. Current RTTT methods rely on supervised machine learning models: either multi-patient (MP) models, trained on a cohort of patients before deployment on incoming patients, or patient-specific (PS) models, trained pre-intervention on data collected from the patient during the planning session. However, with a continuous flow of new patients, both training approaches struggle in clinical workflows: MP models fail to adapt to each patient's unique anatomical characteristics and breathing patterns, while PS models can be computationally expensive. Domain adaptation through fine-tuning alleviates these issues by starting from a curated source model and adapting it to a new target domain (i.e., a new patient) at a lower data and computational cost. In this work, we study the fine-tuning of vision transformers (ViT) for RTTT with Low-Rank Adaptation (LoRA). Our study on 32 patients also examines which type of source model, MP or PS, is the better starting point for fine-tuning. We show that (i) fine-tuning consistently improves performance, by 35\% to 68\% compared with non-adapted models, even after one-shot adaptation, and (ii) MP models are more effective source models than pre-trained PS models.
Hertaing et al. studied this question.
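
To make the low-rank adaptation idea concrete, the following is a minimal sketch of a single LoRA-augmented dense layer, written with NumPy. It is not the authors' implementation: the class name `LoRALinear`, the 768-dimensional projection, the rank of 4, and the scaling factor `alpha` are all illustrative assumptions. The sketch only shows the general mechanism, in which the source-model weight stays frozen and a small pair of low-rank factors is what would be trained on the new patient's data.

```python
import numpy as np


class LoRALinear:
    """Frozen dense layer plus a trainable low-rank update (illustrative only)."""

    def __init__(self, weight, rank=4, alpha=8.0, seed=0):
        rng = np.random.default_rng(seed)
        d_out, d_in = weight.shape
        self.weight = weight  # frozen source-model weight, shape (d_out, d_in)
        # Low-rank factors: A starts random and B starts at zero, so the
        # adapted layer initially reproduces the source model exactly.
        self.A = rng.normal(0.0, 0.01, size=(rank, d_in))
        self.B = np.zeros((d_out, rank))
        self.scale = alpha / rank  # assumed scaling convention

    def __call__(self, x):
        # x has shape (batch, d_in); the result has shape (batch, d_out).
        base = x @ self.weight.T
        update = (x @ self.A.T) @ self.B.T
        return base + self.scale * update

    def trainable_parameters(self):
        # Only A and B would be updated during fine-tuning.
        return self.A.size + self.B.size


rng = np.random.default_rng(1)
W = rng.normal(size=(768, 768))  # hypothetical projection size
layer = LoRALinear(W, rank=4)
x = rng.normal(size=(2, 768))
y = layer(x)
```

Under these assumed sizes, the layer exposes `4 * (768 + 768)` trainable values against `768 * 768` frozen ones, which illustrates why this kind of adaptation can be cheaper in data and computation than training a full model per patient.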