What question did this study set out to answer?

To develop and evaluate an on-device data selection mechanism that enhances continual learning in LLMs by prioritizing high-signal samples.

March 22, 2026 · Open Access

Adaptive Importance-Driven Data Selection for Drift-Aware Continual Personalization of On-Device LLMs

Key Points

Introduced Adaptive Importance-Driven Selection (AIDS), which filters incoming data using composite importance scores.
Implemented drift detection to monitor model capability and trigger maintenance actions as needed.
Conducted a 90-day longitudinal study with 50 participants to evaluate the method on user interactions.
AIDS reduced user-text perplexity by 34% compared to the base model.
Health metric H remained ≥0.45 throughout the study period.
AIDS outperformed uniform-replay LoRA on both perplexity and health metrics.
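
The drift-detection point above can be illustrated with a small sketch. This is not the paper's implementation: the class name DriftMonitor, the evaluate_health callback, the dictionary representation of weights, and the use of 0.45 as a trigger threshold are all assumptions made for illustration. The summary only states that H stayed at or above 0.45, not that 0.45 is the trigger value. Only the rollback action is shown; magnitude pruning is omitted.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

Weights = Dict[str, float]  # hypothetical stand-in for adapter parameters


@dataclass
class DriftMonitor:
    """Checks a health score every k sessions and rolls back on degradation."""
    evaluate_health: Callable[[Weights], float]  # assumed to return H in [0, 1]
    k: int = 5                # check interval in sessions (assumed value)
    threshold: float = 0.45   # assumed trigger level, not confirmed by the source
    sessions_seen: int = 0
    last_good: Optional[Weights] = None

    def after_session(self, weights: Weights) -> Weights:
        """Return the weights to keep using after this session."""
        self.sessions_seen += 1
        if self.sessions_seen % self.k != 0:
            return weights  # not a checkpoint session, nothing to do
        if self.evaluate_health(weights) >= self.threshold:
            self.last_good = dict(weights)  # snapshot a healthy state
            return weights
        if self.last_good is not None:
            return dict(self.last_good)  # roll back to the last healthy snapshot
        return weights  # no snapshot yet, so there is nothing to roll back to


# Toy health function: health falls as the single weight grows.
monitor = DriftMonitor(evaluate_health=lambda w: 1.0 - abs(w["a"]), k=2)
history = [monitor.after_session({"a": v}) for v in (0.1, 0.2, 0.9, 0.9)]
```

In this toy run the second session passes the health check and is snapshotted, while the fourth fails it and is replaced by that snapshot.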

Abstract

We introduce AIDS (Adaptive Importance-Driven Selection), an on-device data selection mechanism that determines which user interactions are worth learning from before a single gradient step is taken. The core insight is that naive continual fine-tuning of on-device LLMs fails not because of insufficient data, but because of indiscriminate data: low-signal samples dominate the training buffer, wasting compute and causing parameter drift that erodes general capabilities. AIDS assigns each incoming sample a composite importance score (combining novelty under the current model, semantic consistency with established user patterns, temporal recency, and signal-source reliability) and admits only high-scoring samples into a fixed-capacity selective buffer. A drift detector monitors general-capability health every k sessions and triggers rollback or magnitude pruning when degradation is detected. In a 90-day longitudinal study with 50 participants, AIDS reduces user-text perplexity by 34% over the base model while maintaining health H≥0.45 throughout, simultaneously outperforming uniform-replay LoRA on both metrics.

Index Terms: adaptive data selection, importance scoring, continual learning, LoRA, drift detection, on-device personalization, edge AI.
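
The scoring and admission step described in the abstract might look roughly like the following sketch. The abstract names the four signals but does not give the combining formula, so the weighted sum, the weight values, the admission threshold min_score, and the names Signals, importance, and SelectiveBuffer are all assumptions chosen for illustration. Each signal is assumed to be pre-computed and normalized to [0, 1].

```python
import heapq
import itertools
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Signals:
    novelty: float       # e.g. normalized loss under the current model
    consistency: float   # agreement with established user patterns
    recency: float       # newer samples score higher
    reliability: float   # trust in the signal source


WEIGHTS = (0.4, 0.3, 0.2, 0.1)  # hypothetical weights, not from the paper


def importance(s: Signals, weights: Tuple[float, ...] = WEIGHTS) -> float:
    """Composite score as a weighted sum (an assumed form)."""
    parts = (s.novelty, s.consistency, s.recency, s.reliability)
    return sum(w * p for w, p in zip(weights, parts))


class SelectiveBuffer:
    """Fixed-capacity buffer that keeps only the highest-scoring samples."""

    def __init__(self, capacity: int, min_score: float = 0.5):
        self.capacity = capacity
        self.min_score = min_score  # assumed admission threshold
        self._heap: List[Tuple[float, int, str]] = []  # min-heap keyed by score
        self._counter = itertools.count()  # tie-breaker for equal scores

    def offer(self, text: str, signals: Signals) -> bool:
        """Try to admit a sample; return True if it was stored."""
        score = importance(signals)
        if score < self.min_score:
            return False  # low-signal sample, rejected before any training
        entry = (score, next(self._counter), text)
        if len(self._heap) < self.capacity:
            heapq.heappush(self._heap, entry)
            return True
        if score > self._heap[0][0]:
            heapq.heapreplace(self._heap, entry)  # evict the weakest sample
            return True
        return False

    def samples(self) -> List[str]:
        """Stored texts, highest score first."""
        return [text for _, _, text in sorted(self._heap, reverse=True)]


buf = SelectiveBuffer(capacity=2)
results = [
    buf.offer("a", Signals(1.0, 1.0, 1.0, 1.0)),
    buf.offer("b", Signals(0.1, 0.1, 0.1, 0.1)),
    buf.offer("c", Signals(0.8, 0.8, 0.8, 0.8)),
    buf.offer("d", Signals(0.9, 0.9, 0.9, 0.9)),
]
```

A min-heap is used here so that the weakest stored sample can be found and evicted cheaply when a stronger one arrives; the source does not say which eviction policy the authors use.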

Cite This Study

Vishwajeet Shashikant Adkine studied this question.

synapsesocial.com/papers/69bf8978f665edcd009e927a https://doi.org/10.5281/zenodo.19138030
