What question did this study set out to answer?

Whether demonstration augmentation can improve the stability and accuracy of in-context learning.

April 3, 2026

Implicit Demonstration Augmentation for Robust and Stable In-Context Learning

Key Points

This work aims to improve the stability and accuracy of in-context learning using demonstration augmentation.
Proposes implicit demonstration augmentation-based ICL (IDAICL), which enriches demonstrations using their deep feature distributions to improve LLM predictions.
Introduces domain-aware IDAICL (D-IDAICL), which selects the most relevant domain for each test sample during augmentation.
Uses a hypernetwork to choose that domain adaptively from the deep representation of the test sample.
Evaluates both methods across multiple tasks using eight large language models.
IDAICL and D-IDAICL significantly increase overall and worst-case accuracy of LLMs.
Both methods reduce performance variability across different demonstration setups.
Both methods also remain effective under class imbalance.

Abstract

The advent of in-context learning (ICL) allows pretrained large language models (LLMs) to handle unseen inputs by leveraging context, without requiring parameter updates. However, the success of ICL is strongly influenced by factors such as the quality, size, and ordering of demonstrations, often resulting in unstable or less-than-ideal outcomes. This work is the first to address these limitations through the lens of demonstration augmentation. We first propose a simple yet effective ICL method, termed implicit demonstration augmentation-based ICL (IDAICL), that enriches demonstrations by leveraging their deep feature distributions, integrating knowledge from the entire demonstration set to enhance LLM predictions. From a theoretical standpoint, we demonstrate that when the number of augmented samples tends to infinity, our method asymptotically converges to a new form of logit calibration. Building upon this foundation, we further propose a domain-aware IDAICL (D-IDAICL) method, which improves the precision of knowledge integration by identifying and leveraging the most pertinent domain for each test sample during augmentation. Specifically, a hypernetwork is employed to adaptively select the most effective domain based on the deep representation of the test sample. The corresponding domain-specific knowledge is then utilized to augment the demonstrations, resulting in a domain-aware logit calibration function that enhances predictive performance. Comprehensive evaluations across multiple tasks using eight LLMs reveal that both approaches markedly boost overall and worst-case accuracy, leading to improved robustness and predictive capability. In addition, our approaches mitigate performance fluctuations across different demonstrations, orderings, and templates, while also showing effectiveness in handling class imbalance.
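
The claim that infinitely many augmented samples collapse into a logit calibration can be illustrated with a small numerical sketch. It is not the paper's exact formulation: it assumes a linear prediction head (`W`, `b`), Gaussian feature shifts with mean `mu` and covariance `sigma` (standing in for statistics estimated from the demonstration set), and averaging of exponentiated logits. All names and values are hypothetical. Under those assumptions, the Gaussian moment-generating function gives a closed-form additive correction to each logit, and a Monte Carlo average over explicitly sampled augmentations approaches it.

```python
import numpy as np

def closed_form_calibrated_logits(logits, W, mu, sigma):
    """Logits after averaging exp(logit) over infinitely many Gaussian feature shifts.

    Uses E[exp(w^T delta)] = exp(w^T mu + 0.5 * w^T Sigma w) for delta ~ N(mu, Sigma).
    """
    shift = W @ mu                                         # w_c^T mu for each class c
    spread = 0.5 * np.einsum("cd,de,ce->c", W, sigma, W)   # 0.5 * w_c^T Sigma w_c
    return logits + shift + spread

def monte_carlo_calibrated_logits(h, W, b, mu, sigma, n_samples, rng):
    """Same quantity, estimated by explicitly sampling augmented features."""
    deltas = rng.multivariate_normal(mu, sigma, size=n_samples)  # shape (n_samples, d)
    aug_logits = (h + deltas) @ W.T + b                          # shape (n_samples, classes)
    return np.log(np.mean(np.exp(aug_logits), axis=0))

# Hypothetical toy setup: 4-dimensional features, 3 classes.
rng = np.random.default_rng(0)
d, num_classes = 4, 3
h = rng.normal(size=d)                        # deep feature of one test input (assumed)
W = 0.5 * rng.normal(size=(num_classes, d))   # linear head weights (assumed)
b = np.zeros(num_classes)
mu = 0.1 * rng.normal(size=d)                 # assumed demonstration feature mean shift
sigma = 0.1 * np.eye(d)                       # assumed demonstration feature covariance

base_logits = W @ h + b
exact_logits = closed_form_calibrated_logits(base_logits, W, mu, sigma)
mc_logits = monte_carlo_calibrated_logits(h, W, b, mu, sigma, 200_000, rng)
print("closed form:", exact_logits)
print("monte carlo:", mc_logits)
```

The closed form needs no sampling at inference time, which is the practical appeal of an implicit augmentation scheme: the augmentation is absorbed into a per-class correction term rather than into extra demonstrations in the prompt.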
