What type of study is this?

This is a quantitative study.

October 20, 2025 · Open Access

Does Weak-to-strong Generalization Happen under Spurious Correlations?

Key Points

With sufficient pseudolabels, weak-to-strong (W2S) generalization occurs when the minority group fractions of the labeled and unlabeled data match.
When the minority group fractions differ, W2S generalization may fail, and the W2S gain diminishes as the squared gap between the two fractions grows.
Extensive experiments on spurious correlation benchmarks and teacher-student pairs corroborate the theoretical predictions.
A simple, group-label-free algorithm retrains the strong student on its high-confidence data subset after W2S fine-tuning, improving on vanilla W2S fine-tuning.

Abstract

We initiate a unified theoretical and algorithmic study of a key problem in weak-to-strong (W2S) generalization: when fine-tuning a strong pre-trained student with pseudolabels from a weaker teacher on a downstream task with spurious correlations, does W2S happen, and how to improve it upon failures? We consider two sources of spurious correlations caused by group imbalance: (i) a weak teacher fine-tuned on group-imbalanced labeled data with a minority group of fraction η_ℓ, and (ii) a group-imbalanced unlabeled set pseudolabeled by the teacher with a minority group of fraction η_u. Theoretically, a precise characterization of W2S gain at the proportional asymptotic limit shows that W2S always happens with sufficient pseudolabels when η_u = η_ℓ but may fail when η_u ≠ η_ℓ, where W2S gain diminishes as (η_u - η_ℓ)² increases. Our theory is corroborated by extensive experiments on various spurious correlation benchmarks and teacher-student pairs. To boost W2S performance upon failures, we further propose a simple, effective algorithmic remedy that retrains the strong student on its high-confidence data subset after W2S fine-tuning. Our algorithm is group-label-free and achieves consistent, substantial improvements over vanilla W2S fine-tuning.
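
The abstract does not spell out how the high-confidence subset is chosen, so the following is only a minimal sketch of one plausible selection step, not the authors' implementation. It assumes that confidence is the student's maximum predicted class probability and that a fixed fraction of the most confident samples is kept; the function name `select_high_confidence` and the parameter `keep_fraction` are hypothetical. Note that no group labels are used, consistent with the group-label-free claim.

```python
import numpy as np


def select_high_confidence(probs, keep_fraction=0.5):
    """Pick the samples on which the student is most confident.

    probs: (n_samples, n_classes) class probabilities predicted by the
           strong student after W2S fine-tuning.
    keep_fraction: hypothetical hyperparameter, fraction of samples kept.
    Returns (indices, labels): the kept sample indices in ascending order
    and the student's own hard predictions on them.
    """
    probs = np.asarray(probs, dtype=float)
    if probs.ndim != 2:
        raise ValueError("probs must have shape (n_samples, n_classes)")
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must lie in (0, 1]")

    # Assumption: confidence = maximum predicted class probability.
    confidence = probs.max(axis=1)
    n_keep = max(1, int(np.ceil(keep_fraction * len(probs))))

    # Sort by descending confidence; a stable sort keeps ties in input order.
    order = np.argsort(-confidence, kind="stable")
    keep = np.sort(order[:n_keep])
    labels = probs[keep].argmax(axis=1)
    return keep, labels


# Toy example: four unlabeled samples, two classes.
student_probs = np.array([
    [0.95, 0.05],
    [0.55, 0.45],
    [0.10, 0.90],
    [0.50, 0.50],
])
keep_idx, keep_labels = select_high_confidence(student_probs, keep_fraction=0.5)
```

In this sketch the student would then be retrained on the samples indexed by `keep_idx`, using `keep_labels` as targets; the retraining loss and schedule are left out because the abstract does not describe them.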
