Early stopping is widely used to regularize models and can reduce computation by halting training when the model's performance on a validation set stops improving. However, conventional early stopping applies the same stopping criterion to all instances without considering their individual learning status, which can lead to both potential overfitting and redundant computation on instances that are already well learned. To further improve efficiency, we propose Instance-dependent Early Stopping (IES), which adapts the early stopping mechanism from the entire training set to the instance level, based on the core principle that once the model has mastered an instance, training on it should stop. IES considers an instance mastered if the second-order differences of its loss value remain within a small range around zero. This provides a uniform stopping criterion that is applicable across all instances, unlike a simple loss value threshold, which is affected by sample difficulty. We show that excluding mastered instances from backpropagation can increase gradient norms, thereby accelerating the decrease in training loss and speeding up training. The foundational IES reduces the number of instances receiving backpropagation by 10%-50% while maintaining or even improving performance. To address the remaining overhead in forward propagation, we introduce an enhanced variant, IES+, designed for aggressive training acceleration: for scenarios where speed is the top priority, IES+ further optimizes the forward pass to achieve state-of-the-art reductions in wall-clock time. Furthermore, we extend our evaluation to the supervised fine-tuning of large language models, where IES achieves notable computational savings while preserving or improving performance.
Yuan et al. (Thu,) studied this question.
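
The mastery criterion described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that a per-instance loss is recorded once per epoch, that the second-order difference is the standard discrete form L_t - 2 L_{t-1} + L_{t-2}, and that only the most recent difference is checked. The names `is_mastered` and `select_active` and the threshold `delta` are hypothetical, introduced only for illustration.

```python
from typing import Dict, List, Sequence


def second_order_difference(losses: Sequence[float]) -> float:
    """Discrete second-order difference of the three most recent losses.

    Assumed form: L_t - 2 * L_{t-1} + L_{t-2}.
    """
    if len(losses) < 3:
        raise ValueError("need at least three recorded losses")
    return losses[-1] - 2.0 * losses[-2] + losses[-3]


def is_mastered(losses: Sequence[float], delta: float = 1e-3) -> bool:
    """Treat an instance as mastered when the latest second-order
    difference of its loss lies within (-delta, delta).

    `delta` is a hypothetical threshold, not a value from the paper.
    """
    if len(losses) < 3:
        # Too little history to judge, so keep training on this instance.
        return False
    return abs(second_order_difference(losses)) < delta


def select_active(loss_history: Dict[int, List[float]],
                  delta: float = 1e-3) -> List[int]:
    """Return the ids of instances that should still receive backpropagation."""
    return [idx for idx, losses in loss_history.items()
            if not is_mastered(losses, delta)]


if __name__ == "__main__":
    # Per-instance loss histories, one value per epoch (made-up numbers).
    history = {
        0: [2.0, 1.0, 0.5],     # loss curve is still bending
        1: [0.30, 0.30, 0.30],  # loss has flattened out
        2: [1.5, 1.2],          # not enough history yet
    }
    print(select_active(history))
```

Because the test looks at how the loss curve bends rather than at the loss value itself, an instance whose loss plateaus at a high value and one whose loss plateaus at a low value are treated alike. This matches the stated motivation for preferring the criterion over a fixed loss threshold.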