Recent self-supervised methods for image representation learning are based on maximizing the agreement between embedding vectors from different views of the same image. A trivial solution is obtained when the encoder outputs constant vectors. This collapse problem is often avoided through implicit biases in the learning architecture that often lack a clear justification or interpretation. In this paper, we introduce VICReg (Variance-Invariance-Covariance Regularization), a method that explicitly avoids the collapse problem with a regularization term on the variance of the embeddings along each dimension individually. VICReg combines the variance term with a decorrelation mechanism based on redundancy reduction and covariance regularization, and achieves results on par with the state of the art on several downstream tasks. In addition, we show that incorporating our new variance term into other methods helps stabilize the training and leads to performance improvements.
Bardes et al. studied this question.
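To make the three ingredients named in the abstract concrete, the following is a minimal NumPy sketch of a VICReg-style objective. It is an illustration under stated assumptions, not the authors' implementation: the hinge form of the variance term, the target standard deviation of 1, the small constant `eps`, the normalization of the covariance term, and the weights `inv_w`, `var_w`, and `cov_w` are all choices made here for the example, since the abstract does not specify them. The function names are hypothetical.

```python
import numpy as np


def variance_term(z, target_std=1.0, eps=1e-4):
    """Penalize dimensions whose standard deviation over the batch is below target_std.

    Assumption: a hinge on the per-dimension standard deviation.
    """
    std = np.sqrt(z.var(axis=0, ddof=1) + eps)
    return np.mean(np.maximum(0.0, target_std - std))


def covariance_term(z):
    """Penalize squared off-diagonal entries of the batch covariance matrix.

    This decorrelates the embedding dimensions from one another.
    """
    n, d = z.shape
    zc = z - z.mean(axis=0)
    cov = zc.T @ zc / (n - 1)
    off_diag = cov - np.diag(np.diag(cov))
    return np.sum(off_diag ** 2) / d


def invariance_term(z_a, z_b):
    """Mean squared difference between the embeddings of the two views."""
    return np.mean((z_a - z_b) ** 2)


def vicreg_style_loss(z_a, z_b, inv_w=25.0, var_w=25.0, cov_w=1.0):
    """Weighted sum of the three terms; the weights are illustrative assumptions."""
    return (
        inv_w * invariance_term(z_a, z_b)
        + var_w * (variance_term(z_a) + variance_term(z_b))
        + cov_w * (covariance_term(z_a) + covariance_term(z_b))
    )


# Collapsed embeddings: every sample maps to the same constant vector.
collapsed = np.ones((8, 4))

# Non-collapsed embeddings: independent, roughly unit-variance dimensions.
rng = np.random.default_rng(0)
spread = rng.standard_normal((256, 4))

loss_collapsed = vicreg_style_loss(collapsed, collapsed)
loss_spread = vicreg_style_loss(spread, spread)
```

In this sketch, a constant encoder output makes the invariance and covariance terms exactly zero, so the variance term alone is what penalizes the collapsed solution. That is the role the abstract assigns to the explicit variance regularization.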