January 18, 2026 | Open Access

Upsampling DINOv2 Features for Unsupervised Vision Tasks and Weakly Supervised Materials Segmentation

Key Points

Explored how effectively upsampled DINOv2 features support unsupervised vision tasks and weakly supervised materials segmentation.
Combined upsampled features with a clustering-based approach for object localization and segmentation.
Paired upsampled features with standard classifiers for weakly supervised materials segmentation.
Evaluated performance on various benchmarks.
Achieved strong baselines without finetuning or training additional networks.
Showed strong performance particularly in weakly supervised segmentation tasks.
ViT features captured complex relationships that traditional methods could not.

Abstract

The features of self-supervised vision transformers (ViTs) contain strong semantic and positional information relevant to downstream tasks like object localization and segmentation. Recent works combine these features with traditional methods like clustering, graph partitioning, or region correlations to achieve impressive baselines without finetuning or training additional networks. Here, upsampled features from ViT networks (e.g., DINOv2) are used in two workflows: a clustering-based approach for object localization and segmentation, and a pairing with standard classifiers for weakly supervised materials segmentation. Both show strong performance on benchmarks, especially in weakly supervised segmentation, where the ViT features capture complex relationships inaccessible to classical approaches. The flexibility and generalizability of these features are expected to both speed up and strengthen materials characterization, from segmentation to property prediction.
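
As a rough illustration of the weakly supervised workflow described above, the sketch below labels every pixel from a handful of annotated pixels using per-pixel features. Everything in it is an assumption made for illustration: the feature grid is synthetic rather than produced by DINOv2, nearest-neighbor repetition stands in for the feature upsampler, and a nearest-centroid rule stands in for the "standard classifiers" the abstract mentions. The function names (`upsample_nearest`, `fit_centroids`, `segment`) are hypothetical and do not come from the study.

```python
import numpy as np

def upsample_nearest(patch_feats, factor):
    """Repeat each patch feature over a factor x factor pixel block.
    A simple stand-in for a feature upsampler, not the study's method."""
    return np.repeat(np.repeat(patch_feats, factor, axis=0), factor, axis=1)

def fit_centroids(feats, sparse_labels):
    """Compute the mean feature per class from labelled pixels only.
    Pixels with label -1 are treated as unlabelled and ignored."""
    classes = np.unique(sparse_labels[sparse_labels >= 0])
    centroids = np.stack([feats[sparse_labels == c].mean(axis=0) for c in classes])
    return classes, centroids

def segment(feats, classes, centroids):
    """Assign every pixel to the class whose centroid is nearest in feature space."""
    h, w, d = feats.shape
    flat = feats.reshape(-1, d)
    dists = ((flat[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    return classes[dists.argmin(axis=1)].reshape(h, w)

# Synthetic 4x4 patch grid with 8-dim features (hypothetical data):
# the left half mimics one phase, the right half another.
rng = np.random.default_rng(0)
patch_feats = rng.normal(scale=0.1, size=(4, 4, 8))
patch_feats[:, 2:, :] += 1.0

feats = upsample_nearest(patch_feats, factor=4)  # shape (16, 16, 8)

# Weak supervision: a single labelled pixel per class.
sparse = -np.ones((16, 16), dtype=int)
sparse[0, 0] = 0
sparse[15, 15] = 1

classes, centroids = fit_centroids(feats, sparse)
mask = segment(feats, classes, centroids)  # dense (16, 16) label map
```

In practice the synthetic grid would be replaced by real upsampled ViT features and the centroid rule by whichever classifier suits the data; the point is only that a few labelled pixels can drive a dense segmentation when the features already separate the classes.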
