NeCo: Improving DINOv2's spatial representations in 19 GPU hours with Patch Neighbor Consistency
Abstract
We propose sorting patch representations across views as a novel self-supervised learning signal to improve pretrained representations. To this end, we introduce NeCo: Patch Neighbor Consistency, a novel training loss that enforces patch-level nearest-neighbor consistency between a student and a teacher model, relative to reference batches. Our method applies a differentiable sorting method on top of pretrained representations, such as DINOv2-registers, to bootstrap the learning signal and further improve upon them. This dense post-pretraining leads to superior performance across various models and datasets, despite requiring only 19 hours on a single GPU. We demonstrate that this method generates high-quality dense feature encoders and establish several new state-of-the-art results: +5.5% and +6% for non-parametric in-context semantic segmentation on ADE20k and Pascal VOC, and +7.2% and +5.7% for linear segmentation evaluations on COCO-Things and -Stuff.
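The idea of patch neighbor consistency can be illustrated with a small, dependency-free sketch. It is not the authors' implementation. The abstract does not specify the differentiable sorting method, so a pairwise-sigmoid soft rank is used here as a stand-in, and the squared-difference loss, the temperature `tau`, and all function names are assumptions made for illustration only.

```python
import math

def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv + 1e-12)

def soft_ranks(scores, tau=0.1):
    """Smooth rank of each score (close to 1 for the highest score).

    Stand-in relaxation (assumption): each pairwise comparison is a sigmoid,
    so the rank varies smoothly with the scores instead of jumping.
    """
    ranks = []
    for i, s_i in enumerate(scores):
        r = 1.0
        for j, s_j in enumerate(scores):
            if j != i:
                r += 1.0 / (1.0 + math.exp(-(s_j - s_i) / tau))
        ranks.append(r)
    return ranks

def neighbor_consistency_loss(student, teacher, reference, tau=0.1):
    """Mean squared gap between student and teacher soft neighbor ranks.

    student, teacher: lists of patch features, aligned patch by patch.
    reference: list of reference patch features that both are ranked against.
    """
    total = 0.0
    for s_patch, t_patch in zip(student, teacher):
        s_rank = soft_ranks([cosine(s_patch, r) for r in reference], tau)
        t_rank = soft_ranks([cosine(t_patch, r) for r in reference], tau)
        total += sum((a - b) ** 2 for a, b in zip(s_rank, t_rank)) / len(reference)
    return total / len(student)

# Toy 2-D features (made up for illustration, not real model outputs).
reference = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
teacher = [[1.0, 0.1]]
student_good = [[0.9, 0.2]]   # orders the reference patches like the teacher
student_bad = [[0.1, 1.0]]    # orders them roughly in reverse

loss_good = neighbor_consistency_loss(student_good, teacher, reference)
loss_bad = neighbor_consistency_loss(student_bad, teacher, reference)
print(loss_good < loss_bad)
```

In this toy setup, a student patch that orders the reference patches like the teacher incurs a lower loss than one that orders them differently. In a real training loop the features would come from the student and teacher networks and the loss would be written in an autodiff framework so that gradients reach the student.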
Community
Improves DINOv2 representation in a self-supervised way