arxiv:2308.00090

Visual Geo-localization with Self-supervised Representation Learning

Published on Jul 31, 2023

Abstract

Visual Geo-localization (VG) has emerged as a significant research area, aiming to identify geolocation based on visual features. Most VG approaches use learnable feature extractors for representation learning. Recently, Self-Supervised Learning (SSL) methods have also demonstrated performance comparable to supervised methods by using numerous unlabeled images for representation learning. In this work, we present a novel unified VG-SSL framework with the goal of enhancing the performance and training efficiency of SSL methods on a large VG dataset. Our work incorporates multiple SSL methods tailored for VG: SimCLR, MoCov2, BYOL, SimSiam, Barlow Twins, and VICReg. We systematically analyze the performance of different training strategies and study the optimal parameter settings for the adaptation of SSL methods for the VG task. The results demonstrate that our method, without the significant computation and memory usage associated with Hard Negative Mining (HNM), can match or even surpass the VG performance of the baseline that employs HNM. The code is available at https://github.com/arplaboratory/VG_SSL.
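The efficiency claim hinges on how training pairs are formed: instead of mining hard negatives, the framework only needs query-positive pairs plus randomly sampled negatives. A minimal sketch of that idea follows, assuming GPS-verified positives are precomputed; the names `build_training_pairs`, `positives_of`, and `neg_ratio` are hypothetical, not the paper's API.

```python
import random


def build_training_pairs(queries, positives_of, database, neg_ratio=1.0):
    """Sketch of HNM-free pair formation for a Siamese SSL loss.

    queries:      list of query image ids
    positives_of: dict mapping query id -> list of geographically close db ids
    database:     list of all database image ids
    neg_ratio:    database negative ratio, i.e. the fraction of the number of
                  queries sampled as negatives per epoch
    """
    # Query-positive pairs: each query with one GPS-verified positive.
    pairs = [(q, random.choice(positives_of[q])) for q in queries]

    # Randomly sampled negatives: any database image that is not a positive
    # for the query, paired with itself ("identical negative" pair).
    n_neg = min(int(neg_ratio * len(queries)), len(queries))
    for q in random.sample(queries, n_neg):
        negatives = [d for d in database if d not in positives_of[q]]
        pairs.append((random.choice(negatives),) * 2)
    return pairs
```

Random sampling avoids the memory and compute cost of caching embeddings for mining, at the price of occasionally drawing uninformative negatives.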

Community

Proposes VG-SSL (Visual Geo-localization with Self-Supervised Learning): adapts a range of SSL methods (SimCLR, MoCo v2, BYOL, SimSiam, Barlow Twins, and VICReg) to VG and shows they can match the performance of supervised methods without memory-heavy hard negative mining (HNM); SSL only requires selecting positive samples.

Summary of the methods: SimCLR and MoCo v2 use contrastive learning with the InfoNCE loss (MoCo adds a momentum encoder, ME). BYOL and SimSiam are self-distillation methods with a stop-gradient (SG) on the target branch, a predictor head (PR) on the online branch, and batch norm (BN) in the projector or predictor (BYOL also uses an ME), trained with an embedding-prediction loss. Barlow Twins and VICReg do information maximization with cross-correlation and variance-invariance-covariance (VIC) regularization losses, respectively (both use BN with large-dimensional projector embeddings, LP).

Pipeline: given a query and a database, retrieve positives and negatives from the database, group them into query-positive pairs and identical-negative pairs, pass them through a trainable feature (embedding) extractor, and apply the SSL loss. The losses: InfoNCE pulls positives together and pushes all other embeddings apart; the embedding-prediction loss makes the student branch (with a shallow MLP predictor) match the non-trainable stop-gradient teacher; the Barlow Twins loss builds a cross-correlation (CC) matrix from positive pairs and drives the diagonal toward 1 and the off-diagonal toward 0; the VICReg loss combines invariance, variance, and covariance terms, and its embeddings are not batch-normalized before the variance term, since normalization would fix the variance and make the term vacuous. (A minimal sketch of these losses follows below.)

Because mining can miss informative negative samples, negatives are instead drawn by random sampling (any database image that is not a positive for the query), controlled by a database negative ratio (the fraction of the number of queries sampled per epoch); query-positive pairs and identical-negative pairs are formed for sampling at the same ratio.

Uses ResNet-50 as the local feature extractor and NetVLAD as the global feature aggregator, trying out the different SSL losses. MoCo v2, BYOL, SimCLR, and Barlow Twins perform better than SimSiam and VICReg. Extended ablations in the appendix. From NYU.
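The three loss families above map to short PyTorch sketches. This is a reading aid, not the paper's code: the temperature, loss weights, and the simplified one-positive-per-row InfoNCE (SimCLR contrasts all 2N views) are assumptions.

```python
import torch
import torch.nn.functional as F


def info_nce_loss(z1, z2, temperature=0.1):
    """Contrastive loss (SimCLR/MoCo family): pull each query-positive
    pair together, push it apart from every other embedding in the batch.
    z1, z2: (N, D) embeddings of the two pair members."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature                    # (N, N) similarities
    labels = torch.arange(z1.size(0), device=z1.device)   # positives on diagonal
    return F.cross_entropy(logits, labels)


def barlow_twins_loss(z1, z2, lambd=5e-3):
    """Information maximization (Barlow Twins): batch-normalize embeddings,
    build the (D, D) cross-correlation matrix, drive the diagonal toward 1
    (invariance) and the off-diagonal toward 0 (redundancy reduction)."""
    n = z1.size(0)
    z1 = (z1 - z1.mean(0)) / (z1.std(0) + 1e-6)
    z2 = (z2 - z2.mean(0)) / (z2.std(0) + 1e-6)
    c = (z1.t() @ z2) / n
    on_diag = (torch.diagonal(c) - 1).pow(2).sum()
    off_diag = (c - torch.diag(torch.diagonal(c))).pow(2).sum()
    return on_diag + lambd * off_diag


def vicreg_loss(z1, z2, sim_w=25.0, var_w=25.0, cov_w=1.0):
    """VICReg: invariance (MSE between pair embeddings) + variance hinge
    (keep per-dimension std above 1) + covariance (decorrelate dimensions).
    Note the embeddings are NOT batch-normalized here; normalization would
    satisfy the variance term trivially."""
    n, d = z1.shape
    inv = F.mse_loss(z1, z2)
    std1 = torch.sqrt(z1.var(dim=0) + 1e-4)
    std2 = torch.sqrt(z2.var(dim=0) + 1e-4)
    var = F.relu(1 - std1).mean() + F.relu(1 - std2).mean()
    z1c, z2c = z1 - z1.mean(0), z2 - z2.mean(0)
    cov1 = (z1c.t() @ z1c) / (n - 1)
    cov2 = (z2c.t() @ z2c) / (n - 1)
    off = lambda c: (c - torch.diag(torch.diagonal(c))).pow(2).sum() / d
    return sim_w * inv + var_w * var + cov_w * off(cov1) + cov_w * off(cov2)
```

With a ResNet-50 + NetVLAD embedding of each pair member, any of these can be dropped in as the training loss, which is essentially the comparison the paper runs.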

Links: PapersWithCode, GitHub

The GitHub link is invalid. Could you update it?

Hey @QianC95,
I'm unable to find the code implementation of this paper. It looks like the authors still haven't released the code.

