GeoPix: Multi-Modal Large Language Model for Pixel-level Image Understanding in Remote Sensing

Ruizhe Ou¹ · Yuan Hu²,* · Fan Zhang² · Jiaxin Chen¹ · Yu Liu²,³

¹Beijing University of Posts and Telecommunications · ²Peking University · ³Peking University Ordos Research Institute of Energy · *Corresponding author

GeoPix is a new state-of-the-art pixel-level multi-modal large language model for the remote sensing domain, supporting referring image segmentation and other tasks.

Release

[2025.02.20] We release the pre-trained checkpoints, inference code, and Gradio demo! [GitHub]
[2025.01.12] We release the paper.

GeoPix: Multi-Modal Large Language Model for Pixel-level Image Understanding in Remote Sensing [arXiv]

Abstract

In this work, we propose GeoPix, a remote sensing (RS) multi-modal large language model (MLLM) that extends image understanding capabilities to the pixel level. This is achieved by equipping the MLLM with a mask predictor, which transforms visual features from the vision encoder into masks conditioned on the LLM’s segmentation token embeddings. For more details, please refer to the paper.
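To illustrate the idea of conditioning masks on a segmentation token embedding, here is a minimal sketch. It is not GeoPix's actual mask predictor. It assumes the simplest possible form of conditioning: a dot product between the token embedding and each pixel's feature vector, followed by a sigmoid and a threshold. The function name `predict_mask`, the toy shapes, and the 0.5 threshold are all hypothetical; see the paper for the real design.

```python
import math


def predict_mask(pixel_features, seg_token_embedding, threshold=0.5):
    """Toy mask prediction (hypothetical, not the GeoPix implementation).

    pixel_features: H x W x C nested lists, a stand-in for vision-encoder features.
    seg_token_embedding: length-C list, a stand-in for the LLM's segmentation
        token embedding.
    Returns an H x W nested list of 0/1 values.
    """
    mask = []
    for row in pixel_features:
        mask_row = []
        for feature in row:
            # Similarity between this pixel's feature and the token embedding.
            logit = sum(f * e for f, e in zip(feature, seg_token_embedding))
            # Squash to a probability, then binarize.
            prob = 1.0 / (1.0 + math.exp(-logit))
            mask_row.append(1 if prob > threshold else 0)
        mask.append(mask_row)
    return mask


# Toy 2x2 feature map with 2 channels per pixel (made-up numbers).
features = [
    [[2.0, 0.0], [-2.0, 0.0]],
    [[0.0, 3.0], [3.0, 1.0]],
]
token = [1.0, -1.0]
print(predict_mask(features, token))  # [[1, 0], [0, 1]]
```

Pixels whose features align with the token embedding receive a positive logit and land inside the mask; changing the embedding changes which pixels are selected, which is the sense in which the mask is "conditioned" on it.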

Download

You can download the model directly from Hugging Face, ModelScope, or OpenXLab. You can also download the model with a Python script:

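# Option 1: Hugging Face (requires the huggingface_hub package)
from huggingface_hub import snapshot_download
snapshot_download(repo_id="Norman-ou/GeoPix-ft-sior_rsicap", local_dir="./pretrained_models")

# Option 2: ModelScope (requires the modelscope package)
# Run only one option: both imports bind the name snapshot_download.
from modelscope import snapshot_download
model_dir = snapshot_download("NormanOU/GeoPix-ft-sior_rsicap", local_dir="./pretrained_models")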

Once you have prepared all models, the folder tree should look like this:

  .
  ├── ...
  ├── model
  ├── pretrained_models
  ├── app.py
  ├── engine.py
  ├── ...
  └── README.md
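
As a quick sanity check before running the demo, the layout above can be verified with a short script. This is a minimal sketch: the helper name `missing_entries` is hypothetical, and it only checks the entries that the tree names explicitly (the elided `...` entries are not checked, and the contents of `pretrained_models` are not inspected).

```python
from pathlib import Path

# Entries named explicitly in the folder tree; "..." entries are not checked.
EXPECTED_ENTRIES = ["model", "pretrained_models", "app.py", "engine.py", "README.md"]


def missing_entries(repo_root="."):
    """Return the expected entries that do not exist under repo_root."""
    root = Path(repo_root)
    return [name for name in EXPECTED_ENTRIES if not (root / name).exists()]


if __name__ == "__main__":
    missing = missing_entries(".")
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("Folder layout looks complete.")
```

Run it from the repository root; an empty result means every named entry is present.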

Citation

@misc{ou2025geopixmultimodallargelanguage,
      title={GeoPix: Multi-Modal Large Language Model for Pixel-level Image Understanding in Remote Sensing}, 
      author={Ruizhe Ou and Yuan Hu and Fan Zhang and Jiaxin Chen and Yu Liu},
      year={2025},
      eprint={2501.06828},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2501.06828}, 
}