---
license: apache-2.0
tags:
  - segmentation
  - remove background
  - background
  - background-removal
  - Pytorch
pretty_name: Open Remove Background Model
datasets:
  - schirrmacher/humans
---

# Open Remove Background Model (ormbg)

>>> DEMO <<<

This model is a fully open-source background remover optimized for images with humans. It is based on the Highly Accurate Dichotomous Image Segmentation (DIS) research.

The model is similar to RMBG-1.4, but its training data and training process are fully open, and it is free to use commercially.

## Inference

```
python utils/inference.py
```
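
For reference, here is a minimal sketch of what such an inference script can look like. It assumes the IS-Net architecture from the DIS repo (`ISNetDIS` in `models/isnet.py`), a 1024×1024 input size, and the checkpoint at `models/ormbg.pth`; the image paths are placeholders:

```python
import numpy as np
import torch
import torch.nn.functional as F
from PIL import Image

from models.isnet import ISNetDIS  # IS-Net architecture from the DIS repo

device = "cuda" if torch.cuda.is_available() else "cpu"

# Load the checkpoint (path is an assumption, adjust to your setup).
model = ISNetDIS()
model.load_state_dict(torch.load("models/ormbg.pth", map_location=device))
model.to(device).eval()

# Preprocess: scale to [0, 1] and resize to the training resolution.
# Note: no mean/std normalization -- it was removed during training (see below).
image = Image.open("example.jpg").convert("RGB")
x = torch.from_numpy(np.array(image)).permute(2, 0, 1).float() / 255.0
x = F.interpolate(x.unsqueeze(0), size=(1024, 1024), mode="bilinear").to(device)

with torch.no_grad():
    result = model(x)

# Take the finest side output, resize back, and min-max normalize it,
# mirroring the DIS inference code.
mask = F.interpolate(result[0][0], size=image.size[::-1], mode="bilinear")
mask = (mask - mask.min()) / (mask.max() - mask.min())

# Use the mask as an alpha channel to drop the background.
alpha = (mask.squeeze().cpu().numpy() * 255).astype(np.uint8)
rgba = np.dstack([np.array(image), alpha])
Image.fromarray(rgba, "RGBA").save("example-no-background.png")
```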

## Training

The model was trained with the Human Segmentation Dataset.

After 10,000 iterations on a single NVIDIA GeForce RTX 4090, the model achieved the following results:

- Training Time: 8 hours
- Training Loss: 0.1179
- Validation Loss: 0.1284
- Maximum F1 Score: 0.9928
- Mean Absolute Error: 0.005

Output model: `/models/ormbg.pth`.

## Want to train your own model?

Check out the Highly Accurate Dichotomous Image Segmentation (DIS) code:

```
git clone https://github.com/xuebinqin/DIS.git
cd DIS
```

Follow the installation instructions at https://github.com/xuebinqin/DIS?tab=readme-ov-file#1-clone-this-repo. Download or create some data (like this) and place it in the DIS project folder.

I am using the following folder structure:

- `training/im` (images)
- `training/gt` (ground truth)
- `validation/im` (images)
- `validation/gt` (ground truth)

Apply this git patch to set the correct paths and remove image normalization:

```
git apply dis-repo.patch
```
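
If you prefer to make the changes by hand, the patch boils down to edits like the following in `IS-Net/train_valid_inference_main.py`. This is a sketch: the dataset-dictionary keys match the DIS repo, while the dataset names, file extensions, and cache paths are assumptions:

```python
# Point the training/validation dataset dicts at the folder structure above.
# Names, extensions, and cache paths are assumptions -- adjust as needed.
dataset_tr = {
    "name": "ormbg-training",
    "im_dir": "../training/im",
    "gt_dir": "../training/gt",
    "im_ext": ".png",
    "gt_ext": ".png",
    "cache_dir": "../cache/training",
}
dataset_vd = {
    "name": "ormbg-validation",
    "im_dir": "../validation/im",
    "gt_dir": "../validation/gt",
    "im_ext": ".png",
    "gt_ext": ".png",
    "cache_dir": "../cache/validation",
}
train_datasets = [dataset_tr]
valid_datasets = [dataset_vd]

# ...and drop the normalization transform from the dataloader transforms,
# e.g. remove GOSNormalize([0.5, 0.5, 0.5], [1.0, 1.0, 1.0]) from the
# transform list, so images are fed to the network unnormalized.
```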

Start training:

```
cd IS-Net
python train_valid_inference_main.py
```

Export to ONNX (modify paths if needed):

```
python utils/pth_to_onnx.py
```
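
Under the hood this is a standard `torch.onnx.export` call. A minimal sketch, assuming the `ISNetDIS` architecture, a fixed 1024×1024 input, and the paths used above (the actual script may differ):

```python
import torch
from models.isnet import ISNetDIS  # IS-Net architecture from the DIS repo

# Load the trained weights (path is an assumption, see above).
model = ISNetDIS()
model.load_state_dict(torch.load("models/ormbg.pth", map_location="cpu"))
model.eval()

# Trace with a dummy input at the training resolution and export.
dummy = torch.randn(1, 3, 1024, 1024)
torch.onnx.export(
    model,
    dummy,
    "models/ormbg.onnx",
    input_names=["input"],
    output_names=["output"],
    dynamic_axes={"input": {0: "batch"}, "output": {0: "batch"}},
    opset_version=17,
)
```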

## Research

Synthetic datasets have limitations for achieving great segmentation results, because artificial lighting, occlusion, scale, or backgrounds create a gap between synthetic and real images. A "model trained solely on synthetic data generated with naïve domain randomization struggles to generalize on the real domain"; see PEOPLESANSPEOPLE: A Synthetic Data Generator for Human-Centric Computer Vision (2022). However, hybrid training approaches seem promising and can even improve segmentation results.

I am currently researching how to close this gap with the resources I have. There are promising approaches, such as taking the pose of humans into account to improve segmentation results; see Cross-Domain Complementary Learning Using Pose for Multi-Person Part Segmentation (2019).

## Support

This is the first iteration of the model, so there will be improvements!

If you identify cases where the model fails, upload your examples!

Known issues (work in progress):

- close-ups: from above, from below, in profile, from the side
- minor issues with hair segmentation when hair forms loops
- more diverse backgrounds needed