---
license: apache-2.0
tags:
  - segmentation
  - remove background
  - background
  - background-removal
  - Pytorch
pretty_name: Open Remove Background Model
datasets:
  - schirrmacher/humans
---

# Open Remove Background Model (ormbg)

>>> DEMO <<<

This model is a fully open-source background remover optimized for images with humans. It is based on the Highly Accurate Dichotomous Image Segmentation (DIS) research.

The model is similar to RMBG-1.4, but its training data and training process are fully open, and it is free to use commercially.

## Inference

```
python utils/inference.py
```
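
For reference, here is a minimal sketch of what such an inference script can look like. It assumes the IS-Net architecture from the DIS repo (`ISNetDIS` in `models/isnet.py`), a 1024×1024 input size, and the checkpoint at `models/ormbg.pth`; the image paths are placeholders:

```python
import numpy as np
import torch
import torch.nn.functional as F
from PIL import Image

from models.isnet import ISNetDIS  # IS-Net architecture from the DIS repo

device = "cuda" if torch.cuda.is_available() else "cpu"

# Load the checkpoint (path is an assumption, adjust to your setup).
model = ISNetDIS()
model.load_state_dict(torch.load("models/ormbg.pth", map_location=device))
model.to(device).eval()

# Preprocess: scale to [0, 1] and resize to the training resolution.
# Note: no mean/std normalization -- it was removed during training (see below).
image = Image.open("example.jpg").convert("RGB")
x = torch.from_numpy(np.array(image)).permute(2, 0, 1).float() / 255.0
x = F.interpolate(x.unsqueeze(0), size=(1024, 1024), mode="bilinear").to(device)

with torch.no_grad():
    result = model(x)

# Take the finest side output, resize back, and min-max normalize it,
# mirroring the DIS inference code.
mask = F.interpolate(result[0][0], size=image.size[::-1], mode="bilinear")
mask = (mask - mask.min()) / (mask.max() - mask.min())

# Use the mask as an alpha channel to drop the background.
alpha = (mask.squeeze().cpu().numpy() * 255).astype(np.uint8)
rgba = np.dstack([np.array(image), alpha])
Image.fromarray(rgba, "RGBA").save("example-no-background.png")
```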

## Training

The model was trained with the Human Segmentation Dataset.

After 10,000 iterations on a single NVIDIA GeForce RTX 4090, the model achieved the following results:

- Training Time: 8 hours
- Training Loss: 0.1179
- Validation Loss: 0.1284
- Maximum F1 Score: 0.9928
- Mean Absolute Error: 0.005

Output model: `/models/ormbg.pth`.

## Want to train your own model?

Check out the Highly Accurate Dichotomous Image Segmentation (DIS) code:

```
git clone https://github.com/xuebinqin/DIS.git
cd DIS
```

Follow the installation instructions at https://github.com/xuebinqin/DIS?tab=readme-ov-file#1-clone-this-repo. Download or create some data (like this) and place it in the DIS project folder.

I am using the following folder structure:

- `training/im` (images)
- `training/gt` (ground truth)
- `validation/im` (images)
- `validation/gt` (ground truth)

Apply this git patch to set the correct paths and remove image normalization:

```
git apply dis-repo.patch
```
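
If you prefer to make the changes by hand, the patch boils down to edits like the following in `IS-Net/train_valid_inference_main.py`. This is a sketch: the dataset-dictionary keys match the DIS repo, while the dataset names, file extensions, and cache paths are assumptions:

```python
# Point the training/validation dataset dicts at the folder structure above.
# Names, extensions, and cache paths are assumptions -- adjust as needed.
dataset_tr = {
    "name": "ormbg-training",
    "im_dir": "../training/im",
    "gt_dir": "../training/gt",
    "im_ext": ".png",
    "gt_ext": ".png",
    "cache_dir": "../cache/training",
}
dataset_vd = {
    "name": "ormbg-validation",
    "im_dir": "../validation/im",
    "gt_dir": "../validation/gt",
    "im_ext": ".png",
    "gt_ext": ".png",
    "cache_dir": "../cache/validation",
}
train_datasets = [dataset_tr]
valid_datasets = [dataset_vd]

# ...and drop the normalization transform from the dataloader transforms,
# e.g. remove GOSNormalize([0.5, 0.5, 0.5], [1.0, 1.0, 1.0]) from the
# transform list, so images are fed to the network unnormalized.
```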

Start training:

```
cd IS-Net
python train_valid_inference_main.py
```

Export to ONNX (modify paths if needed):

```
python utils/pth_to_onnx.py
```
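
Under the hood this is a standard `torch.onnx.export` call. A minimal sketch, assuming the `ISNetDIS` architecture, a fixed 1024×1024 input, and the paths used above (the actual script may differ):

```python
import torch
from models.isnet import ISNetDIS  # IS-Net architecture from the DIS repo

# Load the trained weights (path is an assumption, see above).
model = ISNetDIS()
model.load_state_dict(torch.load("models/ormbg.pth", map_location="cpu"))
model.eval()

# Trace with a dummy input at the training resolution and export.
dummy = torch.randn(1, 3, 1024, 1024)
torch.onnx.export(
    model,
    dummy,
    "models/ormbg.onnx",
    input_names=["input"],
    output_names=["output"],
    dynamic_axes={"input": {0: "batch"}, "output": {0: "batch"}},
    opset_version=17,
)
```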

## Research

Synthetic datasets have limitations for achieving great segmentation results, because artificial lighting, occlusion, scale, or backgrounds create a gap between synthetic and real images. A "model trained solely on synthetic data generated with naïve domain randomization struggles to generalize on the real domain"; see PEOPLESANSPEOPLE: A Synthetic Data Generator for Human-Centric Computer Vision (2022). However, hybrid training approaches seem promising and can even improve segmentation results.

I am currently researching how to close this gap with the resources I have. There are promising approaches, such as taking the pose of humans into account to improve segmentation results; see Cross-Domain Complementary Learning Using Pose for Multi-Person Part Segmentation (2019).

## Support

This is the first iteration of the model, so there will be improvements!

If you identify cases where the model fails, upload your examples!

Known issues (work in progress):

- close-ups: from above, from below, in profile, from the side
- minor issues with hair segmentation when hair forms loops
- more diverse backgrounds needed