---
title: Open Remove Background Model (ormbg)
license: apache-2.0
tags:
  - segmentation
  - remove background
  - background
  - background-removal
  - Pytorch
pretty_name: Open Remove Background Model
models:
  - schirrmacher/ormbg
datasets:
  - schirrmacher/humans
emoji: 💻
colorFrom: red
colorTo: red
sdk: gradio
sdk_version: 4.29.0
app_file: hf_space/app.py
pinned: false
---

# Open Remove Background Model (ormbg)

[>>> DEMO <<<](https://huggingface.co/spaces/schirrmacher/ormbg)

Join our [Research Discord Group](https://discord.gg/YYZ3D66t)!

![](examples/image/image01_no_background.png)

This model is a **fully open-source background remover** optimized for images with humans. It is based on the [Highly Accurate Dichotomous Image Segmentation](https://github.com/xuebinqin/DIS) (DIS) research.

## Inference

```
python ormbg/inference.py
```

## Training

Install dependencies:

```
conda env create -f environment.yaml
conda activate ormbg
```

Replace the dummy dataset with the [training dataset](https://huggingface.co/datasets/schirrmacher/humans), then start training:

```
python3 ormbg/train_model.py
```

# Research

I initially trained the model on synthetic images from the [Human Segmentation Dataset](https://huggingface.co/datasets/schirrmacher/humans), which was crafted with [LayerDiffuse](https://github.com/layerdiffusion/LayerDiffuse). However, I noticed that the model struggled to perform well on real images.

Synthetic datasets alone are limited in the segmentation quality they can deliver, because artificial lighting, occlusion, scale, and backgrounds create a gap between synthetic and real images. A "model trained solely on synthetic data generated with naïve domain randomization struggles to generalize on the real domain"; see [PEOPLESANSPEOPLE: A Synthetic Data Generator for Human-Centric Computer Vision (2022)](https://arxiv.org/pdf/2112.09290).

### Next steps:

- Expand dataset with synthetic and real images
- Research state-of-the-art loss functions

### Latest changes (26/07/2024):

- Created synthetic dataset with 10k images, crafted with [BlenderProc](https://github.com/DLR-RM/BlenderProc)
- Removed training data created with [LayerDiffuse](https://github.com/layerdiffusion/LayerDiffuse), since it lacks the accuracy needed
- Improved model performance (after 100k iterations):
  - F1: 0.9888 -> 0.9932
  - MAE: 0.0113 -> 0.008
  - Scores based on [this validation dataset](https://drive.google.com/drive/folders/1Yy9clZ58xCiai1zYESQkEKZCkslSC8eg)
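
For readers unfamiliar with the two scores above, here is a minimal sketch of how MAE and F1 can be computed for a predicted alpha mask against a ground-truth mask. It is not taken from this repository: the function names `mae` and `f1_score`, the fixed binarization threshold of 0.5, and the assumption that masks are arrays with values in [0, 1] are all illustrative choices, and the evaluation used for the numbers above may differ.

```python
import numpy as np


def mae(pred, gt):
    """Mean absolute error between two masks with values in [0, 1]."""
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    return float(np.mean(np.abs(pred - gt)))


def f1_score(pred, gt, threshold=0.5, eps=1e-8):
    """F1 after binarizing both masks at `threshold` (an assumed value)."""
    pred_bin = np.asarray(pred) >= threshold
    gt_bin = np.asarray(gt) >= threshold
    tp = np.logical_and(pred_bin, gt_bin).sum()   # foreground predicted as foreground
    fp = np.logical_and(pred_bin, ~gt_bin).sum()  # background predicted as foreground
    fn = np.logical_and(~pred_bin, gt_bin).sum()  # foreground predicted as background
    precision = tp / (tp + fp + eps)
    recall = tp / (tp + fn + eps)
    return float(2 * precision * recall / (precision + recall + eps))


# Tiny hand-made example, not real validation data.
pred = np.array([[0.9, 0.1], [0.8, 0.4]])
gt = np.array([[1.0, 0.0], [1.0, 1.0]])
print(mae(pred, gt), f1_score(pred, gt))
```

Lower MAE and higher F1 both indicate a closer match, which is the direction of the change reported above.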

### Changes (05/07/2024):

- Added [P3M-10K](https://paperswithcode.com/dataset/p3m-10k) dataset for training and validation
- Added [AIM-500](https://paperswithcode.com/dataset/aim-500) dataset for training and validation
- Added [PPM-100](https://github.com/ZHKKKe/PPM) dataset for training and validation
- Applied [Grid Dropout](https://albumentations.ai/docs/api_reference/augmentations/dropout/grid_dropout/) as a training augmentation to improve generalization
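
To illustrate what Grid Dropout does to a training image, the following is a simplified NumPy stand-in, not the Albumentations implementation and not this repository's training code. The function name `grid_dropout` and its parameters are hypothetical: the image is divided into square cells of `unit_size` pixels, and a square hole whose side is `ratio * unit_size` is filled in the top-left corner of every cell.

```python
import numpy as np


def grid_dropout(image, unit_size=4, ratio=0.5, fill_value=0):
    """Fill a square hole in the top-left corner of every grid cell.

    Simplified illustration only; the library version also supports
    random offsets and other options that are omitted here.
    """
    out = np.array(image, copy=True)  # leave the input untouched
    hole = int(unit_size * ratio)     # side length of each dropped square
    height, width = out.shape[:2]
    for y in range(0, height, unit_size):
        for x in range(0, width, unit_size):
            out[y:y + hole, x:x + hole] = fill_value
    return out


# An 8x8 image of ones: four cells of 4x4, each losing a 2x2 square.
image = np.ones((8, 8))
augmented = grid_dropout(image, unit_size=4, ratio=0.5)
print(augmented)
```

Because regularly spaced patches are hidden, the model cannot rely on any single local region and has to use the surrounding context to predict the mask.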