chflame163 committed
Upload 21 files
- ComfyUI/models/segformer_b2_clothes/README.md +109 -0
- ComfyUI/models/segformer_b2_clothes/config.json +110 -0
- ComfyUI/models/segformer_b2_clothes/handler.py +39 -0
- ComfyUI/models/segformer_b2_clothes/model.safetensors +3 -0
- ComfyUI/models/segformer_b2_clothes/optimizer.pt +3 -0
- ComfyUI/models/segformer_b2_clothes/preprocessor_config.json +18 -0
- ComfyUI/models/segformer_b2_clothes/pytorch_model.bin +3 -0
- ComfyUI/models/segformer_b2_clothes/rng_state.pth +3 -0
- ComfyUI/models/segformer_b2_clothes/scheduler.pt +3 -0
- ComfyUI/models/segformer_b2_clothes/trainer_state.json +0 -0
- ComfyUI/models/segformer_b2_clothes/training_args.bin +3 -0
- ComfyUI/models/segformer_b3_clothes/README.md +112 -0
- ComfyUI/models/segformer_b3_clothes/config.json +110 -0
- ComfyUI/models/segformer_b3_clothes/model.safetensors +3 -0
- ComfyUI/models/segformer_b3_clothes/preprocessor_config.json +23 -0
- ComfyUI/models/segformer_b3_fashion/README.md +92 -0
- ComfyUI/models/segformer_b3_fashion/config.json +168 -0
- ComfyUI/models/segformer_b3_fashion/model.safetensors +3 -0
- ComfyUI/models/segformer_b3_fashion/preprocessor_config.json +23 -0
- ComfyUI/models/segformer_b3_fashion/pytorch_model_2.bin +3 -0
- ComfyUI/models/segformer_b3_fashion/training_args.bin +3 -0
ComfyUI/models/segformer_b2_clothes/README.md
ADDED
@@ -0,0 +1,109 @@
+---
+license: mit
+tags:
+- vision
+- image-segmentation
+widget:
+- src: https://images.unsplash.com/photo-1643310325061-2beef64926a5?ixlib=rb-4.0.3&ixid=MnwxMjA3fDB8MHxzZWFyY2h8Nnx8cmFjb29uc3xlbnwwfHwwfHw%3D&w=1000&q=80
+  example_title: Person
+- src: https://freerangestock.com/sample/139043/young-man-standing-and-leaning-on-car.jpg
+  example_title: Person
+datasets:
+- mattmdjaga/human_parsing_dataset
+---
+# Segformer B2 fine-tuned for clothes segmentation
+
+SegFormer model fine-tuned on the [ATR dataset](https://github.com/lemondan/HumanParsing-Dataset) for clothes segmentation; it can also be used for human segmentation.
+The dataset on Hugging Face is called "mattmdjaga/human_parsing_dataset".
+
+
+**NEW** -
+**[Training code](https://github.com/mattmdjaga/segformer_b2_clothes)**. Right now it only contains the pure code with some comments, but soon I'll add a Colab notebook version
+and a blog post with it to make it more friendly.
+
+```python
+from transformers import SegformerImageProcessor, AutoModelForSemanticSegmentation
+from PIL import Image
+import requests
+import matplotlib.pyplot as plt
+import torch.nn as nn
+
+processor = SegformerImageProcessor.from_pretrained("mattmdjaga/segformer_b2_clothes")
+model = AutoModelForSemanticSegmentation.from_pretrained("mattmdjaga/segformer_b2_clothes")
+
+url = "https://plus.unsplash.com/premium_photo-1673210886161-bfcc40f54d1f?ixlib=rb-4.0.3&ixid=MnwxMjA3fDB8MHxzZWFyY2h8MXx8cGVyc29uJTIwc3RhbmRpbmd8ZW58MHx8MHx8&w=1000&q=80"
+
+image = Image.open(requests.get(url, stream=True).raw)
+inputs = processor(images=image, return_tensors="pt")
+
+outputs = model(**inputs)
+logits = outputs.logits.cpu()
+
+upsampled_logits = nn.functional.interpolate(
+    logits,
+    size=image.size[::-1],
+    mode="bilinear",
+    align_corners=False,
+)
+
+pred_seg = upsampled_logits.argmax(dim=1)[0]
+plt.imshow(pred_seg)
+```
+
+Labels: 0: "Background", 1: "Hat", 2: "Hair", 3: "Sunglasses", 4: "Upper-clothes", 5: "Skirt", 6: "Pants", 7: "Dress", 8: "Belt", 9: "Left-shoe", 10: "Right-shoe", 11: "Face", 12: "Left-leg", 13: "Right-leg", 14: "Left-arm", 15: "Right-arm", 16: "Bag", 17: "Scarf"
+
+### Evaluation
+
+| Label Index | Label Name | Category Accuracy | Category IoU |
+|:-------------:|:----------------:|:-----------------:|:------------:|
+| 0 | Background | 0.99 | 0.99 |
+| 1 | Hat | 0.73 | 0.68 |
+| 2 | Hair | 0.91 | 0.82 |
+| 3 | Sunglasses | 0.73 | 0.63 |
+| 4 | Upper-clothes | 0.87 | 0.78 |
+| 5 | Skirt | 0.76 | 0.65 |
+| 6 | Pants | 0.90 | 0.84 |
+| 7 | Dress | 0.74 | 0.55 |
+| 8 | Belt | 0.35 | 0.30 |
+| 9 | Left-shoe | 0.74 | 0.58 |
+| 10 | Right-shoe | 0.75 | 0.60 |
+| 11 | Face | 0.92 | 0.85 |
+| 12 | Left-leg | 0.90 | 0.82 |
+| 13 | Right-leg | 0.90 | 0.81 |
+| 14 | Left-arm | 0.86 | 0.74 |
+| 15 | Right-arm | 0.82 | 0.73 |
+| 16 | Bag | 0.91 | 0.84 |
+| 17 | Scarf | 0.63 | 0.29 |
+
+Overall Evaluation Metrics:
+- Evaluation Loss: 0.15
+- Mean Accuracy: 0.80
+- Mean IoU: 0.69
+
+### License
+
+The license for this model can be found [here](https://github.com/NVlabs/SegFormer/blob/master/LICENSE).
+
+### BibTeX entry and citation info
+
+```bibtex
+@article{DBLP:journals/corr/abs-2105-15203,
+  author = {Enze Xie and
+            Wenhai Wang and
+            Zhiding Yu and
+            Anima Anandkumar and
+            Jose M. Alvarez and
+            Ping Luo},
+  title = {SegFormer: Simple and Efficient Design for Semantic Segmentation with
+           Transformers},
+  journal = {CoRR},
+  volume = {abs/2105.15203},
+  year = {2021},
+  url = {https://arxiv.org/abs/2105.15203},
+  eprinttype = {arXiv},
+  eprint = {2105.15203},
+  timestamp = {Wed, 02 Jun 2021 11:46:42 +0200},
+  biburl = {https://dblp.org/rec/journals/corr/abs-2105-15203.bib},
+  bibsource = {dblp computer science bibliography, https://dblp.org}
+}
+```
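As a quick sanity check on the evaluation table in the README above, the overall means can be recomputed from the per-category rows. This is only a sketch: it assumes the reported "Mean Accuracy" and "Mean IoU" are unweighted averages over the 18 labels, which the README does not state. The numbers are copied from the table, and `unweighted_mean` is a helper name introduced here for illustration.

```python
# Per-category values copied from the README table, ordered by label index 0..17.
CATEGORY_ACCURACY = [0.99, 0.73, 0.91, 0.73, 0.87, 0.76, 0.90, 0.74, 0.35,
                     0.74, 0.75, 0.92, 0.90, 0.90, 0.86, 0.82, 0.91, 0.63]
CATEGORY_IOU = [0.99, 0.68, 0.82, 0.63, 0.78, 0.65, 0.84, 0.55, 0.30,
                0.58, 0.60, 0.85, 0.82, 0.81, 0.74, 0.73, 0.84, 0.29]


def unweighted_mean(values):
    """Average that gives every label the same weight (an assumption)."""
    return sum(values) / len(values)


mean_accuracy = unweighted_mean(CATEGORY_ACCURACY)
mean_iou = unweighted_mean(CATEGORY_IOU)
print(f"mean accuracy = {mean_accuracy:.2f}, mean IoU = {mean_iou:.2f}")
```

Under that assumption the two averages round to 0.80 and 0.69, which agrees with the overall metrics listed in the README.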
ComfyUI/models/segformer_b2_clothes/config.json
ADDED
@@ -0,0 +1,110 @@
+{
+  "_name_or_path": "nvidia/mit-b2",
+  "architectures": [
+    "SegformerForSemanticSegmentation"
+  ],
+  "attention_probs_dropout_prob": 0.0,
+  "classifier_dropout_prob": 0.1,
+  "decoder_hidden_size": 768,
+  "depths": [
+    3,
+    4,
+    6,
+    3
+  ],
+  "downsampling_rates": [
+    1,
+    4,
+    8,
+    16
+  ],
+  "drop_path_rate": 0.1,
+  "hidden_act": "gelu",
+  "hidden_dropout_prob": 0.0,
+  "hidden_sizes": [
+    64,
+    128,
+    320,
+    512
+  ],
+  "id2label": {
+    "0": "Background",
+    "1": "Hat",
+    "2": "Hair",
+    "3": "Sunglasses",
+    "4": "Upper-clothes",
+    "5": "Skirt",
+    "6": "Pants",
+    "7": "Dress",
+    "8": "Belt",
+    "9": "Left-shoe",
+    "10": "Right-shoe",
+    "11": "Face",
+    "12": "Left-leg",
+    "13": "Right-leg",
+    "14": "Left-arm",
+    "15": "Right-arm",
+    "16": "Bag",
+    "17": "Scarf"
+  },
+  "image_size": 224,
+  "initializer_range": 0.02,
+  "label2id": {
+    "Background": 0,
+    "Bag": 16,
+    "Belt": 8,
+    "Dress": 7,
+    "Face": 11,
+    "Hair": 2,
+    "Hat": 1,
+    "Left-arm": 14,
+    "Left-leg": 12,
+    "Left-shoe": 9,
+    "Pants": 6,
+    "Right-arm": 15,
+    "Right-leg": 13,
+    "Right-shoe": 10,
+    "Scarf": 17,
+    "Skirt": 5,
+    "Sunglasses": 3,
+    "Upper-clothes": 4
+  },
+  "layer_norm_eps": 1e-06,
+  "mlp_ratios": [
+    4,
+    4,
+    4,
+    4
+  ],
+  "model_type": "segformer",
+  "num_attention_heads": [
+    1,
+    2,
+    5,
+    8
+  ],
+  "num_channels": 3,
+  "num_encoder_blocks": 4,
+  "patch_sizes": [
+    7,
+    3,
+    3,
+    3
+  ],
+  "reshape_last_stage": true,
+  "semantic_loss_ignore_index": 255,
+  "sr_ratios": [
+    8,
+    4,
+    2,
+    1
+  ],
+  "strides": [
+    4,
+    2,
+    2,
+    2
+  ],
+  "torch_dtype": "float32",
+  "transformers_version": "4.24.0"
+}
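The `patch_sizes` and `strides` entries in this config determine how quickly the encoder shrinks the image, which is why the README snippet upsamples the logits back to the input size. The sketch below works through the arithmetic for a 512-pixel side. It assumes each stage embeds patches with a strided convolution padded by `patch_size // 2`; that padding rule is my understanding of the usual SegFormer implementation and is not stated in this commit. `conv_output_size` and `stage_resolutions` are illustrative names.

```python
PATCH_SIZES = [7, 3, 3, 3]   # "patch_sizes" from config.json
STRIDES = [4, 2, 2, 2]       # "strides" from config.json


def conv_output_size(size, kernel, stride, padding):
    """Standard output-size formula for a strided convolution."""
    return (size + 2 * padding - kernel) // stride + 1


def stage_resolutions(input_size):
    """Spatial size of the feature map after each encoder stage."""
    sizes = []
    current = input_size
    for kernel, stride in zip(PATCH_SIZES, STRIDES):
        # Assumption: padding = kernel // 2 for the overlapping patch embedding.
        current = conv_output_size(current, kernel, stride, kernel // 2)
        sizes.append(current)
    return sizes


print(stage_resolutions(512))
```

For a 512-pixel side this gives 128, 64, 32 and 16, i.e. reductions by 4, 8, 16 and 32. If the segmentation head predicts at the first-stage resolution, the raw logits for a 512x512 input would be 128x128, hence the bilinear interpolation step in the README.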
ComfyUI/models/segformer_b2_clothes/handler.py
ADDED
@@ -0,0 +1,39 @@
+from typing import Dict, List, Any
+from PIL import Image
+from io import BytesIO
+from transformers import AutoModelForSemanticSegmentation, AutoFeatureExtractor
+import base64
+import torch
+from torch import nn
+
+class EndpointHandler():
+    def __init__(self, path="."):
+        self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+        self.model = AutoModelForSemanticSegmentation.from_pretrained(path).to(self.device).eval()
+        self.feature_extractor = AutoFeatureExtractor.from_pretrained(path)
+
+    def __call__(self, data: Dict[str, Any]) -> List[List[int]]:
+        """
+        data args:
+            inputs (:obj:`dict`): holds the key "image", a base64-encoded image
+        Return:
+            A nested :obj:`list` of per-pixel label ids with shape (height, width),
+            at the resolution of the input image.
+        """
+        inputs = data.pop("inputs", data)
+
+        # decode base64 image to PIL
+        image = Image.open(BytesIO(base64.b64decode(inputs['image'])))
+
+        # preprocess image
+        encoding = self.feature_extractor(images=image, return_tensors="pt")
+        pixel_values = encoding["pixel_values"].to(self.device)
+        with torch.no_grad():
+            outputs = self.model(pixel_values=pixel_values)
+        logits = outputs.logits
+        upsampled_logits = nn.functional.interpolate(logits,
+                size=image.size[::-1],
+                mode="bilinear",
+                align_corners=False,)
+        pred_seg = upsampled_logits.argmax(dim=1)[0]
+        return pred_seg.tolist()
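The handler reads a base64 string from `data["inputs"]["image"]`, so a caller has to wrap the image bytes accordingly. The following is a minimal sketch of that request shape, assuming the payload travels as JSON; the transport is not shown in this commit. `build_payload` and `unpack_payload` are hypothetical helper names, and the second one only mirrors the handler's decoding step without loading any model.

```python
import base64
import json


def build_payload(image_bytes: bytes) -> str:
    """Wrap raw image bytes the way the handler's __call__ unpacks them."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return json.dumps({"inputs": {"image": encoded}})


def unpack_payload(body: str) -> bytes:
    """Mirror of the handler's decoding step, without the model."""
    data = json.loads(body)
    inputs = data.pop("inputs", data)
    return base64.b64decode(inputs["image"])


# Placeholder bytes stand in for the contents of a real JPEG or PNG file.
sample = b"\x89PNG placeholder bytes"
round_trip_ok = unpack_payload(build_payload(sample)) == sample
print(round_trip_ok)
```

Because of the `data.pop("inputs", data)` fallback, a payload that puts `"image"` at the top level would decode the same way.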
ComfyUI/models/segformer_b2_clothes/model.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:8f86fd90c567afd4370b3cc3a7e81ed767a632b2832a738331af660acc0c4c68
+size 109493236
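The binary files in this commit appear only as three-line Git LFS pointers (`version`, `oid`, `size`) rather than as their contents. A small sketch of reading such a pointer, using the `model.safetensors` entry above as input; `parse_lfs_pointer` and the keys of the returned dict are names chosen here for illustration.

```python
POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:8f86fd90c567afd4370b3cc3a7e81ed767a632b2832a738331af660acc0c4c68
size 109493236
"""


def parse_lfs_pointer(text):
    """Split each 'key value' line of a Git LFS pointer into a dict."""
    fields = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid has the form "<hash algorithm>:<hex digest>".
    algorithm, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algorithm": algorithm,
        "digest": digest,
        "size_bytes": int(fields["size"]),
    }


pointer = parse_lfs_pointer(POINTER_TEXT)
print(pointer["hash_algorithm"], f'{pointer["size_bytes"] / 2**20:.1f} MiB')
```

The `size` field is the byte count of the real file, so the pointers alone are enough to estimate the download size of each checkpoint.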
ComfyUI/models/segformer_b2_clothes/optimizer.pt
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:4f642f5c29cb7c9ac0ff242ccf94220c88913f4a65db4727b2530a987ce14d9a
+size 219104837
ComfyUI/models/segformer_b2_clothes/preprocessor_config.json
ADDED
@@ -0,0 +1,18 @@
+{
+  "do_normalize": true,
+  "do_resize": true,
+  "feature_extractor_type": "SegformerFeatureExtractor",
+  "image_mean": [
+    0.485,
+    0.456,
+    0.406
+  ],
+  "image_std": [
+    0.229,
+    0.224,
+    0.225
+  ],
+  "reduce_labels": false,
+  "resample": 2,
+  "size": 512
+}
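This b2 preprocessor config stores `size` as a bare integer, while the b3 preprocessor configs later in this commit use a `height`/`width` mapping. Code that reads both directories may want a single representation. The sketch below is one way to do that; it assumes a bare integer denotes a square target size, which this commit does not confirm, and `normalize_size` is a hypothetical helper rather than a library function.

```python
def normalize_size(size):
    """Return {"height": h, "width": w} for either config style."""
    if isinstance(size, int):
        # Assumption: a bare integer denotes a square target size.
        return {"height": size, "width": size}
    if isinstance(size, dict) and {"height", "width"} <= size.keys():
        return {"height": int(size["height"]), "width": int(size["width"])}
    raise ValueError(f"unsupported size specification: {size!r}")


legacy = normalize_size(512)                            # style used by this file
modern = normalize_size({"height": 512, "width": 512})  # style used by the b3 configs
print(legacy == modern)
```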
ComfyUI/models/segformer_b2_clothes/pytorch_model.bin
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:934543143c97acf3197b030bb0ba046f6c713757467a7dcf47f27ce8c0d6264d
+size 109579005
ComfyUI/models/segformer_b2_clothes/rng_state.pth
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:a7c38376dfee2c075efd2b37186139541f47970794c545ba17f510796313aaa8
+size 14575
ComfyUI/models/segformer_b2_clothes/scheduler.pt
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:7a9a297dec0fe2336eab64ac3bbd47e4936655c43239740a40cfe5f4623a0657
+size 627
ComfyUI/models/segformer_b2_clothes/trainer_state.json
ADDED
The diff for this file is too large to render.
ComfyUI/models/segformer_b2_clothes/training_args.bin
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:210f58c34439201a03f7a2e923b10e2a9b03a8943740f452ae4e8f57ebcfc186
+size 3323
ComfyUI/models/segformer_b3_clothes/README.md
ADDED
@@ -0,0 +1,112 @@
+---
+license: mit
+tags:
+- vision
+- image-segmentation
+widget:
+- src: >-
+    https://images.unsplash.com/photo-1643310325061-2beef64926a5?ixlib=rb-4.0.3&ixid=MnwxMjA3fDB8MHxzZWFyY2h8Nnx8cmFjb29uc3xlbnwwfHwwfHw%3D&w=1000&q=80
+  example_title: Person
+- src: >-
+    https://freerangestock.com/sample/139043/young-man-standing-and-leaning-on-car.jpg
+  example_title: Person
+datasets:
+- mattmdjaga/human_parsing_dataset
+pipeline_tag: image-segmentation
+---
+# Segformer B3 fine-tuned for clothes segmentation
+
+SegFormer model fine-tuned on the [ATR dataset](https://github.com/lemondan/HumanParsing-Dataset) for clothes segmentation; it can also be used for human segmentation.
+The dataset on Hugging Face is called "mattmdjaga/human_parsing_dataset".
+
+
+**NEW** -
+**[Training code](https://github.com/mattmdjaga/segformer_b2_clothes)**. Right now it only contains the pure code with some comments, but soon I'll add a Colab notebook version
+and a blog post with it to make it more friendly.
+
+```python
+from transformers import SegformerImageProcessor, AutoModelForSemanticSegmentation
+from PIL import Image
+import requests
+import matplotlib.pyplot as plt
+import torch.nn as nn
+
+processor = SegformerImageProcessor.from_pretrained("sayeed99/segformer_b3_clothes")
+model = AutoModelForSemanticSegmentation.from_pretrained("sayeed99/segformer_b3_clothes")
+
+url = "https://plus.unsplash.com/premium_photo-1673210886161-bfcc40f54d1f?ixlib=rb-4.0.3&ixid=MnwxMjA3fDB8MHxzZWFyY2h8MXx8cGVyc29uJTIwc3RhbmRpbmd8ZW58MHx8MHx8&w=1000&q=80"
+
+image = Image.open(requests.get(url, stream=True).raw)
+inputs = processor(images=image, return_tensors="pt")
+
+outputs = model(**inputs)
+logits = outputs.logits.cpu()
+
+upsampled_logits = nn.functional.interpolate(
+    logits,
+    size=image.size[::-1],
+    mode="bilinear",
+    align_corners=False,
+)
+
+pred_seg = upsampled_logits.argmax(dim=1)[0]
+plt.imshow(pred_seg)
+```
+
+Labels: 0: "Background", 1: "Hat", 2: "Hair", 3: "Sunglasses", 4: "Upper-clothes", 5: "Skirt", 6: "Pants", 7: "Dress", 8: "Belt", 9: "Left-shoe", 10: "Right-shoe", 11: "Face", 12: "Left-leg", 13: "Right-leg", 14: "Left-arm", 15: "Right-arm", 16: "Bag", 17: "Scarf"
+
+### Evaluation
+
+| Label Index | Label Name | Category Accuracy | Category IoU |
+|:-------------:|:----------------:|:-----------------:|:------------:|
+| 0 | Background | 0.99 | 0.99 |
+| 1 | Hat | 0.73 | 0.68 |
+| 2 | Hair | 0.91 | 0.82 |
+| 3 | Sunglasses | 0.73 | 0.63 |
+| 4 | Upper-clothes | 0.87 | 0.78 |
+| 5 | Skirt | 0.76 | 0.65 |
+| 6 | Pants | 0.90 | 0.84 |
+| 7 | Dress | 0.74 | 0.55 |
+| 8 | Belt | 0.35 | 0.30 |
+| 9 | Left-shoe | 0.74 | 0.58 |
+| 10 | Right-shoe | 0.75 | 0.60 |
+| 11 | Face | 0.92 | 0.85 |
+| 12 | Left-leg | 0.90 | 0.82 |
+| 13 | Right-leg | 0.90 | 0.81 |
+| 14 | Left-arm | 0.86 | 0.74 |
+| 15 | Right-arm | 0.82 | 0.73 |
+| 16 | Bag | 0.91 | 0.84 |
+| 17 | Scarf | 0.63 | 0.29 |
+
+Overall Evaluation Metrics:
+- Evaluation Loss: 0.15
+- Mean Accuracy: 0.80
+- Mean IoU: 0.69
+
+### License
+
+The license for this model can be found [here](https://github.com/NVlabs/SegFormer/blob/master/LICENSE).
+
+### BibTeX entry and citation info
+
+```bibtex
+@article{DBLP:journals/corr/abs-2105-15203,
+  author = {Enze Xie and
+            Wenhai Wang and
+            Zhiding Yu and
+            Anima Anandkumar and
+            Jose M. Alvarez and
+            Ping Luo},
+  title = {SegFormer: Simple and Efficient Design for Semantic Segmentation with
+           Transformers},
+  journal = {CoRR},
+  volume = {abs/2105.15203},
+  year = {2021},
+  url = {https://arxiv.org/abs/2105.15203},
+  eprinttype = {arXiv},
+  eprint = {2105.15203},
+  timestamp = {Wed, 02 Jun 2021 11:46:42 +0200},
+  biburl = {https://dblp.org/rec/journals/corr/abs-2105-15203.bib},
+  bibsource = {dblp computer science bibliography, https://dblp.org}
+}
+```
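The README snippet ends with a per-pixel label map (`pred_seg`). A common next step is to turn that map into a binary mask for a few chosen labels. The sketch below shows the idea on a tiny hand-written label map using plain nested lists, so it runs without the model; the toy data and the helper name `mask_for_labels` are assumptions for illustration, while the label names are the ones listed in the README.

```python
ID2LABEL = {
    0: "Background", 1: "Hat", 2: "Hair", 3: "Sunglasses", 4: "Upper-clothes",
    5: "Skirt", 6: "Pants", 7: "Dress", 8: "Belt", 9: "Left-shoe",
    10: "Right-shoe", 11: "Face", 12: "Left-leg", 13: "Right-leg",
    14: "Left-arm", 15: "Right-arm", 16: "Bag", 17: "Scarf",
}


def mask_for_labels(label_map, wanted_names):
    """Return a 0/1 mask that is 1 wherever the pixel has a wanted label."""
    label2id = {name: idx for idx, name in ID2LABEL.items()}
    wanted_ids = {label2id[name] for name in wanted_names}
    return [[1 if pixel in wanted_ids else 0 for pixel in row]
            for row in label_map]


# Toy 4x4 prediction: hair, face, arms, upper clothes and pants.
toy_prediction = [
    [0, 2, 2, 0],
    [0, 11, 11, 0],
    [14, 4, 4, 15],
    [0, 6, 6, 0],
]
clothes_mask = mask_for_labels(toy_prediction, ["Upper-clothes", "Pants"])
for row in clothes_mask:
    print(row)
```

With a real prediction, the nested list could come from `pred_seg.tolist()`; for large images a tensor comparison would be faster than this pure-Python loop.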
ComfyUI/models/segformer_b3_clothes/config.json
ADDED
@@ -0,0 +1,110 @@
+{
+  "_name_or_path": "nvidia/mit-b3",
+  "architectures": [
+    "SegformerForSemanticSegmentation"
+  ],
+  "attention_probs_dropout_prob": 0.0,
+  "classifier_dropout_prob": 0.1,
+  "decoder_hidden_size": 768,
+  "depths": [
+    3,
+    4,
+    18,
+    3
+  ],
+  "downsampling_rates": [
+    1,
+    4,
+    8,
+    16
+  ],
+  "drop_path_rate": 0.1,
+  "hidden_act": "gelu",
+  "hidden_dropout_prob": 0.0,
+  "hidden_sizes": [
+    64,
+    128,
+    320,
+    512
+  ],
+  "id2label": {
+    "0": "Background",
+    "1": "Hat",
+    "10": "Right-shoe",
+    "11": "Face",
+    "12": "Left-leg",
+    "13": "Right-leg",
+    "14": "Left-arm",
+    "15": "Right-arm",
+    "16": "Bag",
+    "17": "Scarf",
+    "2": "Hair",
+    "3": "Sunglasses",
+    "4": "Upper-clothes",
+    "5": "Skirt",
+    "6": "Pants",
+    "7": "Dress",
+    "8": "Belt",
+    "9": "Left-shoe"
+  },
+  "image_size": 224,
+  "initializer_range": 0.02,
+  "label2id": {
+    "Background": "0",
+    "Bag": "16",
+    "Belt": "8",
+    "Dress": "7",
+    "Face": "11",
+    "Hair": "2",
+    "Hat": "1",
+    "Left-arm": "14",
+    "Left-leg": "12",
+    "Left-shoe": "9",
+    "Pants": "6",
+    "Right-arm": "15",
+    "Right-leg": "13",
+    "Right-shoe": "10",
+    "Scarf": "17",
+    "Skirt": "5",
+    "Sunglasses": "3",
+    "Upper-clothes": "4"
+  },
+  "layer_norm_eps": 1e-06,
+  "mlp_ratios": [
+    4,
+    4,
+    4,
+    4
+  ],
+  "model_type": "segformer",
+  "num_attention_heads": [
+    1,
+    2,
+    5,
+    8
+  ],
+  "num_channels": 3,
+  "num_encoder_blocks": 4,
+  "patch_sizes": [
+    7,
+    3,
+    3,
+    3
+  ],
+  "reshape_last_stage": true,
+  "semantic_loss_ignore_index": 255,
+  "sr_ratios": [
+    8,
+    4,
+    2,
+    1
+  ],
+  "strides": [
+    4,
+    2,
+    2,
+    2
+  ],
+  "torch_dtype": "float32",
+  "transformers_version": "4.38.1"
+}
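Two details of this config are easy to trip over when reading it by hand: the `id2label` keys are strings sorted lexicographically ("1", "10", "11", ...), and the `label2id` values are stored as strings rather than integers. A minimal sketch of loading both maps with integer ids follows. It uses an abbreviated four-entry excerpt of the JSON above, not the full file, and `load_label_maps` is a name introduced here for illustration.

```python
import json

# Abbreviated excerpt of the config above (4 of the 18 labels).
CONFIG_EXCERPT = """
{
  "id2label": {"0": "Background", "1": "Hat", "10": "Right-shoe", "2": "Hair"},
  "label2id": {"Background": "0", "Hair": "2", "Hat": "1", "Right-shoe": "10"}
}
"""


def load_label_maps(config_text):
    """Parse both maps and coerce every label id to int."""
    config = json.loads(config_text)
    id2label = {int(key): name for key, name in config["id2label"].items()}
    label2id = {name: int(value) for name, value in config["label2id"].items()}
    return id2label, label2id


id2label, label2id = load_label_maps(CONFIG_EXCERPT)
# After coercion the two maps should be exact inverses of each other.
maps_agree = all(label2id[name] == idx for idx, name in id2label.items())
print(maps_agree, [id2label[idx] for idx in sorted(id2label)])
```

Sorting the integer keys restores the numeric label order that the string keys obscure.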
ComfyUI/models/segformer_b3_clothes/model.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:f70ae566c5773fb335796ebaa8acc924ac25eb97222c2b2967d44d2fc11568e6
+size 189029000
ComfyUI/models/segformer_b3_clothes/preprocessor_config.json
ADDED
@@ -0,0 +1,23 @@
+{
+  "do_normalize": true,
+  "do_reduce_labels": false,
+  "do_rescale": true,
+  "do_resize": true,
+  "image_mean": [
+    0.485,
+    0.456,
+    0.406
+  ],
+  "image_processor_type": "SegformerImageProcessor",
+  "image_std": [
+    0.229,
+    0.224,
+    0.225
+  ],
+  "resample": 2,
+  "rescale_factor": 0.00392156862745098,
+  "size": {
+    "height": 512,
+    "width": 512
+  }
+}
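The `rescale_factor`, `image_mean` and `image_std` values in this config describe a two-step per-channel transform: scale 0..255 values into 0..1, then subtract the mean and divide by the standard deviation. The sketch below applies that arithmetic to a single RGB pixel. It assumes rescaling happens before normalization and leaves out the resize step entirely; `preprocess_pixel` is an illustrative helper, not the library's own code.

```python
IMAGE_MEAN = [0.485, 0.456, 0.406]
IMAGE_STD = [0.229, 0.224, 0.225]
RESCALE_FACTOR = 0.00392156862745098  # equals 1 / 255


def preprocess_pixel(rgb):
    """Rescale 0..255 channel values to 0..1, then standardize per channel."""
    rescaled = [channel * RESCALE_FACTOR for channel in rgb]
    return [(value - mean) / std
            for value, mean, std in zip(rescaled, IMAGE_MEAN, IMAGE_STD)]


black = preprocess_pixel((0, 0, 0))
white = preprocess_pixel((255, 255, 255))
print(black)
print(white)
```

A black pixel maps to `-mean / std` per channel and a white pixel to `(1 - mean) / std`, so the model sees roughly zero-centered inputs.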
ComfyUI/models/segformer_b3_fashion/README.md
ADDED
@@ -0,0 +1,92 @@
+---
+license: other
+tags:
+- vision
+- image-segmentation
+- generated_from_trainer
+widget:
+- src: >-
+    https://media.istockphoto.com/id/515788534/photo/cheerful-and-confidant.jpg?s=612x612&w=0&k=20&c=T0Z4DfameRpyGhzevPomrm-wjZp7wmGjpAyjGcTzpkA=
+  example_title: Person
+- src: >-
+    https://storage.googleapis.com/pai-images/1484fd9ea9d746eb9f1de0d6778dbea2.jpeg
+  example_title: Person
+datasets:
+- sayeed99/fashion_segmentation
+model-index:
+- name: segformer-b3-fashion
+  results: []
+pipeline_tag: image-segmentation
+---
+
+
+# segformer-b3-fashion
+
+This model is a fine-tuned version of [nvidia/mit-b3](https://huggingface.co/nvidia/mit-b3) on the sayeed99/fashion_segmentation dataset using original image sizes without resizing.
+
+
+```python
+from transformers import SegformerImageProcessor, AutoModelForSemanticSegmentation
+from PIL import Image
+import requests
+import matplotlib.pyplot as plt
+import torch.nn as nn
+
+processor = SegformerImageProcessor.from_pretrained("sayeed99/segformer-b3-fashion")
+model = AutoModelForSemanticSegmentation.from_pretrained("sayeed99/segformer-b3-fashion")
+
+url = "https://plus.unsplash.com/premium_photo-1673210886161-bfcc40f54d1f?ixlib=rb-4.0.3&ixid=MnwxMjA3fDB8MHxzZWFyY2h8MXx8cGVyc29uJTIwc3RhbmRpbmd8ZW58MHx8MHx8&w=1000&q=80"
+
+image = Image.open(requests.get(url, stream=True).raw)
+inputs = processor(images=image, return_tensors="pt")
+
+outputs = model(**inputs)
+logits = outputs.logits.cpu()
+
+upsampled_logits = nn.functional.interpolate(
+    logits,
+    size=image.size[::-1],
+    mode="bilinear",
+    align_corners=False,
+)
+
+pred_seg = upsampled_logits.argmax(dim=1)[0]
+plt.imshow(pred_seg)
+```
+
+Labels : {"0":"Unlabelled", "1": "shirt, blouse", "2": "top, t-shirt, sweatshirt", "3": "sweater", "4": "cardigan", "5": "jacket", "6": "vest", "7": "pants", "8": "shorts", "9": "skirt", "10": "coat", "11": "dress", "12": "jumpsuit", "13": "cape", "14": "glasses", "15": "hat", "16": "headband, head covering, hair accessory", "17": "tie", "18": "glove", "19": "watch", "20": "belt", "21": "leg warmer", "22": "tights, stockings", "23": "sock", "24": "shoe", "25": "bag, wallet", "26": "scarf", "27": "umbrella", "28": "hood", "29": "collar", "30": "lapel", "31": "epaulette", "32": "sleeve", "33": "pocket", "34": "neckline", "35": "buckle", "36": "zipper", "37": "applique", "38": "bead", "39": "bow", "40": "flower", "41": "fringe", "42": "ribbon", "43": "rivet", "44": "ruffle", "45": "sequin", "46": "tassel"}
+
+### Framework versions
+
+- Transformers 4.30.0
+- Pytorch 2.2.2+cu121
+- Datasets 2.18.0
+- Tokenizers 0.13.3
+
+
+### License
+
+The license for this model can be found [here](https://github.com/NVlabs/SegFormer/blob/master/LICENSE).
+
+### BibTeX entry and citation info
+
+```bibtex
+@article{DBLP:journals/corr/abs-2105-15203,
+  author = {Enze Xie and
+            Wenhai Wang and
+            Zhiding Yu and
+            Anima Anandkumar and
+            Jose M. Alvarez and
+            Ping Luo},
+  title = {SegFormer: Simple and Efficient Design for Semantic Segmentation with
+           Transformers},
+  journal = {CoRR},
+  volume = {abs/2105.15203},
+  year = {2021},
+  url = {https://arxiv.org/abs/2105.15203},
+  eprinttype = {arXiv},
+  eprint = {2105.15203},
+  timestamp = {Wed, 02 Jun 2021 11:46:42 +0200},
+  biburl = {https://dblp.org/rec/journals/corr/abs-2105-15203.bib},
+  bibsource = {dblp computer science bibliography, https://dblp.org}
+}
ComfyUI/models/segformer_b3_fashion/config.json
ADDED
@@ -0,0 +1,168 @@
+{
+  "_name_or_path": "nvidia/mit-b3",
+  "architectures": [
+    "SegformerForSemanticSegmentation"
+  ],
+  "attention_probs_dropout_prob": 0.0,
+  "classifier_dropout_prob": 0.1,
+  "decoder_hidden_size": 768,
+  "depths": [
+    3,
+    4,
+    18,
+    3
+  ],
+  "downsampling_rates": [
+    1,
+    4,
+    8,
+    16
+  ],
+  "drop_path_rate": 0.1,
+  "hidden_act": "gelu",
+  "hidden_dropout_prob": 0.0,
+  "hidden_sizes": [
+    64,
+    128,
+    320,
+    512
+  ],
+  "id2label": {
+    "0": "unlabelled",
+    "1": "shirt, blouse",
+    "2": "top, t-shirt, sweatshirt",
+    "3": "sweater",
+    "4": "cardigan",
+    "5": "jacket",
+    "6": "vest",
+    "7": "pants",
+    "8": "shorts",
+    "9": "skirt",
+    "10": "coat",
+    "11": "dress",
+    "12": "jumpsuit",
+    "13": "cape",
+    "14": "glasses",
+    "15": "hat",
+    "16": "headband, head covering, hair accessory",
+    "17": "tie",
+    "18": "glove",
+    "19": "watch",
+    "20": "belt",
+    "21": "leg warmer",
+    "22": "tights, stockings",
+    "23": "sock",
+    "24": "shoe",
+    "25": "bag, wallet",
+    "26": "scarf",
+    "27": "umbrella",
+    "28": "hood",
+    "29": "collar",
+    "30": "lapel",
+    "31": "epaulette",
+    "32": "sleeve",
+    "33": "pocket",
+    "34": "neckline",
+    "35": "buckle",
+    "36": "zipper",
+    "37": "applique",
+    "38": "bead",
+    "39": "bow",
+    "40": "flower",
+    "41": "fringe",
+    "42": "ribbon",
+    "43": "rivet",
+    "44": "ruffle",
+    "45": "sequin",
+    "46": "tassel"
+  },
+  "image_size": 224,
+  "initializer_range": 0.02,
+  "label2id": {
+    "applique": 37,
+    "bag, wallet": 25,
+    "bead": 38,
+    "belt": 20,
+    "bow": 39,
+    "buckle": 35,
+    "cape": 13,
+    "cardigan": 4,
+    "coat": 10,
+    "collar": 29,
+    "dress": 11,
+    "epaulette": 31,
+    "flower": 40,
+    "fringe": 41,
+    "glasses": 14,
+    "glove": 18,
+    "hat": 15,
+    "headband, head covering, hair accessory": 16,
+    "hood": 28,
+    "jacket": 5,
+    "jumpsuit": 12,
+    "lapel": 30,
+    "leg warmer": 21,
+    "neckline": 34,
+    "pants": 7,
+    "pocket": 33,
+    "ribbon": 42,
+    "rivet": 43,
+    "ruffle": 44,
+    "scarf": 26,
+    "sequin": 45,
+    "shirt, blouse": 1,
+    "shoe": 24,
+    "shorts": 8,
+    "skirt": 9,
+    "sleeve": 32,
+    "sock": 23,
+    "sweater": 3,
+    "tassel": 46,
+    "tie": 17,
+    "tights, stockings": 22,
+    "top, t-shirt, sweatshirt": 2,
+    "umbrella": 27,
+    "unlabelled": 0,
+    "vest": 6,
+    "watch": 19,
+    "zipper": 36
+  },
+  "layer_norm_eps": 1e-06,
+  "mlp_ratios": [
+    4,
+    4,
+    4,
+    4
+  ],
+  "model_type": "segformer",
+  "num_attention_heads": [
+    1,
+    2,
+    5,
+    8
+  ],
+  "num_channels": 3,
+  "num_encoder_blocks": 4,
+  "patch_sizes": [
+    7,
+    3,
+    3,
+    3
+  ],
+  "reshape_last_stage": true,
+  "semantic_loss_ignore_index": 255,
+  "sr_ratios": [
+    8,
+    4,
+    2,
+    1
+  ],
+  "strides": [
+    4,
+    2,
+    2,
+    2
+  ],
+  "torch_dtype": "float32",
+  "transformers_version": "4.30.0"
+}
ComfyUI/models/segformer_b3_fashion/model.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:f3f5b30179f1480d329224d089f6d286580142c2b12846d08de814a48a81f42f
+size 189118204
ComfyUI/models/segformer_b3_fashion/preprocessor_config.json
ADDED
@@ -0,0 +1,23 @@
+{
+  "do_normalize": true,
+  "do_reduce_labels": false,
+  "do_rescale": true,
+  "do_resize": true,
+  "image_mean": [
+    0.485,
+    0.456,
+    0.406
+  ],
+  "image_processor_type": "SegformerImageProcessor",
+  "image_std": [
+    0.229,
+    0.224,
+    0.225
+  ],
+  "resample": 2,
+  "rescale_factor": 0.00392156862745098,
+  "size": {
+    "height": 512,
+    "width": 512
+  }
+}
ComfyUI/models/segformer_b3_fashion/pytorch_model_2.bin
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:ec5749e86e5efad5d9dbbf7c2e4b996d675548dc22f26b06c0f1b6fc2e8bc1e2
+size 189264154
ComfyUI/models/segformer_b3_fashion/training_args.bin
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:f871f7bdbc3af72746e7b76beb628f6365db08040b58b3071238dca986de97ca
+size 4408
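Since every binary in this commit is recorded as an LFS pointer with a SHA-256 `oid` and a byte `size`, a downloaded file can be checked against its pointer before it is loaded. The sketch below streams a file once and compares both values. `matches_pointer` is a hypothetical helper, and the self-check at the end uses a throwaway temporary file rather than any real checkpoint from this commit.

```python
import hashlib
import os
import tempfile


def matches_pointer(path, expected_sha256, expected_size, chunk_size=1 << 20):
    """Stream the file once, comparing its size and SHA-256 with the pointer."""
    digest = hashlib.sha256()
    total = 0
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
            total += len(chunk)
    return total == expected_size and digest.hexdigest() == expected_sha256


# Self-check on a throwaway file rather than a real checkpoint.
payload = b"not a real checkpoint"
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(payload)
check_ok = matches_pointer(tmp.name, hashlib.sha256(payload).hexdigest(), len(payload))
check_bad = matches_pointer(tmp.name, "0" * 64, len(payload))
os.remove(tmp.name)
print(check_ok, check_bad)
```

For the last file in this commit, the call would take the local path to `training_args.bin`, the digest from its `oid` line and the size 4408. Reading in chunks keeps memory use flat for the model files of roughly 189 MB.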