---
library_name: transformers
license: other
base_model: nvidia/mit-b0
tags:
- generated_from_trainer
datasets:
- scene_parse_150
model-index:
- name: segformer-b0-scene-parse-150
  results: []
metrics:
- mean_iou
pipeline_tag: image-segmentation
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# Segformer-b0-scene-parse-150

This model is a fine-tuned version of [nvidia/mit-b0](https://huggingface.co/nvidia/mit-b0), trained on the `scene_parse_150` dataset for semantic segmentation (scene parsing).

### Evaluation Results:
The model achieved the following results on the evaluation dataset:

- **Loss**: 1.8435
- **Mean IoU**: 0.0881
- **Mean Accuracy**: 0.1619
- **Overall Accuracy**: 0.6663

**Per-Category IoU** and **Per-Category Accuracy** values are sparse, which indicates that performance varies widely across categories.
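The three aggregate metrics above weight categories differently, which is one way a high overall accuracy can coexist with a low mean IoU. The following is a minimal pure-Python sketch of the usual definitions, not the evaluation code used for this model; the function name `segmentation_metrics` and the toy labels are illustrative assumptions.

```python
def segmentation_metrics(pred, target, num_classes):
    """Return (mean_iou, mean_accuracy, overall_accuracy) for flat label lists."""
    intersect = [0] * num_classes    # pixels where prediction and label agree
    pred_area = [0] * num_classes    # pixels predicted as each class
    target_area = [0] * num_classes  # pixels labelled as each class
    for p, t in zip(pred, target):
        pred_area[p] += 1
        target_area[t] += 1
        if p == t:
            intersect[t] += 1

    ious, accs = [], []
    for c in range(num_classes):
        union = pred_area[c] + target_area[c] - intersect[c]
        if union > 0:            # IoU is undefined if the class never appears
            ious.append(intersect[c] / union)
        if target_area[c] > 0:   # accuracy is undefined without labelled pixels
            accs.append(intersect[c] / target_area[c])

    overall_accuracy = sum(intersect) / len(target)
    return sum(ious) / len(ious), sum(accs) / len(accs), overall_accuracy


# Toy example (assumed data): one frequent class and two rare ones.
target = [0] * 8 + [1, 2]
pred = [0] * 10  # the model predicts the frequent class everywhere
mean_iou, mean_acc, overall_acc = segmentation_metrics(pred, target, num_classes=3)
print(mean_iou, mean_acc, overall_acc)
```

In this toy case the frequent class dominates the pixel count, so overall accuracy stays high while the two missed classes pull both per-class averages down.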

## Model Description

SegFormer-B0 pairs a hierarchical Mix Transformer (MiT-B0) encoder with a lightweight all-MLP decode head. The encoder produces features at several resolutions, and the decode head fuses them into a segmentation map.

More detailed model descriptions, including architectural adjustments or preprocessing requirements, are needed.

## Intended Uses & Limitations

- **Use Cases**: Suitable for scene parsing and segmentation tasks in environments with diverse visual categories.
- **Limitations**: Performance varies significantly between categories, as the sparse per-category accuracy and IoU values show. The model may struggle with underrepresented classes or with categories that are visually hard to distinguish.
- Further details on intended domains and limitations are needed.
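
For the scene-parsing use case, inference might look like the sketch below, which uses the `transformers` image-segmentation pipeline. The repository id is a placeholder because this card does not state the Hub namespace, and the helper name `segment` and the image path are hypothetical.

```python
# Placeholder repository id: the card does not state the Hub namespace.
MODEL_ID = "your-username/segformer-b0-scene-parse-150"


def segment(image_path, model_id=MODEL_ID):
    """Return the category labels predicted for one image."""
    # Imported lazily so that defining this helper does not require the library.
    from transformers import pipeline

    segmenter = pipeline("image-segmentation", model=model_id)
    # Each result is a dict holding a "label" string and a PIL "mask" image.
    results = segmenter(image_path)
    return [result["label"] for result in results]
```

Calling `segment("scene.jpg")` with a local image path would download the checkpoint on first use and return one label per predicted segment.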

## Training and Evaluation Data

The model was trained on the `scene_parse_150` dataset, which consists of diverse visual scenes with 150 unique semantic categories. Further information on dataset specifics and any preprocessing steps is needed.

## Training Procedure

### Hyperparameters:
- **Learning Rate**: 6e-05
- **Training Batch Size**: 2
- **Evaluation Batch Size**: 2
- **Seed**: 42
- **Optimizer**: Adam (betas=(0.9, 0.999), epsilon=1e-08)
- **Learning Rate Scheduler**: Linear
- **Number of Epochs**: 50
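
To illustrate the linear scheduler listed above, here is a small sketch of how the learning rate would decay from 6e-05 to zero. It assumes no warmup and a hypothetical total of 1000 optimizer steps, neither of which is stated in this card; `linear_lr` is an illustrative helper, not a library function.

```python
def linear_lr(step, total_steps, base_lr=6e-05, warmup_steps=0):
    """Learning rate at a given step under linear warmup followed by linear decay."""
    if step < warmup_steps:
        # Ramp up linearly from 0 to base_lr during warmup.
        return base_lr * step / max(1, warmup_steps)
    # Decay linearly from base_lr to 0 over the remaining steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


TOTAL_STEPS = 1000  # hypothetical; the real value depends on dataset size and batch size
lr_start = linear_lr(0, TOTAL_STEPS)
lr_mid = linear_lr(TOTAL_STEPS // 2, TOTAL_STEPS)
lr_end = linear_lr(TOTAL_STEPS, TOTAL_STEPS)
print(lr_start, lr_mid, lr_end)
```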

### Training Results:
The model was trained for 50 epochs. Convergence behavior, training duration, and hardware environment are not yet documented.

## Framework Versions:
- Transformers 4.44.2
- PyTorch 2.4.0+cu121
- Datasets 2.21.0
- Tokenizers 0.19.1