metadata

library_name: transformers
license: other
base_model: nvidia/mit-b0
tags:
  - generated_from_trainer
datasets:
  - scene_parse_150
model-index:
  - name: segformer-b0-scene-parse-150
    results: []
metrics:
  - mean_iou
pipeline_tag: image-segmentation

Segformer-b0-scene-parse-150

This model is a fine-tuned version of nvidia/mit-b0, trained on the scene_parse_150 dataset. It performs semantic segmentation for scene parsing, assigning one of 150 category labels to each pixel of an input image.

Evaluation Results:

The model achieved the following results on the evaluation dataset:

Loss: 1.8435
Mean IoU: 0.0881
Mean Accuracy: 0.1619
Overall Accuracy: 0.6663

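Per-category IoU and per-category accuracy values are available but sparse, so performance varies widely across categories. The gap between overall accuracy (0.6663) and mean accuracy (0.1619) is consistent with this: overall accuracy is weighted by pixel count and is therefore dominated by frequent classes, while the per-class means weight every category equally.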
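To illustrate how overall accuracy can be high while mean IoU and mean accuracy stay low, here is a minimal sketch that computes the three metrics from a confusion matrix. The function name and the toy numbers are illustrative assumptions, not the evaluation code or data used for this model. Classes that never appear are skipped when averaging, which is also an assumption of this sketch.

```python
def segmentation_metrics(confusion):
    """Compute overall accuracy, mean accuracy, and mean IoU.

    confusion[t][p] is the number of pixels whose true class is t
    and whose predicted class is p.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[c][c] for c in range(n))

    ious, accs = [], []
    for c in range(n):
        tp = confusion[c][c]
        gt = sum(confusion[c])                          # pixels labeled c
        pred = sum(confusion[r][c] for r in range(n))   # pixels predicted as c
        union = gt + pred - tp
        if gt > 0:       # skip classes absent from the ground truth
            accs.append(tp / gt)
        if union > 0:    # skip classes absent from both labels and predictions
            ious.append(tp / union)

    return {
        "overall_accuracy": correct / total,
        "mean_accuracy": sum(accs) / len(accs),
        "mean_iou": sum(ious) / len(ious),
    }


# Toy data (hypothetical): one frequent class predicted well,
# two rare classes mostly mislabeled as the frequent one.
toy_confusion = [
    [900, 0, 0],
    [45, 5, 0],
    [50, 0, 0],
]
toy_metrics = segmentation_metrics(toy_confusion)
print(toy_metrics)
```

In the toy case the frequent class accounts for most pixels, so overall accuracy is high even though two of the three classes are almost never predicted correctly. The same pattern would explain the reported numbers above.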

Model Description

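SegFormer-B0 pairs a hierarchical Transformer encoder (Mix Transformer, MiT-B0) with a lightweight all-MLP decode head. Unlike a plain Vision Transformer (ViT), the encoder produces feature maps at several resolutions, and the decoder fuses these multi-scale features into the final segmentation map. B0 is the smallest variant of the SegFormer family.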

More detailed model descriptions, including architectural adjustments or preprocessing requirements, are needed.

Intended Uses & Limitations

Use Cases: Suitable for scene parsing and segmentation tasks in environments with diverse visual categories.
Limitations: Performance varies significantly between categories, as the sparse per-category accuracy and IoU values and the low Mean IoU (0.0881) show. The model is likely to struggle with underrepresented classes and with categories that are visually hard to tell apart.
Further details on intended domains and limitations are needed.
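A minimal inference sketch using the Transformers `image-segmentation` pipeline is shown below. The repository id is a placeholder, since this card does not state the full Hub path of the model, and the helper name `segment_image` is hypothetical. Given the low Mean IoU reported above, the output is best treated as a rough segmentation.

```python
# Placeholder repository id (assumption): replace with the actual Hub path.
HYPOTHETICAL_MODEL_ID = "your-username/segformer-b0-scene-parse-150"


def segment_image(image, model_id=HYPOTHETICAL_MODEL_ID):
    """Run semantic segmentation and return (label, mask) pairs.

    `image` may be a local file path, a URL, or a PIL image.
    """
    # Imported lazily so the module can be loaded without transformers installed.
    from transformers import pipeline

    segmenter = pipeline("image-segmentation", model=model_id)
    results = segmenter(image)
    # Each result is a dict with a class "label" and a PIL "mask" image.
    return [(r["label"], r["mask"]) for r in results]
```

For example, `segment_image("scene.jpg")` would return one mask per predicted category, where `scene.jpg` stands for any local image.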

Training and Evaluation Data

The model was trained on the scene_parse_150 dataset, which consists of diverse visual scenes with 150 unique semantic categories. Further information on dataset specifics and any preprocessing steps is needed.

Training Procedure

Hyperparameters:

Learning Rate: 6e-05
Training Batch Size: 2
Evaluation Batch Size: 2
Seed: 42
Optimizer: Adam (betas=(0.9, 0.999), epsilon=1e-08)
Learning Rate Scheduler: Linear
Number of Epochs: 50
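
As a rough sketch, the hyperparameters above can be written with the keyword names that `transformers.TrainingArguments` uses, and the linear schedule can be reproduced in a few lines. The training-set size and the absence of warmup are assumptions made for illustration; this card reports neither.

```python
import math

# Hyperparameters from this card, keyed by TrainingArguments parameter names.
HPARAMS = {
    "learning_rate": 6e-5,
    "per_device_train_batch_size": 2,
    "per_device_eval_batch_size": 2,
    "seed": 42,
    "lr_scheduler_type": "linear",
    "num_train_epochs": 50,
}


def linear_lr(step, total_steps, base_lr=HPARAMS["learning_rate"], warmup_steps=0):
    """Learning rate at `step` under linear decay to zero, with optional warmup."""
    if warmup_steps and step < warmup_steps:
        return base_lr * step / warmup_steps
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Hypothetical training-set size (assumption, not stated in this card).
NUM_TRAIN_IMAGES = 40
steps_per_epoch = math.ceil(NUM_TRAIN_IMAGES / HPARAMS["per_device_train_batch_size"])
total_steps = steps_per_epoch * HPARAMS["num_train_epochs"]

for step in (0, total_steps // 2, total_steps):
    print(step, linear_lr(step, total_steps))
```

Under these assumptions the learning rate starts at 6e-05 and falls linearly to zero at the last optimizer step.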

Training Results:

The model was trained for 50 epochs. Convergence behavior, training duration, and hardware environment are not reported.

Framework Versions:

Transformers 4.44.2
PyTorch 2.4.0+cu121
Datasets 2.21.0
Tokenizers 0.19.1