---
library_name: transformers
license: other
base_model: nvidia/mit-b0
tags:
- generated_from_trainer
datasets:
- scene_parse_150
model-index:
- name: segformer-b0-scene-parse-150
  results: []
metrics:
- mean_iou
pipeline_tag: image-segmentation
---

# Segformer-b0-scene-parse-150

This model is a fine-tuned version of [nvidia/mit-b0](https://huggingface.co/nvidia/mit-b0), trained on the `scene_parse_150` dataset. It performs semantic segmentation for scene parsing.

### Evaluation Results:

The model achieved the following results on the evaluation set:

- **Loss**: 1.8435
- **Mean IoU**: 0.0881
- **Mean Accuracy**: 0.1619
- **Overall Accuracy**: 0.6663

**Per-Category IoU** and **Per-Category Accuracy** values are available but sparse, indicating that performance varies widely across categories.

## Model Description

SegFormer-b0 pairs a hierarchical Transformer encoder (Mix Transformer, MiT-b0) with a lightweight all-MLP decode head. The encoder produces multi-scale features, which the decode head fuses into a segmentation map. More detailed model descriptions, including architectural adjustments or preprocessing requirements, are needed.

## Intended Uses & Limitations

- **Use Cases**: Scene parsing and semantic segmentation of images containing diverse visual categories.
- **Limitations**: Performance varies significantly between categories, as the sparse per-category accuracy and IoU values show. The model may struggle with underrepresented classes and with categories that are visually hard to tell apart.
- Further details on intended domains and limitations are needed.

## Training and Evaluation Data

The model was trained on the `scene_parse_150` dataset, which consists of diverse visual scenes annotated with 150 semantic categories. Further information on dataset specifics and any preprocessing steps is needed.
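For reference, the gap between the overall accuracy (0.6663) and the mean IoU (0.0881) reported under Evaluation Results is what you would expect when a few frequent categories dominate the pixel count. The sketch below illustrates the standard definitions of these metrics on a toy example. It is not the evaluation code used for this model: the function name `segmentation_metrics`, the `ignore_index` value of 255, and the toy labels are assumptions made for illustration.

```python
import math


def segmentation_metrics(pred, target, num_classes, ignore_index=255):
    """Compute segmentation metrics from flat lists of predicted and true labels.

    Assumes every prediction is a valid class id in [0, num_classes).
    Pixels whose true label equals ignore_index (assumed 255) are skipped.
    """
    intersect = [0] * num_classes   # pixels where prediction and label agree
    pred_area = [0] * num_classes   # pixels predicted as each class
    label_area = [0] * num_classes  # pixels labelled as each class

    for p, t in zip(pred, target):
        if t == ignore_index:
            continue
        label_area[t] += 1
        pred_area[p] += 1
        if p == t:
            intersect[t] += 1

    per_category_iou, per_category_accuracy = [], []
    for c in range(num_classes):
        union = pred_area[c] + label_area[c] - intersect[c]
        # Classes absent from both prediction and label get NaN and are
        # excluded from the means, which is why per-category values look sparse.
        per_category_iou.append(intersect[c] / union if union else math.nan)
        per_category_accuracy.append(
            intersect[c] / label_area[c] if label_area[c] else math.nan
        )

    def nanmean(values):
        kept = [v for v in values if not math.isnan(v)]
        return sum(kept) / len(kept) if kept else math.nan

    total = sum(label_area)
    return {
        "overall_accuracy": sum(intersect) / total if total else math.nan,
        "mean_accuracy": nanmean(per_category_accuracy),
        "mean_iou": nanmean(per_category_iou),
        "per_category_iou": per_category_iou,
        "per_category_accuracy": per_category_accuracy,
    }


# Toy example: 10 pixels, 4 possible classes, class 3 never appears.
# The model predicts the dominant class 0 everywhere.
target = [0, 0, 0, 0, 0, 0, 0, 0, 1, 2]
pred = [0] * 10
m = segmentation_metrics(pred, target, num_classes=4)
print(m["overall_accuracy"], m["mean_accuracy"], m["mean_iou"])
```

In this toy case the overall accuracy is 0.8 while the mean IoU is roughly 0.27, because the two rare classes each contribute an IoU of 0 to the average. The same effect, spread over 150 categories, is consistent with the numbers reported for this model.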
## Training Procedure

### Hyperparameters:

- **Learning Rate**: 6e-05
- **Training Batch Size**: 2
- **Evaluation Batch Size**: 2
- **Seed**: 42
- **Optimizer**: Adam (betas=(0.9, 0.999), epsilon=1e-08)
- **Learning Rate Scheduler**: Linear
- **Number of Epochs**: 50

### Training Results:

The model was trained for 50 epochs. Details on convergence behavior, training duration, and hardware environment are not yet documented.

## Framework Versions:

- Transformers 4.44.2
- PyTorch 2.4.0+cu121
- Datasets 2.21.0
- Tokenizers 0.19.1
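
As an illustration of the linear learning rate scheduler listed under Hyperparameters, the sketch below decays the rate from the base value of 6e-05 to zero over the course of training. The warmup length (assumed to be 0) and the total number of optimizer steps (a hypothetical 1000) are not stated in this card; both are placeholders, and the helper `linear_lr` is written for illustration only.

```python
def linear_lr(step, total_steps, base_lr=6e-5, warmup_steps=0):
    """Learning rate at a given optimizer step under a linear schedule.

    warmup_steps=0 is an assumption; the card does not state a warmup length.
    """
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr during warmup.
        return base_lr * step / max(1, warmup_steps)
    # Linear decay from base_lr down to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Hypothetical run length; the real value depends on dataset size,
# batch size (2), and the number of epochs (50).
total_steps = 1000
lrs = [linear_lr(s, total_steps) for s in (0, 250, 500, 750, 1000)]
print(lrs)
```

Under these assumptions the rate falls by the same amount at every step, reaching half the base value at the midpoint of training and zero at the final step.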