ashaduzzaman/segformer-b0-scene-parse-150

	---
	library_name: transformers
	license: other
	base_model: nvidia/mit-b0
	tags:
	- generated_from_trainer
	datasets:
	- scene_parse_150
	model-index:
	- name: segformer-b0-scene-parse-150
	results: []
	metrics:
	- mean_iou
	pipeline_tag: image-segmentation
	---


	# Segformer-b0-scene-parse-150

	This model is a fine-tuned version of [nvidia/mit-b0](https://huggingface.co/nvidia/mit-b0), trained on the `scene_parse_150` dataset for semantic segmentation (scene parsing).

	### Evaluation Results:
	The model achieved the following results on the evaluation dataset:

	- Loss: 1.8435
	- Mean IoU: 0.0881
	- Mean Accuracy: 0.1619
	- Overall Accuracy: 0.6663

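	Per-category IoU and per-category accuracy values are available but sparse, which indicates that performance varies widely across categories. The gap between overall accuracy (0.6663) and mean accuracy (0.1619) points the same way: overall accuracy is dominated by frequent categories, while the mean weights every category equally.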
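
To make the relationship between the three reported metrics concrete, here is a minimal sketch of how mean IoU, mean accuracy, and overall accuracy can be derived from a confusion matrix. It is a simplified illustration, not the exact implementation used during evaluation; the function names and the toy labels are assumptions made for this example.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """Count pixels: rows are ground-truth classes, columns are predictions."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def segmentation_metrics(cm):
    """Return (mean IoU, mean accuracy, overall accuracy).

    Assumes at least one class appears in the ground truth. Classes with an
    empty union (IoU) or no ground-truth pixels (accuracy) are skipped.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(n))
    ious, accs = [], []
    for i in range(n):
        tp = cm[i][i]                            # pixels correctly labelled i
        gt = sum(cm[i])                          # ground-truth pixels of class i
        pred = sum(cm[j][i] for j in range(n))   # pixels predicted as class i
        union = gt + pred - tp
        if union > 0:
            ious.append(tp / union)
        if gt > 0:
            accs.append(tp / gt)
    return sum(ious) / len(ious), sum(accs) / len(accs), correct / total


# Toy data (hypothetical): class 0 dominates and the model predicts it everywhere.
y_true = [0] * 8 + [1, 2]
y_pred = [0] * 10
mean_iou, mean_acc, overall_acc = segmentation_metrics(
    confusion_matrix(y_true, y_pred, num_classes=3)
)
print(mean_iou, mean_acc, overall_acc)
```

In this toy case overall accuracy is 0.8 while mean accuracy is only about 0.33, the same pattern as in the results above: a model that handles frequent categories well can still score low on per-category averages.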

	## Model Description

	SegFormer-B0 pairs a hierarchical Transformer encoder (Mix Transformer, MiT-B0, the smallest variant in the family) with a lightweight all-MLP decoder head. The encoder produces feature maps at several scales, and the decoder fuses them into a segmentation map.
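
As a rough illustration of the hierarchical features mentioned above, the sketch below lists the feature-map shape each of the four encoder stages would produce. The strides and channel widths are those of the published B0 configuration; the 512x512 input size is an assumption chosen for the example (this card does not state the training resolution), and the helper name is hypothetical.

```python
# Output stride and channel width of the four MiT-B0 encoder stages.
MIT_B0_STAGES = [(4, 32), (8, 64), (16, 160), (32, 256)]


def feature_shapes(height, width, stages=MIT_B0_STAGES):
    """Return (channels, height, width) for each encoder stage output.

    Assumes height and width are divisible by 32, the largest stride.
    """
    return [(channels, height // stride, width // stride)
            for stride, channels in stages]


# Assumed input resolution, for illustration only.
for shape in feature_shapes(512, 512):
    print(shape)
```

The early stages keep high spatial resolution with few channels, while later stages trade resolution for wider features; the decoder upsamples all four to a common size before predicting a class per pixel.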

	More detailed model descriptions, including architectural adjustments or preprocessing requirements, are needed.

	## Intended Uses & Limitations

	- Use Cases: Scene parsing and semantic segmentation of images that contain many different visual categories.
	- Limitations: Performance varies significantly between categories, as the sparse per-category accuracy and IoU values show. The model may struggle with underrepresented classes and with categories that are visually hard to tell apart.
	- Further details on intended domains and limitations are needed.
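
For the scene-parsing use case listed above, a minimal inference sketch might look like the following. It assumes the repository id shown on the Hub page, that the repository ships the image processor configuration alongside the weights, and that `transformers` is installed; `load_segmenter` and `predicted_labels` are hypothetical helper names introduced for this example.

```python
MODEL_ID = "ashaduzzaman/segformer-b0-scene-parse-150"  # repo id as shown on the Hub


def load_segmenter(model_id=MODEL_ID):
    """Build an image-segmentation pipeline (downloads weights on first use)."""
    from transformers import pipeline  # imported lazily: heavy dependency
    return pipeline("image-segmentation", model=model_id)


def predicted_labels(results):
    """Collect the distinct category labels from the pipeline output.

    The image-segmentation pipeline returns a list of dicts, one per
    predicted segment, each with "label", "score", and "mask" keys.
    """
    return sorted({segment["label"] for segment in results})
```

A typical call would be `predicted_labels(load_segmenter()("scene.jpg"))`, where `scene.jpg` stands in for any local image path or URL. Given the low mean IoU reported above, the returned masks are best treated as rough estimates.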

	## Training and Evaluation Data

	The model was trained on the `scene_parse_150` dataset, which consists of diverse visual scenes with 150 unique semantic categories. Further information on dataset specifics and any preprocessing steps is needed.

	## Training Procedure

	### Hyperparameters:
	- Learning Rate: 6e-05
	- Training Batch Size: 2
	- Evaluation Batch Size: 2
	- Seed: 42
	- Optimizer: Adam (betas=(0.9, 0.999), epsilon=1e-08)
	- Learning Rate Scheduler: Linear
	- Number of Epochs: 50
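
To show what the linear scheduler does with the learning rate listed above, here is a small sketch of the decay rule. It assumes no warmup steps, since none are listed on this card, and the total step count of 1000 is a made-up number for illustration; the function name is hypothetical.

```python
BASE_LR = 6e-05  # learning rate from the hyperparameter list


def linear_lr(step, total_steps, base_lr=BASE_LR, warmup_steps=0):
    """Learning rate at `step`: linear warmup, then linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Hypothetical run of 1000 optimizer steps.
for step in (0, 250, 500, 750, 1000):
    print(step, linear_lr(step, total_steps=1000))
```

With these assumptions the rate starts at 6e-05, reaches half that value at the midpoint, and hits zero on the final step.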

	### Training Results:
	The model was trained for 50 epochs. Convergence behavior, training duration, and hardware environment are not documented here and would help in interpreting the results.

	## Framework Versions:
	- Transformers 4.44.2
	- PyTorch 2.4.0+cu121
	- Datasets 2.21.0
	- Tokenizers 0.19.1