Vision Transformer (ViT) for Music Genre Classification
Model Overview
Model Name: ghermoso/vit-eGTZANplus
Task: Image Classification
Dataset: egtzan_plus
Model Architecture: Vision Transformer (ViT)
Finetuned from model: google/vit-base-patch16-224-in21k. This model is a fine-tuned version of that checkpoint, trained on the egtzan_plus dataset.
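Because the task is image classification, the model can presumably be queried through the Transformers image-classification pipeline. The following is a minimal sketch, not the author's documented usage. It assumes the checkpoint loads with the standard pipeline and that each input is an image rendering of an audio clip (for example a spectrogram); the card does not describe the preprocessing. The helper names `classify_image` and `format_predictions` are hypothetical.

```python
from typing import Any

MODEL_ID = "ghermoso/vit-eGTZANplus"  # model name taken from this card


def classify_image(image_path: str, top_k: int = 3) -> list[dict[str, Any]]:
    """Return the top_k predicted labels for one image file.

    Assumption: the image is whatever visual representation of audio the
    model was fine-tuned on; the card does not specify how it is produced.
    """
    # Imported lazily so this module loads even without transformers installed.
    from transformers import pipeline

    classifier = pipeline("image-classification", model=MODEL_ID)
    return classifier(image_path, top_k=top_k)


def format_predictions(predictions: list[dict[str, Any]]) -> list[str]:
    """Turn pipeline output ({'label': ..., 'score': ...}) into readable lines."""
    ranked = sorted(predictions, key=lambda p: p["score"], reverse=True)
    return [f"{p['label']}: {p['score']:.1%}" for p in ranked]
```

Calling `format_predictions(classify_image("spectrogram.png"))` would return one line per predicted genre, highest score first. The file name `spectrogram.png` is a placeholder, and the first call downloads the model weights.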
The fine-tuned model achieves the following results on the evaluation set:
- Loss: 0.8358
- Accuracy: 0.7460
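An accuracy of 0.7460 means that roughly three out of four evaluation images were assigned the correct genre. The sketch below shows how the two metrics are commonly computed from raw model outputs. It assumes the reported loss is the mean cross-entropy over the evaluation examples, which is typical for single-label classification but is not stated in the card. The function names are hypothetical.

```python
import math


def accuracy(logits: list[list[float]], labels: list[int]) -> float:
    """Fraction of examples whose highest-scoring class equals the true label."""
    correct = 0
    for row, label in zip(logits, labels):
        predicted = max(range(len(row)), key=row.__getitem__)  # argmax
        correct += predicted == label
    return correct / len(labels)


def mean_cross_entropy(logits: list[list[float]], labels: list[int]) -> float:
    """Average negative log-probability assigned to the true class."""
    total = 0.0
    for row, label in zip(logits, labels):
        # Numerically stable log-sum-exp: subtract the row maximum first.
        peak = max(row)
        log_sum_exp = peak + math.log(sum(math.exp(v - peak) for v in row))
        total += log_sum_exp - row[label]
    return total / len(labels)
```

The two numbers measure different things: accuracy counts only whether the top prediction is right, while cross-entropy also penalizes confident wrong predictions, so a model can have reasonable accuracy and still a fairly high loss.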