Vision Transformer (ViT) for Music Genre Classification

Model Overview

Model Name: ghermoso/vit-eGTZANplus
Task: Image Classification
Dataset: egtzan_plus
Model Architecture: Vision Transformer (ViT)
Finetuned from model: google/vit-base-patch16-224-in21k (fine-tuned on the egtzan_plus dataset)
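The "patch16-224" in the base model's name refers to 16x16-pixel patches cut from a 224x224 input image. The short worked example below shows what that means for the length of the sequence the transformer encoder processes. It is a sketch that assumes the standard ViT-Base setup (3-channel input, non-overlapping patches, one prepended [CLS] token used for classification); the function names are illustrative and not part of any library.

```python
# Worked example: sequence length implied by "patch16-224" (ViT-Base).
# Assumes square inputs, non-overlapping patches, and one extra [CLS] token.


def vit_sequence_length(image_size: int = 224, patch_size: int = 16) -> int:
    """Return the number of encoder tokens (patches + [CLS]) for a square image."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    patches_per_side = image_size // patch_size  # 224 // 16 = 14
    num_patches = patches_per_side ** 2          # 14 * 14 = 196
    return num_patches + 1                       # +1 for the [CLS] token


def patch_vector_length(patch_size: int = 16, channels: int = 3) -> int:
    """Return the length of one flattened patch before the linear projection."""
    return patch_size * patch_size * channels    # 16 * 16 * 3 = 768


print(vit_sequence_length())   # 197
print(patch_vector_length())   # 768
```

Inputs of other sizes (for example, spectrogram images) are typically resized to 224x224 by the model's image processor before being split into patches, so the sequence length stays at 197 tokens.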

The model achieves the following results on the evaluation set:

Loss: 0.8358
Accuracy: 0.7460
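The card does not say how these two numbers were computed. A common convention for single-label image classification (and the default for ViTForImageClassification in transformers) is that Loss is the mean cross-entropy over the evaluation set and Accuracy is the fraction of examples whose highest-scoring class matches the label. The sketch below computes both under that assumption; the helper names and the toy logits are illustrative, not real model outputs.

```python
import math
from typing import List, Sequence, Tuple


def cross_entropy(logits: Sequence[float], label: int) -> float:
    """Negative log-softmax probability of the true class (log-sum-exp form)."""
    m = max(logits)  # subtract the max logit for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[label]


def evaluate(batch_logits: List[Sequence[float]], labels: List[int]) -> Tuple[float, float]:
    """Return (mean cross-entropy loss, accuracy) over a set of examples."""
    if len(batch_logits) != len(labels) or not labels:
        raise ValueError("need one label per logits vector and at least one example")
    total_loss = 0.0
    correct = 0
    for logits, label in zip(batch_logits, labels):
        total_loss += cross_entropy(logits, label)
        predicted = max(range(len(logits)), key=lambda i: logits[i])  # argmax
        correct += int(predicted == label)
    return total_loss / len(labels), correct / len(labels)


# Toy example with 2 classes: the first prediction is wrong, the second right.
loss, acc = evaluate([[2.0, 0.0], [0.0, 2.0]], [1, 1])
print(f"loss={loss:.4f} accuracy={acc:.2f}")  # loss=1.1269 accuracy=0.50
```

Under the same assumption, a loss of 0.8358 means exp(-0.8358) ≈ 0.43 is the geometric-mean probability the model assigns to the correct genre across the evaluation set, which complements the 74.6% top-1 accuracy.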


Inference Providers

This model is not currently available via any of the supported third-party Inference Providers and is not deployed on the HF Inference API.
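Because no hosted endpoint serves the model, it has to be run locally. The sketch below uses the transformers image-classification pipeline with the model ID from this card. It assumes transformers, torch and Pillow are installed and that the Hub repository includes an image processor config; it also assumes the input is a spectrogram-style image of an audio clip, since the card does not describe how the egtzan_plus images were produced. The helper names are hypothetical.

```python
from functools import lru_cache

MODEL_ID = "ghermoso/vit-eGTZANplus"


@lru_cache(maxsize=1)
def get_classifier():
    """Build the pipeline once and reuse it (downloads weights on first call)."""
    # Imported lazily so this module can be imported without transformers.
    from transformers import pipeline

    return pipeline("image-classification", model=MODEL_ID)


def classify_image(image_path: str, top_k: int = 3) -> list:
    """Return the top_k predictions as a list of {"label": ..., "score": ...} dicts."""
    return get_classifier()(image_path, top_k=top_k)


# Example call ("clip_spectrogram.png" is a hypothetical file):
#   for pred in classify_image("clip_spectrogram.png"):
#       print(pred["label"], round(pred["score"], 3))
```

The label names come from the checkpoint's own config and are not listed on this card. If loading fails because the repository lacks an image processor config, one option is to load the base model's processor with AutoImageProcessor.from_pretrained("google/vit-base-patch16-224-in21k") and pass it to pipeline through its image_processor argument.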

