Model Card for Model ID

A fine-tuned version of Google's ViT-Base model (patch size 16, 384x384 input resolution) for multi-label image classification of tongue images.

Model Details

The model predicts the presence or absence of three features: Cracks, Red Dots, and Toothmarks.
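Because the task is multi-label, each feature is scored independently rather than competing in a softmax. The sketch below shows one plausible way to turn raw model outputs into present/absent decisions. The label order, the 0.5 threshold, and the helper name `decode_multilabel` are assumptions for illustration; this card does not specify them, so check the model's configuration before relying on them.

```python
import math

# Assumed label order; the card names the features but not their output indices.
LABELS = ["Cracks", "Red Dots", "Toothmarks"]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def decode_multilabel(logits, labels=LABELS, threshold=0.5):
    """Map one logit per label to an independent present/absent decision."""
    if len(logits) != len(labels):
        raise ValueError("expected one logit per label")
    # Sigmoid per label (not softmax), so any number of features
    # can be flagged as present for the same image.
    return {name: sigmoid(z) >= threshold for name, z in zip(labels, logits)}


# Hypothetical logits for a single image.
print(decode_multilabel([2.0, -1.5, 0.3]))
```

With these hypothetical logits, Cracks and Toothmarks would be reported as present and Red Dots as absent. A different threshold per feature may be more appropriate in practice, depending on how the model was evaluated.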

Model type: Vision Transformer
Finetuned from model: https://huggingface.co/google/vit-base-patch16-384

Weights format: Safetensors
Model size: 86.1M params
Tensor type: F32
