A ViT model trained for image classification on an anime dataset; it predicts how much of a character is in frame.

Images are divided into four categories: head_only, upperbody, knee_level, fullbody (examples below, with a fine-tuning sketch after the list).

  • head_only: head_only_example2.jpg
  • upperbody: upperbody_example2.jpg
  • knee_level: knee_level_example2.jpg
  • fullbody: fullbody_example2.jpg
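
Below is a minimal fine-tuning sketch in the style of the usual Hugging Face `Trainer` image-classification recipe. The base checkpoint (`google/vit-base-patch16-224-in21k`), the local folder layout, and the hyperparameters are assumptions for illustration, not details taken from this card.

```python
import numpy as np
import torch
from datasets import load_dataset
from transformers import (
    ViTImageProcessor,
    ViTForImageClassification,
    TrainingArguments,
    Trainer,
)

# Assumption: a common ViT base checkpoint; the actual base model is not stated on this card.
base_checkpoint = 'google/vit-base-patch16-224-in21k'

# Assumption: training images are organized as /path/to/train_data/<category>/*.jpg
dataset = load_dataset('imagefolder', data_dir='/path/to/train_data', split='train')
dataset = dataset.train_test_split(test_size=0.1)
labels = dataset['train'].features['label'].names  # derived from the subfolder names

image_processor = ViTImageProcessor.from_pretrained(base_checkpoint)

def transform(batch):
    # Turn PIL images into pixel_values; keep the integer class labels.
    inputs = image_processor([img.convert('RGB') for img in batch['image']], return_tensors='pt')
    inputs['labels'] = batch['label']
    return inputs

prepared = dataset.with_transform(transform)

def collate_fn(batch):
    return {
        'pixel_values': torch.stack([x['pixel_values'] for x in batch]),
        'labels': torch.tensor([x['labels'] for x in batch]),
    }

def compute_metrics(eval_pred):
    # Plain accuracy over the held-out split.
    predictions = np.argmax(eval_pred.predictions, axis=1)
    return {'accuracy': (predictions == eval_pred.label_ids).mean()}

model = ViTForImageClassification.from_pretrained(
    base_checkpoint,
    num_labels=len(labels),
    id2label={i: l for i, l in enumerate(labels)},
    label2id={l: i for i, l in enumerate(labels)},
)

training_args = TrainingArguments(
    output_dir='./anime_portrait_vit',
    per_device_train_batch_size=16,
    num_train_epochs=3,
    remove_unused_columns=False,  # keep the 'image' column so the transform can see it
)

trainer = Trainer(
    model=model,
    args=training_args,
    data_collator=collate_fn,
    train_dataset=prepared['train'],
    eval_dataset=prepared['test'],
    compute_metrics=compute_metrics,
)
trainer.train()
print(trainer.evaluate())
```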
```python
import os

import torch
from PIL import Image
from transformers import ViTImageProcessor, ViTForImageClassification

model_name_or_path = 'lrzjason/anime_portrait_vit'
image_processor = ViTImageProcessor.from_pretrained(model_name_or_path)
model = ViTForImageClassification.from_pretrained(model_name_or_path)

input_dir = '/path/to/dir'
file = 'example.jpg'
image = Image.open(os.path.join(input_dir, file))

inputs = image_processor(image, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# The model predicts one of the four portrait categories:
# head_only, upperbody, knee_level, fullbody
predicted_label = logits.argmax(-1).item()
print(f'predicted_label: {model.config.id2label[predicted_label]}')
```
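
A common follow-up is to run the classifier over a whole folder and copy each image into a subfolder named after its predicted category. A minimal sketch of that workflow; the directory paths are placeholders:

```python
import os
import shutil

import torch
from PIL import Image
from transformers import ViTImageProcessor, ViTForImageClassification

model_name_or_path = 'lrzjason/anime_portrait_vit'
image_processor = ViTImageProcessor.from_pretrained(model_name_or_path)
model = ViTForImageClassification.from_pretrained(model_name_or_path)
model.eval()

input_dir = '/path/to/dir'      # folder with mixed images
output_dir = '/path/to/sorted'  # will receive one subfolder per category

for file in os.listdir(input_dir):
    if not file.lower().endswith(('.jpg', '.jpeg', '.png', '.webp')):
        continue
    image = Image.open(os.path.join(input_dir, file)).convert('RGB')
    inputs = image_processor(image, return_tensors='pt')
    with torch.no_grad():
        logits = model(**inputs).logits
    label = model.config.id2label[logits.argmax(-1).item()]

    # Copy the image into a folder named after the predicted category.
    dest = os.path.join(output_dir, label)
    os.makedirs(dest, exist_ok=True)
    shutil.copy2(os.path.join(input_dir, file), os.path.join(dest, file))
```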

Using this dataset: https://huggingface.co/datasets/animelover/genshin-impact-images
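
To inspect the training data, the dataset can be streamed from the Hub without downloading it in full. A minimal sketch; the configuration name `0-sfw` is an assumption based on similar animelover datasets and may need to be changed:

```python
from datasets import load_dataset

# Assumption: the "0-sfw" config name; check the dataset card if load_dataset rejects it.
ds = load_dataset("animelover/genshin-impact-images", "0-sfw", split="train", streaming=True)

# Peek at a few samples to see the available fields.
for i, sample in enumerate(ds):
    print(sample.keys())
    if i >= 2:
        break
```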
