Model description

This model is a fine-tuned version of the naver-clova-ix/donut-base model. The training dataset was created by manually scraping images from the internet.

Usage & limitations

The model can be used to extract nutrition facts or ingredient compositions from images of food or drug packaging. It returns the components listed in the image as JSON. However, because the training data is limited, the text in the image must be upright (not rotated or skewed).

Output Example

Model output:

'<s_kmpsi><s_komposisi><s_obat>Vitamin E</s_obat><s_takaran>30 I.U.</s_takaran><sep/><s_obat>Tiamin HCl (B1)</s_obat><s_takaran>100 mg</s_takaran><sep/><s_obat>Piridoksin HCl (B6)</s_obat><s_takaran>50 mg</s_takaran><sep/><s_obat>Sianokobalamin (B12)</s_obat><s_takaran>100 mcg</s_takaran><sep/><s_obat>K-l-aspartat</s_obat><s_takaran>100 mg</s_takaran><sep/><s_obat>Mg-l-aspartat</s_obat><s_takaran>100 mg</s_takaran></s_komposisi><s_desc></s_desc></s_kmpsi>'

Parsed JSON output:

{'komposisi': [{'obat': 'Vitamin E', 'takaran': '30 I.U.'}, {'obat': 'Tiamin HCl (B1)', 'takaran': '100 mg'}, {'obat': 'Piridoksin HCl (B6)', 'takaran': '50 mg'}, {'obat': 'Sianokobalamin (B12)', 'takaran': '100 mcg'}, {'obat': 'K-l-aspartat', 'takaran': '100 mg'}, {'obat': 'Mg-l-aspartat', 'takaran': '100 mg'}], 'desc': ''}
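
To make the mapping between the tag sequence and the parsed dictionary concrete, the sketch below re-implements it for this example's schema only. It is a simplified illustration, not the `processor.token2json` method used in the usage section, and it may differ from that method in edge cases. The names `_field` and `parse_kmpsi` are hypothetical, and the code assumes the tags `<s_komposisi>`, `<s_obat>`, `<s_takaran>`, `<s_desc>` and the `<sep/>` separator appear exactly as in the output above.

```python
import re


def _field(tag, text):
    """Return the stripped text inside <s_tag>...</s_tag>, or "" if absent."""
    match = re.search(rf"<s_{tag}>(.*?)</s_{tag}>", text, flags=re.DOTALL)
    return match.group(1).strip() if match else ""


def parse_kmpsi(sequence):
    """Illustrative parser for the example schema (hypothetical helper)."""
    items = []
    block = _field("komposisi", sequence)
    if block:
        # Each ingredient/dose pair is separated by <sep/>
        for chunk in block.split("<sep/>"):
            items.append(
                {"obat": _field("obat", chunk), "takaran": _field("takaran", chunk)}
            )
    return {"komposisi": items, "desc": _field("desc", sequence)}


# Shortened version of the model output shown above
sample = (
    "<s_kmpsi><s_komposisi><s_obat>Vitamin E</s_obat><s_takaran>30 I.U.</s_takaran>"
    "<sep/><s_obat>Tiamin HCl (B1)</s_obat><s_takaran>100 mg</s_takaran>"
    "</s_komposisi><s_desc></s_desc></s_kmpsi>"
)
parsed = parse_kmpsi(sample)
print(parsed)
```

Each `<s_obat>`/`<s_takaran>` pair becomes one dictionary in the `komposisi` list, and the empty `<s_desc>` tag becomes an empty string.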

How to use

Load Donut Processor and Model

from transformers import DonutProcessor, VisionEncoderDecoderModel

# Load processor
processor = DonutProcessor.from_pretrained("jonathanjordan21/donut_fine_tuning_food_composition_id")

# Load model
model = VisionEncoderDecoderModel.from_pretrained("jonathanjordan21/donut_fine_tuning_food_composition_id")

Create JSON parser

from PIL import Image
from io import BytesIO
import re

import torch

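def get_komposisi(image_path, image=None):
    """Run the model on an image and return the parsed composition dict.

    Pass either a file path or an already-loaded PIL image via `image`.
    """
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model.to(device)  # keep the model on the same device as the inputs

    # Prefer the provided PIL image; otherwise load it from image_path
    image = Image.open(image_path).convert("RGB") if image is None else image.convert("RGB")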

    task_prompt = "<s_kmpsi>"
    decoder_input_ids = processor.tokenizer(task_prompt, add_special_tokens=False, return_tensors="pt").input_ids

    pixel_values = processor(image, return_tensors="pt").pixel_values

    outputs = model.generate(
        pixel_values.to(device),
        decoder_input_ids=decoder_input_ids.to(device),
        max_length=model.decoder.config.max_position_embeddings,
        early_stopping=True,
        pad_token_id=processor.tokenizer.pad_token_id,
        eos_token_id=processor.tokenizer.eos_token_id,
        use_cache=True,
        bad_words_ids=[[processor.tokenizer.unk_token_id]],
        return_dict_in_generate=True,
    )

    sequence1 = processor.batch_decode(outputs.sequences)[0]
    sequence2 = sequence1.replace(processor.tokenizer.eos_token, "").replace(processor.tokenizer.pad_token, "")
    sequence3 = re.sub(r"<.*?>", "", sequence2, count=1).strip()  # remove first task start token

    return processor.token2json(sequence3)

Get JSON output from an image

import requests

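# Download a sample label image and run the model on it
response = requests.get('https://pintarjualan.id/wp-content/uploads/sites/2/2022/04/label-nustrisi-fact-1.png', timeout=30)
response.raise_for_status()  # stop early if the download failed
print(get_komposisi(None, image=Image.open(BytesIO(response.content))))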
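
The value returned by `get_komposisi` is a Python dictionary, so it still has to be serialized before it can be stored or sent as JSON. As a minimal sketch of that downstream step, the hypothetical helpers below split each `takaran` string into a number and a unit and serialize the result with `json.dumps`. The number-then-unit layout of `takaran` and the treatment of a comma as a decimal separator are assumptions based on the example output, not guarantees about what the model produces.

```python
import json
import re


def split_takaran(takaran):
    """Split a dose such as '100 mg' into (100.0, 'mg').

    Returns (None, original_text) when the string has no leading number.
    """
    match = re.match(r"\s*(\d+(?:[.,]\d+)?)\s*(.*)", takaran)
    if not match:
        return None, takaran.strip()
    value = float(match.group(1).replace(",", "."))  # assumption: comma is a decimal mark
    return value, match.group(2).strip()


def to_json(result):
    """Serialize a parsed result to a JSON string (hypothetical helper)."""
    items = result.get("komposisi", [])
    if isinstance(items, dict):
        # Defensive: treat a single dict as a one-item list
        # (assumed possible when only one ingredient is detected)
        items = [items]
    rows = []
    for item in items:
        value, unit = split_takaran(item.get("takaran", ""))
        rows.append({"obat": item.get("obat", ""), "value": value, "unit": unit})
    return json.dumps(
        {"komposisi": rows, "desc": result.get("desc", "")}, ensure_ascii=False
    )


# Shortened version of the parsed output shown earlier
example = {
    "komposisi": [
        {"obat": "Vitamin E", "takaran": "30 I.U."},
        {"obat": "Tiamin HCl (B1)", "takaran": "100 mg"},
    ],
    "desc": "",
}
print(to_json(example))
```

Doses without a leading number are kept as text in `unit` with `value` set to `None`, so nothing the model produced is silently dropped.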