metadata
language:
  - ko
pipeline_tag: image-to-text

deplot_kr

deplot_kr is a Korean image-to-data model based on Google's pix2struct architecture: given a chart image, it outputs the underlying data table in text form. It was fine-tuned from DePlot on a dataset of 300,000 Korean chart image-text pairs.

How to use

To run a prediction, pass a chart image to the model. The model predicts the chart's underlying data table in text form.

from PIL import Image
from transformers import AutoProcessor, Pix2StructForConditionalGeneration

# Load the processor and the fine-tuned model from the Hugging Face Hub.
processor = AutoProcessor.from_pretrained("brainventures/deplot_kr")
model = Pix2StructForConditionalGeneration.from_pretrained("brainventures/deplot_kr")

image_path = "IMAGE_PATH"  # path to a chart image
image = Image.open(image_path)

# The processor returns flattened_patches and attention_mask for the image.
inputs = processor(images=image, return_tensors="pt")
pred = model.generate(**inputs, max_length=1024)
print(processor.batch_decode(pred, skip_special_tokens=True)[0])

Model Input Image

[image: model_input_image]

Model Output - Prediction

대상:
제목: 2011-2021 보건복지 분야 일자리의 증
유형: 단일형 일반 세로 대형
 | 보건(천 명) | 복지(천 명)
1분위 | 29.7 | 178.4
2분위 | 70.8 | 97.3
3분위 | 86.4 | 61.3
4분위 | 28.2 | 16.0
5분위 | 52.3 | 0.9
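The prediction is plain text, so it usually has to be split back into fields before it can be used downstream. Below is a minimal sketch, assuming the layout shown above: optional `key: value` lines followed by `|`-separated table rows. The function `parse_prediction` and the sample string are hypothetical and are not part of the model or of transformers.

```python
def parse_prediction(text):
    # Split a decoded prediction into metadata fields and table rows.
    meta, rows = {}, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if "|" in line:
            # Table line: cells are separated by "|".
            rows.append([cell.strip() for cell in line.split("|")])
        elif ":" in line:
            # Metadata line: split key and value at the first colon.
            key, value = line.split(":", 1)
            meta[key.strip()] = value.strip()
    return meta, rows


# Hypothetical sample in the same layout as the prediction above.
sample = (
    "제목: 예시 차트\n"
    "유형: 세로 막대형\n"
    " | 보건(천 명) | 복지(천 명)\n"
    "1분위 | 29.7 | 178.4\n"
    "2분위 | 70.8 | 97.3"
)
meta, rows = parse_prediction(sample)
header, body = rows[0], rows[1:]
print(meta)
print(header)
print(body)
```

The first header cell is empty because the header line starts with a separator. Numeric cells are left as strings here; convert them with `float` only once you have checked that a cell really holds a number.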

Preprocessing

According to Liu et al. (2023)...

  • markdown format
  • | : separates cells (columns)
  • \n : separates rows
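The format above can be sketched as a small linearization step that turns a table into a target string. This is an illustration under assumptions, not the authors' preprocessing code: the helper name `linearize_table`, the single space around each `|`, and the sample values are all hypothetical.

```python
def linearize_table(header, rows):
    # Join the cells of each row with " | " and join the rows with a newline.
    lines = [" | ".join(str(cell) for cell in header)]
    lines += [" | ".join(str(cell) for cell in row) for row in rows]
    return "\n".join(lines)


# Hypothetical table; the empty first header cell labels the row-name column.
header = ["", "보건(천 명)", "복지(천 명)"]
rows = [["1분위", 29.7, 178.4], ["2분위", 70.8, 97.3]]
target = linearize_table(header, rows)
print(target)
```

A cell that itself contains `|` or a line break would make the string ambiguous, so such characters would need to be replaced or escaped before linearizing.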

Train

The model was trained in a TPU environment.

  • num_warmup_steps : 1,000
  • num_training_steps : 40,000
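These two values read like the warmup and total-step settings of a learning-rate schedule. The README does not state which schedule or base learning rate was used, so the sketch below is only an assumption: a linear warmup followed by a linear decay, written in plain Python. The function name `lr_scale` is hypothetical.

```python
NUM_WARMUP_STEPS = 1_000
NUM_TRAINING_STEPS = 40_000


def lr_scale(step, warmup=NUM_WARMUP_STEPS, total=NUM_TRAINING_STEPS):
    # Multiplier applied to the base learning rate at a given step.
    if step < warmup:
        # Linear ramp from 0 to 1 during warmup.
        return step / max(1, warmup)
    # Assumed linear decay from 1 to 0 over the remaining steps.
    return max(0.0, (total - step) / max(1, total - warmup))


for step in (0, 500, 1_000, 20_500, 40_000):
    print(step, round(lr_scale(step), 3))
```

Under this assumption the multiplier reaches 1.0 at step 1,000 and falls back to 0.0 at step 40,000. A different decay shape, such as cosine, would change only the second branch.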

Evaluation Results

This model achieves the following results:

| Metric | Score (%) |
|---|---|
| RNSS (Relative Number Set Similarity) | 98.1615 |
| RMS (Relative Mapping Similarity) Precision | 83.1615 |
| RMS Recall | 26.3549 |
| RMS F1 Score | 31.5633 |

Contact

For questions and comments, please use the discussion tab or email gloria@brainventur.com