language:
- ko
pipeline_tag: image-to-text
deplot_kr
deplot_kr is a Korean image-to-data model (it outputs a data table as text) based on Google's Pix2Struct architecture. It was fine-tuned from DePlot on a dataset of 300,000 Korean chart image-text pairs.
How to use
To run a prediction, pass a chart image to the model.
The model predicts the chart's underlying data table in text form.
from transformers import Pix2StructForConditionalGeneration, AutoProcessor
from PIL import Image
# Load the processor and the fine-tuned model from the Hugging Face Hub
processor = AutoProcessor.from_pretrained("brainventures/deplot_kr")
model = Pix2StructForConditionalGeneration.from_pretrained("brainventures/deplot_kr")
image_path = "IMAGE_PATH"
image = Image.open(image_path)
# The processor returns the flattened image patches and their attention mask
inputs = processor(images=image, return_tensors="pt")
pred = model.generate(**inputs, max_length=1024)
# Decode the generated token ids into the text-form data table
print(processor.batch_decode(pred, skip_special_tokens=True)[0])
Model Output - Prediction
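(Korean output, shown here in English translation)
Subject:
Title: 2011-2021 increase in jobs in the health and welfare sector
Type: single, standard vertical bar chart
 | Health (thousand people) | Welfare (thousand people)
Quintile 1 | 29.7 | 178.4
Quintile 2 | 70.8 | 97.3
Quintile 3 | 86.4 | 61.3
Quintile 4 | 28.2 | 16.0
Quintile 5 | 52.3 | 0.9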
Preprocessing
According to Liu et al.(2023)...
- markdown format
- | : separates cells (column delimiter)
- \n : separates rows (row delimiter)
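As a rough illustration of the format listed above, the following sketch converts between a list of rows and the linearized text. The function names `parse_table` and `linearize_table` are hypothetical helpers, not part of the model or of transformers, and the single space around each `|` is an assumption based on the sample prediction. The row labels are English placeholders.

```python
def parse_table(text):
    """Split linearized table text into rows of stripped cell strings."""
    rows = []
    for line in text.split("\n"):
        if not line.strip():
            continue  # skip blank lines
        rows.append([cell.strip() for cell in line.split("|")])
    return rows


def linearize_table(rows):
    """Join cells with ' | ' and rows with a newline (assumed spacing)."""
    return "\n".join(" | ".join(str(cell) for cell in row) for row in rows)


# Values borrowed from the sample prediction; labels are placeholders.
sample = " | Health | Welfare\nQuintile 1 | 29.7 | 178.4\nQuintile 2 | 70.8 | 97.3"
table = parse_table(sample)
print(table[0])  # header cells; the first cell is empty
print(table[1])  # first data row
```

The same `parse_table` helper could be applied to the decoded model output once any title or type lines preceding the table have been set aside.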
Train
The model was trained in a TPU environment.
- num_warmup_steps : 1,000
- num_training_steps : 40,000
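The card does not say which learning-rate schedule or peak learning rate was used. Purely as an assumption, the sketch below shows how the two step counts would shape a linear warmup followed by a linear decay, a common pairing for parameters with these names. `lr_scale` is a hypothetical helper that returns the multiplier applied to the peak learning rate.

```python
NUM_WARMUP_STEPS = 1_000
NUM_TRAINING_STEPS = 40_000


def lr_scale(step, warmup=NUM_WARMUP_STEPS, total=NUM_TRAINING_STEPS):
    """Multiplier on the peak learning rate (assumed linear warmup and decay)."""
    if step < warmup:
        # Ramp up linearly from 0 to 1 during warmup
        return step / max(1, warmup)
    # Decay linearly from 1 to 0 over the remaining steps
    return max(0.0, (total - step) / max(1, total - warmup))


for step in (0, 500, 1_000, 20_500, 40_000):
    print(step, lr_scale(step))
```

Under this assumption the multiplier peaks at step 1,000 and reaches zero at step 40,000.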
Evaluation Results
This model achieves the following results:
Metric | Score (%) |
---|---|
RNSS (Relative Number Set Similarity) | 98.1615 |
RMS (Relative Mapping Similarity) Precision | 83.1615 |
RMS Recall | 26.3549 |
RMS F1 Score | 31.5633 |
Contact
For questions and comments, please use the discussion tab or email gloria@brainventur.com