metadata

language:
  - ko
pipeline_tag: image-to-text

deplot_kr

deplot_kr is an image-to-data (text) model based on Google's Pix2Struct architecture. It was fine-tuned from DePlot on a dataset of 300,000 Korean chart image-text pairs, and it outputs a chart's underlying data table in text form.

How to use

To run a prediction, pass a chart image to the model. The model predicts the data table shown in the image and returns it as text.

from transformers import Pix2StructForConditionalGeneration, AutoProcessor
from PIL import Image

processor = AutoProcessor.from_pretrained("brainventures/deplot_kr")
model = Pix2StructForConditionalGeneration.from_pretrained("brainventures/deplot_kr")

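# Load the chart image to convert
image_path = "IMAGE_PATH"
image = Image.open(image_path)

# Convert the image into flattened patches plus an attention mask
inputs = processor(images=image, return_tensors="pt")
# Generate the token ids of the linearized data table
pred = model.generate(flattened_patches=inputs["flattened_patches"], attention_mask=inputs["attention_mask"], max_length=1024)
# Decode the first (and only) generated sequence back to text
print(processor.batch_decode(pred, skip_special_tokens=True)[0])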

Model Input Image

Model Output - Prediction

대상:
제목: 2011-2021 보건복지 분야 일자리의 증
유형: 단일형 일반 세로 대형
| 보건(천 명) | 복지(천 명)
1분위 | 29.7 | 178.4
2분위 | 70.8 | 97.3
3분위 | 86.4 | 61.3
4분위 | 28.2 | 16.0
5분위 | 52.3 | 0.9

Preprocessing

According to Liu et al. (2023)...

Markdown format
| : separates cells (column separator)
\n : separates rows (row separator)
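
The separators above make the predicted text easy to split back into rows and cells. The following is a minimal sketch, not part of the released code: `parse_linearized_table` is a hypothetical helper name, and it assumes rows are separated by a real newline character and cells by `|`, as described above.

```python
def parse_linearized_table(text):
    """Split a linearized table into rows of stripped cell strings.

    Assumption: rows are separated by newline characters and cells by "|".
    """
    rows = []
    for line in text.split("\n"):
        line = line.strip()
        if not line:
            continue  # skip blank lines
        rows.append([cell.strip() for cell in line.split("|")])
    return rows


# Sample text taken from the prediction shown in this card
sample = "| 보건(천 명) | 복지(천 명)\n1분위 | 29.7 | 178.4\n2분위 | 70.8 | 97.3"
for row in parse_linearized_table(sample):
    print(row)
```

The header row starts with `|`, so its first cell comes back as an empty string, which keeps the header aligned with the row-label column. Lines without a `|`, such as the title line, come back as a single-cell row.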

Train

The model was trained in a TPU environment.

num_warmup_steps : 1,000
num_training_steps : 40,000
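
To illustrate how these two values interact, here is a small sketch of a warmup schedule. This card does not state the schedule shape or the peak learning rate, so the linear warmup followed by linear decay below is an assumption made purely for illustration. `lr_scale` is a hypothetical helper, not the training code.

```python
NUM_WARMUP_STEPS = 1_000     # from this model card
NUM_TRAINING_STEPS = 40_000  # from this model card


def lr_scale(step, warmup=NUM_WARMUP_STEPS, total=NUM_TRAINING_STEPS):
    """Multiplier applied to the peak learning rate at a given step.

    Assumed shape (not confirmed by the card): linear warmup from 0 to 1,
    then linear decay back to 0 at the final step.
    """
    if step < warmup:
        return step / max(1, warmup)
    return max(0.0, (total - step) / max(1, total - warmup))


for step in (0, 500, 1_000, 20_500, 40_000):
    print(step, round(lr_scale(step), 3))
```

Under this assumption, warmup covers the first 2.5% of training (1,000 of 40,000 steps).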

Evaluation Results

This model achieves the following results:

Metric	Score (%)
RNSS (Relative Number Set Similarity)	98.1615
RMS (Relative Mapping Similarity) Precision	83.1615
RMS Recall	26.3549
RMS F1 Score	31.5633

Contact

For questions and comments, please use the discussion tab or email gloria@brainventur.com