---
language: ko
tags:
  - summarization
  - bigbird
  - bart
inference: false
---

<<20220917 Commit>>

For personal study, I converted the BERT-based KoBigBird model into an encoder-decoder model, aiming for a model specialized in summarizing long texts such as news articles. monologg's original KoBigBird is BERT-based and performs very well, but it has no decoder and so cannot do abstractive (generative) summarization on its own. I therefore added a decoder.

My initial model reused KoBigBird's encoder as the decoder as well, but because of a number of minor errors I instead joined the encoder of monologg's KoBigBird-bert-base to the decoder of ainize's KoBART-news. After adjusting the config and other hyperparameters, I fine-tuned the model on the naver-news-summarization-ko dataset provided by daekeun-ml.
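
This card does not say exactly how the two halves were joined. One plausible route in Hugging Face `transformers` is `EncoderDecoderModel`, which pairs an encoder with any decoder that supports cross-attention. The sketch below assumes that route and builds the same kind of pairing (a BigBird encoder feeding a BART decoder) from tiny, randomly initialised configs, so it runs offline without downloading the real KoBigBird or KoBART weights. All sizes (`VOCAB_SIZE`, `HIDDEN`, layer counts) are hypothetical.

```python
import torch
from transformers import (
    BartConfig,
    BartForCausalLM,
    BigBirdConfig,
    BigBirdModel,
    EncoderDecoderConfig,
    EncoderDecoderModel,
)

VOCAB_SIZE = 100  # hypothetical; kept equal on both sides for simplicity
HIDDEN = 32       # hypothetical; matching widths means no projection layer is added

# Tiny BigBird encoder (randomly initialised stand-in for KoBigBird-bert-base).
encoder_config = BigBirdConfig(
    vocab_size=VOCAB_SIZE,
    hidden_size=HIDDEN,
    num_hidden_layers=2,
    num_attention_heads=2,
    intermediate_size=64,
    attention_type="original_full",  # the toy inputs are too short for block-sparse attention
)

# Tiny BART config; only its decoder half is built (stand-in for the KoBART-news decoder).
decoder_config = BartConfig(
    vocab_size=VOCAB_SIZE,
    d_model=HIDDEN,
    decoder_layers=2,
    decoder_attention_heads=2,
    decoder_ffn_dim=64,
)

# Flags the decoder config as a decoder with cross-attention over the encoder outputs.
config = EncoderDecoderConfig.from_encoder_decoder_configs(encoder_config, decoder_config)
model = EncoderDecoderModel(config=config)  # BigBirdModel encoder + BartForCausalLM decoder
model.eval()

torch.manual_seed(0)
input_ids = torch.randint(0, VOCAB_SIZE, (1, 10))         # stand-in "article" tokens
decoder_input_ids = torch.randint(0, VOCAB_SIZE, (1, 5))  # stand-in "summary" prefix

with torch.no_grad():
    out = model(input_ids=input_ids, decoder_input_ids=decoder_input_ids)

# Logits have shape (batch, summary length, decoder vocabulary size).
print(out.logits.shape)
```

With real checkpoints, `EncoderDecoderModel.from_encoder_decoder_pretrained(encoder_id, decoder_id)` would load pretrained weights instead of random ones (for example with `monologg/kobigbird-bert-base` as the encoder ID). Whether this matches the procedure actually used for this model is an assumption.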

Next, I plan to train the model further on the summarization dataset provided by AI Hub.

The performance is still quite poor and the outputs can be odd, but I am interested in natural language processing and want to learn to use it properly, so I am building this myself. I will keep improving it toward a well-performing model. Thank you.


# Python Code
```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Load the tokenizer and the combined encoder-decoder summarization model from the Hugging Face Hub
model_id = "noahkim/KoBigBird-KoBart-News-Summarization"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)
```

```bibtex
@software{jangwon_park_2021_5654154,
  author    = {Jangwon Park and Donggyu Kim},
  title     = {KoBigBird: Pretrained BigBird Model for Korean},
  month     = nov,
  year      = 2021,
  publisher = {Zenodo},
  version   = {1.0.0},
  doi       = {10.5281/zenodo.5654154},
  url       = {https://doi.org/10.5281/zenodo.5654154}
}
```