Dataset
#8, opened by kesarito
Hi!
Great job with your model! We are trying to build a very similar project.
Which dataset did you use? We are considering using the Korean Wikipedia.
Hi,
I'm glad to see a project like this!
I used corpora from multiple sources, including KcBERT (https://github.com/Beomi/KcBERT/releases/tag/v2022.3Q), Korean Wikipedia, AIHub text data (https://aihub.or.kr/aihubdata/data/list.do?currMenu=115&topMenu=100&srchDataRealmCode=REALM002), and others.
I hope this link helps too:
- Korean Corpus: https://corpus.korean.go.kr/request/reausetMain.do?lang=ko
beomi changed discussion status to closed
By any chance, did you use the validation or test split of the AIHub text data for training?