KoGPT
KakaoBrain's Pre-Trained Language Models.
- KoGPT (Korean Generative Pre-trained Transformer)
Model Descriptions
KoGPT6B-ryan1.5b
- [huggingface][kakaobrain/kogpt][KoGPT6B-ryan1.5b]
- [huggingface][kakaobrain/kogpt][KoGPT6B-ryan1.5b-float16]
Hyperparameter | Value |
---|---|
n_parameters | 6,166,502,400 |
n_layers | 28 |
d_model | 4,096 |
d_ff | 16,384 |
n_heads | 16 |
d_head | 256 |
n_ctx | 2,048 |
n_vocab | 64,512 |
Positional Encoding | Rotary Position Embedding (RoPE) |
RoPE Dimensions | 64 |
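The values in the table above are mutually consistent, and a short calculation makes that visible. The sketch below reads the numeric rows, in order, as parameter count, layer count, model width, feed-forward width, attention heads, head size, context length, and vocabulary size. It also assumes a GPT-J-style block layout (bias-free attention projections, a biased MLP, one LayerNorm per block, and an untied output projection with a bias). That layout is an assumption made for illustration and is not stated in this README; `count_parameters` is a hypothetical helper name.

```python
def count_parameters(n_layers=28, d_model=4096, d_ff=16384, n_vocab=64512):
    """Count weights for an ASSUMED GPT-J-style decoder (layout not stated in this README)."""
    attention = 4 * d_model * d_model          # q, k, v, out projections, assumed bias-free
    mlp = 2 * d_model * d_ff + d_ff + d_model  # two linear layers with biases
    layer_norm = 2 * d_model                   # one LayerNorm per block: scale and shift
    per_layer = attention + mlp + layer_norm

    embedding = n_vocab * d_model              # token embedding matrix
    lm_head = n_vocab * d_model + n_vocab      # assumed untied output projection with bias
    final_norm = 2 * d_model                   # LayerNorm before the output projection
    return n_layers * per_layer + embedding + lm_head + final_norm


print(4096 // 16)                  # 256, head size = model width / number of heads
print(4 * 4096)                    # 16384, feed-forward width = 4 x model width
print(f"{count_parameters():,}")   # 6,166,502,400
```

Under these assumptions the total reproduces the parameter count listed in the table, which suggests the assumed layout is at least compatible with the published figures.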
Hardware requirements
KoGPT6B-ryan1.5b
GPU
The following is the recommended minimum GPU hardware for running KoGPT.
- 32GB GPU RAM is the required minimum memory size
KoGPT6B-ryan1.5b-float16
GPU
The following is the recommended minimum GPU hardware for running KoGPT.
- Half-precision requires NVIDIA GPUs based on Volta, Turing, or Ampere
- 16GB GPU RAM is the required minimum memory size
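A rough calculation suggests where the 32GB and 16GB figures come from. The sketch below estimates the memory needed for the weights alone, using the parameter count from the model description and the standard sizes of 4 bytes per float32 value and 2 bytes per float16 value. Reading the remaining margin as headroom for activations and generation state is an assumption, not something the authors state; `weight_memory_gib` is a hypothetical helper name.

```python
N_PARAMETERS = 6_166_502_400  # parameter count from the model description


def weight_memory_gib(n_parameters: int, bytes_per_parameter: int) -> float:
    """Memory needed to hold the weights alone, in GiB (2**30 bytes)."""
    return n_parameters * bytes_per_parameter / 2**30


fp32 = weight_memory_gib(N_PARAMETERS, 4)  # float32: 4 bytes per parameter
fp16 = weight_memory_gib(N_PARAMETERS, 2)  # float16: 2 bytes per parameter
print(f"float32 weights: {fp32:.1f} GiB")  # float32 weights: 23.0 GiB
print(f"float16 weights: {fp16:.1f} GiB")  # float16 weights: 11.5 GiB
```

Both estimates fit inside the stated minimums with several GiB to spare, so a card with less memory than listed is unlikely to hold the corresponding revision.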
Usage
prompt
python -m kogpt --help
usage: KoGPT inference [-h] [--model MODEL] [--revision {KoGPT6B-ryan1.5b}]
[--device {cpu,cuda}] [-d]
KakaoBrain Korean(hangul) Generative Pre-Training Model
optional arguments:
-h, --help show this help message and exit
--model MODEL huggingface repo (default:kakaobrain/kogpt)
--revision {KoGPT6B-ryan1.5b}
--device {cpu,cuda} (default:cuda)
-d, --debug
python -m kogpt
prompt> 인간처럼 생각하고, 행동하는 '지능'을 통해 인류가 이제까지 풀지 못했던
temperature(0.8)>
max_length(128)> 64
인간처럼 생각하고, 행동하는 '지능'을 통해 인류가 이제까지 풀지 못했던 문제의 해답을 찾을 수 있을 것이다. 과학기술이 고도로 발달한 21세기를 살아갈 우리 아이들에게 가장 필요한 것은 사고력 훈련이다. 사고력 훈련을 통해, 세상
prompt>
...
python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the tokenizer and the model from the same revision.
tokenizer = AutoTokenizer.from_pretrained(
    'kakaobrain/kogpt', revision='KoGPT6B-ryan1.5b-float16',  # or float32 version: revision='KoGPT6B-ryan1.5b'
    bos_token='[BOS]', eos_token='[EOS]', unk_token='[UNK]', pad_token='[PAD]', mask_token='[MASK]'
)
model = AutoModelForCausalLM.from_pretrained(
    'kakaobrain/kogpt', revision='KoGPT6B-ryan1.5b-float16',  # or float32 version: revision='KoGPT6B-ryan1.5b'
    pad_token_id=tokenizer.eos_token_id,
    torch_dtype='auto', low_cpu_mem_usage=True
).to(device='cuda', non_blocking=True)
_ = model.eval()  # switch to inference mode

prompt = '인간처럼 생각하고, 행동하는 \'지능\'을 통해 인류가 이제까지 풀지 못했던'
with torch.no_grad():  # no gradients are needed for generation
    tokens = tokenizer.encode(prompt, return_tensors='pt').to(device='cuda', non_blocking=True)
    gen_tokens = model.generate(tokens, do_sample=True, temperature=0.8, max_length=64)
    generated = tokenizer.batch_decode(gen_tokens)[0]

print(generated)  # print: 인간처럼 생각하고, 행동하는 '지능'을 통해 인류가 이제까지 풀지 못했던 문제의 해답을 찾을 수 있을 것이다. 과학기술이 고도로 발달한 21세기를 살아갈 우리 아이들에게 가장 필요한 것은 사고력 훈련이다. 사고력 훈련을 통해, 세상
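The example above samples with `temperature=0.8`. As a minimal sketch of what that setting generally does, the code below applies the standard temperature-scaled softmax to a few made-up scores. It illustrates the general idea only and is not taken from the library's internals; `softmax_with_temperature` and the example scores are hypothetical.

```python
import math


def softmax_with_temperature(logits, temperature=0.8):
    """Turn raw scores into sampling probabilities after dividing by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)                           # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]                         # made-up scores for three candidate tokens
for t in (1.0, 0.8, 0.5):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

Lower temperatures concentrate probability on the highest-scoring token, so the output becomes less varied. Higher temperatures flatten the distribution.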
Experiments
In-context Few-Shots
Models | #params | NSMC (Acc.) | YNAT (F1) | KLUE-STS (F1) |
---|---|---|---|---|
HyperCLOVA[1] | 1.3B | 83.9 | 58.7 | 60.9 |
HyperCLOVA[1] | 6.9B | 83.8 | 67.5 | 59.3 |
HyperCLOVA[1] | 13.0B | 87.9 | 67.9 | 60.0 |
HyperCLOVA[1] | 39.0B | 88.0 | 71.4 | 61.6 |
HyperCLOVA[1] | 82.0B | 88.2 | 72.7 | 65.1 |
Ours | 6.0B | 87.8 | 78.0 | 64.3 |
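In-context few-shot evaluation places a handful of labeled demonstrations in the prompt and lets the model continue with a label for a new input. The prompt format used for the scores above is not given in this README, so the sketch below is a hypothetical format for a movie-review sentiment task such as NSMC. The field names, labels, example sentences, and the helper `build_few_shot_prompt` are all made up for illustration.

```python
def build_few_shot_prompt(examples, query):
    """Concatenate labeled demonstrations followed by the unlabeled query."""
    # 리뷰 = review, 감정 = sentiment (hypothetical field names)
    blocks = [f"리뷰: {text}\n감정: {label}" for text, label in examples]
    blocks.append(f"리뷰: {query}\n감정:")  # the model is expected to continue with a label
    return "\n\n".join(blocks)


# Made-up demonstrations; 긍정 = positive, 부정 = negative.
examples = [
    ("정말 재미있는 영화였어요.", "긍정"),
    ("시간이 아까웠습니다.", "부정"),
]
prompt = build_few_shot_prompt(examples, "배우들의 연기가 훌륭했다.")
print(prompt)
```

The resulting string can be passed as `prompt` in the usage example, with the generated continuation compared against the candidate labels.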
Finetuning / P-Tuning
An issue (https://github.com/kakaobrain/kogpt/issues/17) has been reported with our downstream evaluation.
The previously published performance table has been removed. It could not be considered a fair comparison, because the algorithms being compared were different and the performance measurement method could not be confirmed.
See the issue linked above for the original performance table and the troubleshooting results.
Limitations
KakaoBrain KoGPT was trained on rayn dataset, a dataset that was not filtered for profanity, lewd or politically charged content, or other harsh language, and is known to contain them.
KoGPT can therefore generate socially unacceptable text. As with all language models, it is difficult to predict in advance how KoGPT will respond to particular prompts, and it may produce offensive content without warning.
Primarily Korean: KoGPT is trained primarily on Korean text and is best suited to classifying, searching, summarizing, or generating such text.
By default, KoGPT performs worse on inputs that differ from the data distribution it was trained on, including non-Korean text and dialects of Korean that are not well represented in the training data.
Citation
If you apply this library or model to any project and research, please cite our code:
@misc{kakaobrain2021kogpt,
title = {KoGPT: KakaoBrain Korean(hangul) Generative Pre-trained Transformer},
author = {Ildoo Kim and Gunsoo Han and Jiyeon Ham and Woonhyuk Baek},
year = {2021},
howpublished = {\url{https://github.com/kakaobrain/kogpt}},
}
Contact
KoGPT is released as open source in the hope that it will be helpful to many research institutes and startups for research purposes. We look forward to hearing from anyone who wishes to cooperate with us.
License
The source code of KakaoBrain KoGPT is licensed under the Apache 2.0 License.
The pretrained weights of KakaoBrain KoGPT are licensed under the CC-BY-NC-ND 4.0 License.
When using the model, code, or pretrained weights, please comply with the license terms. The full license texts are available in the Apache 2.0 and LICENSE.cc-by-nc-nd-4.0 files.
References
[1] HyperCLOVA: Kim, Boseop, et al. "What changes can large-scale language models bring? intensive study on hyperclova: Billions-scale korean generative pretrained transformers." arXiv preprint arXiv:2109.04650 (2021).