EleutherAI/polyglot-ko-5.8b

@@ -73,14 +73,16 @@ model = AutoModelForCausalLM.from_pretrained("EleutherAI/polyglot-ko-5.8b")
 ## Evaluation results
-We evaluate Polyglot-Ko-5.8B on [KOBEST dataset](https://arxiv.org/abs/2204.04541), a benchmark with 5 downstream tasks, against comparable models such as skt/ko-gpt-trinity-1.2B-v0.5, kakaobrain/kogpt and facebook/xglm-7.5B, using the prompts provided in the paper.
 The following tables show the results for different numbers of few-shot examples. You can reproduce these results using the [polyglot branch of lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness/tree/polyglot) and the following script. For a fair comparison, all models were run under the same conditions and with the same prompts. In the tables, `n` refers to the number of few-shot examples.
 ```console
 python main.py \
    --model gpt2 \
-   --model_args pretrained='EleutherAI/polyglot-ko-5.8b' \
    --tasks kobest_copa,kobest_hellaswag \
    --num_fewshot $YOUR_NUM_FEWSHOT \
    --batch_size $YOUR_BATCH_SIZE \
@@ -90,31 +92,73 @@ python main.py \
 ### COPA (F1)
-| Model                                                                                        | params | n=0 | n=5 | n=10 | n=50 |
 |----------------------------------------------------------------------------------------------|--------|--------|--------|---------|---------|
 | [skt/ko-gpt-trinity-1.2B-v0.5](https://huggingface.co/skt/ko-gpt-trinity-1.2B-v0.5)          | 1.2B   | 0.6696 | 0.6477 | 0.6419  | 0.6514  |
 | [kakaobrain/kogpt](https://huggingface.co/kakaobrain/kogpt)                                  | 6.0B   | 0.7345 | 0.7287 | 0.7277  | 0.7479  |
 | [facebook/xglm-7.5B](https://huggingface.co/facebook/xglm-7.5B)                              | 7.5B   | 0.6723 | 0.6731 | 0.6769  | 0.7119  |
 | [EleutherAI/polyglot-ko-1.3b](https://huggingface.co/EleutherAI/polyglot-ko-1.3b)            | 1.3B   | 0.7196 | 0.7193 | 0.7204  | 0.7206  |
 | [EleutherAI/polyglot-ko-3.8b](https://huggingface.co/EleutherAI/polyglot-ko-3.8b)            | 3.8B   | 0.7595 | 0.7608 | 0.7638  | 0.7788  |
-| **[EleutherAI/polyglot-ko-5.8b](https://huggingface.co/EleutherAI/polyglot-ko-5.8b) (this)** |**5.8B**|**0.7745**|**0.7676**|**0.7775**|**0.7887**|
 | [EleutherAI/polyglot-ko-12.8b](https://huggingface.co/EleutherAI/polyglot-ko-12.8b)          | 12.8B  | 0.7937 | 0.8108 | 0.8037  | 0.8369  |
-<img src="https://user-images.githubusercontent.com/19511788/233820235-6f617932-3b18-4534-be14-8df9e80b8a06.jpg" width="1000px">
 ### HellaSwag (F1)
-| Model                                                                                          | params |n=0 | n=5 | n=10 | n=50 |
-|------------------------------------------------------------------------------------------------|--------|--------|--------|---------|---------|
-| [skt/ko-gpt-trinity-1.2B-v0.5](https://huggingface.co/skt/ko-gpt-trinity-1.2B-v0.5)            | 1.2B   | 0.5243 | 0.5272 | 0.5166  | 0.5352  |
-| [kakaobrain/kogpt](https://huggingface.co/kakaobrain/kogpt)                                    | 6.0B   | 0.5590 | 0.5833 | 0.5828  | 0.5907  |
-| [facebook/xglm-7.5B](https://huggingface.co/facebook/xglm-7.5B)                                | 7.5B   | 0.5665 | 0.5689 | 0.5565  | 0.5622  |
-| [EleutherAI/polyglot-ko-1.3b](https://huggingface.co/EleutherAI/polyglot-ko-1.3b)              | 1.3B   | 0.5247 | 0.5260 | 0.5278  | 0.5427  |
-| [EleutherAI/polyglot-ko-3.8b](https://huggingface.co/EleutherAI/polyglot-ko-3.8b)              | 3.8B   | 0.5707 | 0.5830 | 0.5670  | 0.5787  |
-| **[EleutherAI/polyglot-ko-5.8b](https://huggingface.co/EleutherAI/polyglot-ko-5.8b) (this)**          | **5.8B**   | **0.5976** | **0.5998** | **0.5979**  | **0.6208**  |
-| [EleutherAI/polyglot-ko-12.8b](https://huggingface.co/EleutherAI/polyglot-ko-12.8b)            | 12.8B  | 0.5954 | 0.6306 | 0.6098  | 0.6118  |
-<img src="https://user-images.githubusercontent.com/19511788/233820233-0127983e-4b37-48ce-89e5-51509ed9b1f2.jpg" width="1000px">
 ## Limitations and Biases

 ## Evaluation results
+We evaluate Polyglot-Ko-5.8B on the [KOBEST dataset](https://arxiv.org/abs/2204.04541), a benchmark with 5 downstream tasks, against comparable models such as skt/ko-gpt-trinity-1.2B-v0.5, kakaobrain/kogpt, and facebook/xglm-7.5B, using the prompts provided in the paper.
 The following tables show the results for different numbers of few-shot examples. You can reproduce these results using the [polyglot branch of lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness/tree/polyglot) and the following script. For a fair comparison, all models were run under the same conditions and with the same prompts. In the tables, `n` refers to the number of few-shot examples.
+On the WiC dataset, all models perform at roughly chance level.
 ```console
 python main.py \
    --model gpt2 \
+   --model_args pretrained='EleutherAI/polyglot-ko-5.8b' \
    --tasks kobest_copa,kobest_hellaswag \
    --num_fewshot $YOUR_NUM_FEWSHOT \
    --batch_size $YOUR_BATCH_SIZE \
 ### COPA (F1)
+| Model                                                                                        | params | n=0    | n=5    | n=10    | n=50    |
 |----------------------------------------------------------------------------------------------|--------|--------|--------|---------|---------|
 | [skt/ko-gpt-trinity-1.2B-v0.5](https://huggingface.co/skt/ko-gpt-trinity-1.2B-v0.5)          | 1.2B   | 0.6696 | 0.6477 | 0.6419  | 0.6514  |
 | [kakaobrain/kogpt](https://huggingface.co/kakaobrain/kogpt)                                  | 6.0B   | 0.7345 | 0.7287 | 0.7277  | 0.7479  |
 | [facebook/xglm-7.5B](https://huggingface.co/facebook/xglm-7.5B)                              | 7.5B   | 0.6723 | 0.6731 | 0.6769  | 0.7119  |
 | [EleutherAI/polyglot-ko-1.3b](https://huggingface.co/EleutherAI/polyglot-ko-1.3b)            | 1.3B   | 0.7196 | 0.7193 | 0.7204  | 0.7206  |
 | [EleutherAI/polyglot-ko-3.8b](https://huggingface.co/EleutherAI/polyglot-ko-3.8b)            | 3.8B   | 0.7595 | 0.7608 | 0.7638  | 0.7788  |
+| **[EleutherAI/polyglot-ko-5.8b](https://huggingface.co/EleutherAI/polyglot-ko-5.8b) (this)** | **5.8B** | **0.7745** | **0.7676** | **0.7775** | **0.7887** |
 | [EleutherAI/polyglot-ko-12.8b](https://huggingface.co/EleutherAI/polyglot-ko-12.8b)          | 12.8B  | 0.7937 | 0.8108 | 0.8037  | 0.8369  |
+<img src="https://github.com/EleutherAI/polyglot/assets/19511788/d5b49364-aed5-4467-bae2-5a322c8e2ceb" width="800px">
 ### HellaSwag (F1)
+| Model                                                                                        | params | n=0    | n=5    | n=10    | n=50    |
+|----------------------------------------------------------------------------------------------|--------|--------|--------|---------|---------|
+| [skt/ko-gpt-trinity-1.2B-v0.5](https://huggingface.co/skt/ko-gpt-trinity-1.2B-v0.5)          | 1.2B   | 0.5243 | 0.5272 | 0.5166  | 0.5352  |
+| [kakaobrain/kogpt](https://huggingface.co/kakaobrain/kogpt)                                  | 6.0B   | 0.5590 | 0.5833 | 0.5828  | 0.5907  |
+| [facebook/xglm-7.5B](https://huggingface.co/facebook/xglm-7.5B)                              | 7.5B   | 0.5665 | 0.5689 | 0.5565  | 0.5622  |
+| [EleutherAI/polyglot-ko-1.3b](https://huggingface.co/EleutherAI/polyglot-ko-1.3b)            | 1.3B   | 0.5247 | 0.5260 | 0.5278  | 0.5427  |
+| [EleutherAI/polyglot-ko-3.8b](https://huggingface.co/EleutherAI/polyglot-ko-3.8b)            | 3.8B   | 0.5707 | 0.5830 | 0.5670  | 0.5787  |
+| **[EleutherAI/polyglot-ko-5.8b](https://huggingface.co/EleutherAI/polyglot-ko-5.8b) (this)** | **5.8B** | **0.5976** | **0.5998** | **0.5979** | **0.6208** |
+| [EleutherAI/polyglot-ko-12.8b](https://huggingface.co/EleutherAI/polyglot-ko-12.8b)          | 12.8B  | 0.5954 | 0.6306 | 0.6098  | 0.6118  |
+<img src="https://github.com/EleutherAI/polyglot/assets/19511788/5acb60ac-161a-4ab3-a296-db4442e08b7f" width="800px">
+### BoolQ (F1)
+| Model                                                                                        | params | n=0    | n=5    | n=10    | n=50    |
+|----------------------------------------------------------------------------------------------|--------|--------|--------|---------|---------|
+| [skt/ko-gpt-trinity-1.2B-v0.5](https://huggingface.co/skt/ko-gpt-trinity-1.2B-v0.5)          | 1.2B   | 0.3356 | 0.4014 | 0.3640  | 0.3560  |
+| [kakaobrain/kogpt](https://huggingface.co/kakaobrain/kogpt)                                  | 6.0B   | 0.4514 | 0.5981 | 0.5499  | 0.5202  |
+| [facebook/xglm-7.5B](https://huggingface.co/facebook/xglm-7.5B)                              | 7.5B   | 0.4464 | 0.3324 | 0.3324  | 0.3324  |
+| [EleutherAI/polyglot-ko-1.3b](https://huggingface.co/EleutherAI/polyglot-ko-1.3b)            | 1.3B   | 0.3552 | 0.4751 | 0.4109  | 0.4038  |
+| [EleutherAI/polyglot-ko-3.8b](https://huggingface.co/EleutherAI/polyglot-ko-3.8b)            | 3.8B   | 0.4320 | 0.5263 | 0.4930  | 0.4038  |
+| **[EleutherAI/polyglot-ko-5.8b](https://huggingface.co/EleutherAI/polyglot-ko-5.8b) (this)** | **5.8B** | **0.4356** | **0.5698** | **0.5187** | **0.5236** |
+| [EleutherAI/polyglot-ko-12.8b](https://huggingface.co/EleutherAI/polyglot-ko-12.8b)          | 12.8B  | 0.4818 | 0.6041 | 0.6289  | 0.6448  |
+<img src="https://github.com/EleutherAI/polyglot/assets/19511788/b74c23c0-01f3-4b68-9e10-a48e9aa052ab" width="800px">
+### SentiNeg (F1)
+| Model                                                                                        | params | n=0    | n=5    | n=10    | n=50    |
+|----------------------------------------------------------------------------------------------|--------|--------|--------|---------|---------|
+| [skt/ko-gpt-trinity-1.2B-v0.5](https://huggingface.co/skt/ko-gpt-trinity-1.2B-v0.5)          | 1.2B   | 0.6065 | 0.6878 | 0.7280  | 0.8413  |
+| [kakaobrain/kogpt](https://huggingface.co/kakaobrain/kogpt)                                  | 6.0B   | 0.3747 | 0.8942 | 0.9294  | 0.9698  |
+| [facebook/xglm-7.5B](https://huggingface.co/facebook/xglm-7.5B)                              | 7.5B   | 0.3578 | 0.4471 | 0.3964  | 0.5271  |
+| [EleutherAI/polyglot-ko-1.3b](https://huggingface.co/EleutherAI/polyglot-ko-1.3b)            | 1.3B   | 0.6790 | 0.6257 | 0.5514  | 0.7851  |
+| [EleutherAI/polyglot-ko-3.8b](https://huggingface.co/EleutherAI/polyglot-ko-3.8b)            | 3.8B   | 0.4858 | 0.7950 | 0.7320  | 0.7851  |
+| **[EleutherAI/polyglot-ko-5.8b](https://huggingface.co/EleutherAI/polyglot-ko-5.8b) (this)** | **5.8B** | **0.3394** | **0.8841** | **0.8808** | **0.9521** |
+| [EleutherAI/polyglot-ko-12.8b](https://huggingface.co/EleutherAI/polyglot-ko-12.8b)          | 12.8B  | 0.9117 | 0.9015 | 0.9345  | 0.9723  |
+<img src="https://github.com/EleutherAI/polyglot/assets/19511788/95b56b19-d349-4b70-9ff9-94a5560f89ee" width="800px">
+### WiC (F1)
+| Model                                                                                        | params | n=0    | n=5    | n=10    | n=50    |
+|----------------------------------------------------------------------------------------------|--------|--------|--------|---------|---------|
+| [skt/ko-gpt-trinity-1.2B-v0.5](https://huggingface.co/skt/ko-gpt-trinity-1.2B-v0.5)          | 1.2B   | 0.3290 | 0.4313 | 0.4001  | 0.3621  |
+| [kakaobrain/kogpt](https://huggingface.co/kakaobrain/kogpt)                                  | 6.0B   | 0.3526 | 0.4775 | 0.4358  | 0.4061  |
+| [facebook/xglm-7.5B](https://huggingface.co/facebook/xglm-7.5B)                              | 7.5B   | 0.3280 | 0.4903 | 0.4945  | 0.3656  |
+| [EleutherAI/polyglot-ko-1.3b](https://huggingface.co/EleutherAI/polyglot-ko-1.3b)            | 1.3B   | 0.3297 | 0.4850 | 0.4650  | 0.3290  |
+| [EleutherAI/polyglot-ko-3.8b](https://huggingface.co/EleutherAI/polyglot-ko-3.8b)            | 3.8B   | 0.3390 | 0.4944 | 0.4203  | 0.3835  |
+| **[EleutherAI/polyglot-ko-5.8b](https://huggingface.co/EleutherAI/polyglot-ko-5.8b) (this)** | **5.8B** | **0.3913** | **0.4688** | **0.4189** | **0.3910** |
+| [EleutherAI/polyglot-ko-12.8b](https://huggingface.co/EleutherAI/polyglot-ko-12.8b)          | 12.8B  | 0.3985 | 0.3683 | 0.3307  | 0.3273  |
+<img src="https://github.com/EleutherAI/polyglot/assets/19511788/4de4a4c3-d7ac-4e04-8b0c-0d533fe88294" width="800px">
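As a rough aid for reading the tables above, the sketch below computes macro-averaged F1 for a binary task and shows what two uninformative classifiers would score. It assumes that the reported F1 is macro-averaged over two classes and that the evaluation split is roughly balanced; neither is stated in this card. `macro_f1` is a hypothetical helper written for illustration, not part of lm-evaluation-harness.

```python
def macro_f1(gold, pred, labels=(0, 1)):
    """Unweighted mean of per-class F1 (assumed metric; not confirmed by this card)."""
    scores = []
    for label in labels:
        tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
        fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
        fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
        denom = 2 * tp + fp + fn
        # A class that is neither present nor predicted contributes 0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical balanced binary test set: 500 positives, 500 negatives.
gold = [1] * 500 + [0] * 500

# Degenerate model that always answers 1.
always_one = [1] * 1000
print(round(macro_f1(gold, always_one), 4))  # 0.3333

# Model that alternates answers, so it is right half the time on each class.
alternating = [i % 2 for i in range(1000)]
print(round(macro_f1(gold, alternating), 4))  # 0.5
```

Under these assumptions, scores near 0.33 (as in several WiC and BoolQ entries) are consistent with a model collapsing onto a single label, while scores near 0.5 are consistent with guessing across both labels.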
 ## Limitations and Biases