Update README.md
README.md CHANGED
@@ -112,22 +112,18 @@ As with all language models, it is hard to predict in advance how FIM-1.3B will
 We evaluate our model on a number of standard NLP datasets to verify that our infilling model performs on par with a comparable autoregressive model.

-We use the [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) developed by EleutherAI.
+We use the [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) library developed by EleutherAI for all evaluations except for HumanEval-infilling, for which we use the code in [https://github.com/openai/human-eval-infilling](https://github.com/openai/human-eval-infilling) to evaluate performance.

-LogiQA, PIQA, SciQ, WSC, Winogrande, ARC_challenge, ARC_easy, lambada
-On FIM-1.3B, the comparable autoregressive model,
+All 3 models here are trained using the same configuration with differing FIM hyperparameters and/or different positional embeddings. "AR-1.3B" refers to a model trained without FIM and with rotary positional embeddings, "CarperAI/FIM-NeoX-1.3B" refers to this model (trained with a FIM rate of 0.9 in SPM mode according to Bavarian et al. 2022), and "FIM-1.3B-alibi" refers to a model trained with [AliBi](https://arxiv.org/abs/2108.12409) positional embeddings but otherwise the same as this model.

-| Model | HumanEval-Infilling | arc_easy | arc_challenge | lambada | piqa | sciq | wsc | winogrande |
+| Model | HumanEval-infilling | arc\_easy | arc\_challenge | lambada | piqa | sciq | wsc | winogrande |
 |-----------------|---------------------|----------|---------------|---------|--------|-------|--------|------------|
 | AR-1.3B | 0.0029 | 0.5816 | 0.2465 | 7.03 | 0.7116 | 0.85 | 0.3654 | 0.5651 |
-| FIM-1.3B
+| CarperAI/FIM-NeoX-1.3B | 0.0155 | 0.5829 | 0.2457 | 7.08 | 0.7029 | 0.861 | 0.3654 | 0.5390 |
 | FIM-1.3B-alibi | 0.0029 | 0.5589 | 0.25 | 7.49 | 0.6926 | 0.856 | 0.3654 | 0.5406 |

-We also perform preliminary investigation on code generation and infilling capabilities by testing on HumanEval-Infilling [link to github] [Bavarian et al. 2022]
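The SPM-mode fill-in-the-middle transformation the added text refers to (applied at a FIM rate of 0.9, per Bavarian et al. 2022) can be sketched in a few lines. This is a minimal illustration, not the training code: the sentinel strings below are placeholders, and the exact special tokens and splitting strategy used for this model are assumptions.

```python
import random

# Illustrative sentinel strings; the real model uses dedicated special
# tokens, so treat these exact names as placeholders (assumption).
PRE, SUF, MID = "<fim_prefix>", "<fim_suffix>", "<fim_middle>"

def apply_fim_spm(doc, fim_rate=0.9, rng=None):
    """With probability `fim_rate`, cut `doc` at two random points into
    (prefix, middle, suffix) and re-emit it with the middle last, so the
    model learns to generate the middle from context on both sides."""
    rng = rng or random.Random()
    if rng.random() > fim_rate:
        return doc  # keep the document in ordinary left-to-right order
    # Pick two cut points uniformly at random (document-level splitting
    # is an assumption; splitting can also be done in token space).
    i, j = sorted(rng.sample(range(len(doc) + 1), 2))
    prefix, middle, suffix = doc[:i], doc[i:j], doc[j:]
    # SPM ("suffix-prefix-middle") ordering, in the sentinel arrangement
    # Bavarian et al. 2022 describe as compatible with the PSM variant.
    return f"{PRE}{SUF}{suffix}{MID}{prefix}{middle}"
```

At inference time the same layout supports infilling: place the known suffix and prefix before the middle sentinel and sample the missing span from the model.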