Update README.md
README.md CHANGED
@@ -60,7 +60,7 @@ The model was trained on the Pile, an 800Gb dataset composed of varied web corpo

This model was trained for 47,000 steps at a batch size of 6,291,456 tokens per step in the [GPT-NeoX codebase](https://github.com/EleutherAI/gpt-neox). It was trained as an autoregressive language model, using cross-entropy loss to maximize the likelihood of predicting the next token correctly.

-Following Bavarian et al. 2022, we train the model to additionally perform infilling via a data transformation applied randomly to 90% of input contexts at train-time.
+Following [Bavarian et al. 2022](https://arxiv.org/abs/2207.14255), we train the model to additionally perform infilling via a data transformation applied randomly to 90% of input contexts at train-time.

Middle segments “to infill” were selected uniformly at random from contexts at the character level, and these contexts were then reformatted as

@@ -144,7 +144,7 @@ We evaluate our model on a number of standard NLP datasets to verify that our in

We use the [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) library developed by EleutherAI for all evaluations except for HumanEval-infilling, for which we use the code in [https://github.com/openai/human-eval-infilling](https://github.com/openai/human-eval-infilling) to evaluate performance.

-All 3 models here are trained using the same configuration with differing FIM hyperparameters and/or different positional embeddings. "AR-1.3B" refers to a model trained without FIM and with rotary positional embeddings, "CarperAI/FIM-NeoX-1.3B" refers to this model (trained with a FIM rate of 0.9 in SPM mode according to Bavarian et al. 2022), and "FIM-1.3B-alibi" refers to a model trained with [AliBi](https://arxiv.org/abs/2108.12409) positional embeddings but otherwise the same as this model.
+All 3 models here are trained using the same configuration with differing FIM hyperparameters and/or different positional embeddings. "AR-1.3B" refers to a model trained without FIM and with rotary positional embeddings, "CarperAI/FIM-NeoX-1.3B" refers to this model (trained with a FIM rate of 0.9 in SPM mode according to [Bavarian et al. 2022](https://arxiv.org/abs/2207.14255)), and "FIM-1.3B-alibi" refers to a model trained with [AliBi](https://arxiv.org/abs/2108.12409) positional embeddings but otherwise the same as this model.

| Model | HumanEval-infilling | arc\_easy | arc\_challenge | lambada | piqa | sciq | wsc | winogrande |
|-----------------|---------------------|----------|---------------|---------|--------|-------|--------|------------|
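For readers skimming the diff, the FIM setup referenced above can be summarized as: with probability 0.9 a training context has a middle span chosen uniformly at random at the character level, and the context is rearranged into suffix-prefix-middle (SPM) order before being fed to the model (the card's 47,000 steps × 6,291,456 tokens/step works out to roughly 296B training tokens). The snippet below is a minimal illustrative sketch of that transform, not the actual training code: the sentinel strings `<|suf|>`, `<|pre|>`, `<|mid|>` and the helper name `maybe_apply_fim` are placeholders, and the real pipeline's sentinel tokens and document packing follow Bavarian et al. 2022 and the model's tokenizer, which this toy string version does not reproduce.

```python
import random

# Placeholder sentinels -- NOT the model's actual special tokens.
SUF, PRE, MID = "<|suf|>", "<|pre|>", "<|mid|>"
FIM_RATE = 0.9  # fraction of training contexts that receive the transform


def maybe_apply_fim(text: str, rng: random.Random) -> str:
    """With probability FIM_RATE, pick a middle span uniformly at random at the
    character level and rewrite the context in suffix-prefix-middle (SPM) order."""
    if rng.random() >= FIM_RATE:
        return text  # the remaining ~10% stay as ordinary left-to-right text
    # Two uniform character-level split points; the span between them is the "middle".
    i, j = sorted(rng.randrange(len(text) + 1) for _ in range(2))
    prefix, middle, suffix = text[:i], text[i:j], text[j:]
    return f"{SUF}{suffix}{PRE}{prefix}{MID}{middle}"


rng = random.Random(0)
print(maybe_apply_fim("def add(a, b):\n    return a + b\n", rng))
```

At inference time the same layout is what makes infilling possible: the suffix and prefix are supplied in this order as the prompt, and the model generates the missing middle after the final sentinel.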