Update README.md
README.md CHANGED
@@ -60,7 +60,7 @@ The model was trained on the Pile, an 800Gb dataset composed of varied web corpo

This model was trained for 47,000 steps at a batch size of 6,291,456 tokens per step in the [GPT-NeoX codebase](https://github.com/EleutherAI/gpt-neox). It was trained as an autoregressive language model, using cross-entropy loss to maximize the likelihood of predicting the next token correctly.

-Following Bavarian et al. 2022, we train the model to additionally perform infilling via a data transformation applied randomly to 90% of input contexts at train-time.
+Following [Bavarian et al. 2022](https://arxiv.org/abs/2207.14255), we train the model to additionally perform infilling via a data transformation applied randomly to 90% of input contexts at train-time.

Middle segments “to infill” were selected uniformly at random from contexts at the character level, and these contexts were then reformatted as

@@ -144,7 +144,7 @@ We evaluate our model on a number of standard NLP datasets to verify that our in

We use the [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) library developed by EleutherAI for all evaluations except for HumanEval-infilling, for which we use the code in [https://github.com/openai/human-eval-infilling](https://github.com/openai/human-eval-infilling) to evaluate performance.

-All 3 models here are trained using the same configuration with differing FIM hyperparameters and/or different positional embeddings. "AR-1.3B" refers to a model trained without FIM and with rotary positional embeddings, "CarperAI/FIM-NeoX-1.3B" refers to this model (trained with a FIM rate of 0.9 in SPM mode according to Bavarian et al. 2022), and "FIM-1.3B-alibi" refers to a model trained with [AliBi](https://arxiv.org/abs/2108.12409) positional embeddings but otherwise the same as this model.
+All 3 models here are trained using the same configuration with differing FIM hyperparameters and/or different positional embeddings. "AR-1.3B" refers to a model trained without FIM and with rotary positional embeddings, "CarperAI/FIM-NeoX-1.3B" refers to this model (trained with a FIM rate of 0.9 in SPM mode according to [Bavarian et al. 2022](https://arxiv.org/abs/2207.14255)), and "FIM-1.3B-alibi" refers to a model trained with [AliBi](https://arxiv.org/abs/2108.12409) positional embeddings but otherwise the same as this model.

| Model | HumanEval-infilling | arc\_easy | arc\_challenge | lambada | piqa | sciq | wsc | winogrande |
|-----------------|---------------------|----------|---------------|---------|--------|-------|--------|------------|
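For readers skimming the diff, the FIM setup referenced above can be summarized as: with probability 0.9 a training context has a middle span chosen uniformly at random at the character level, and the context is rearranged into suffix-prefix-middle (SPM) order before being fed to the model (the card's 47,000 steps × 6,291,456 tokens/step works out to roughly 296B training tokens). The snippet below is a minimal illustrative sketch of that transform, not the actual training code: the sentinel strings `<|suf|>`, `<|pre|>`, `<|mid|>` and the helper name `maybe_apply_fim` are placeholders, and the real pipeline's sentinel tokens and document packing follow Bavarian et al. 2022 and the model's tokenizer, which this toy string version does not reproduce.

```python
import random

# Placeholder sentinels -- NOT the model's actual special tokens.
SUF, PRE, MID = "<|suf|>", "<|pre|>", "<|mid|>"
FIM_RATE = 0.9  # fraction of training contexts that receive the transform


def maybe_apply_fim(text: str, rng: random.Random) -> str:
    """With probability FIM_RATE, pick a middle span uniformly at random at the
    character level and rewrite the context in suffix-prefix-middle (SPM) order."""
    if rng.random() >= FIM_RATE:
        return text  # the remaining ~10% stay as ordinary left-to-right text
    # Two uniform character-level split points; the span between them is the "middle".
    i, j = sorted(rng.randrange(len(text) + 1) for _ in range(2))
    prefix, middle, suffix = text[:i], text[i:j], text[j:]
    return f"{SUF}{suffix}{PRE}{prefix}{MID}{middle}"


rng = random.Random(0)
print(maybe_apply_fim("def add(a, b):\n    return a + b\n", rng))
```

At inference time the same layout is what makes infilling possible: the suffix and prefix are supplied in this order as the prompt, and the model generates the missing middle after the final sentinel.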