Update README.md
README.md CHANGED
@@ -112,22 +112,18 @@ As with all language models, it is hard to predict in advance how FIM-1.3B will
 We evaluate our model on a number of standard NLP datasets to verify that our infilling model performs on par with a comparable autoregressive model.

-We use the [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) developed by EleutherAI.
+We use the [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) library developed by EleutherAI for all evaluations except for HumanEval-infilling, for which we use the code in [https://github.com/openai/human-eval-infilling](https://github.com/openai/human-eval-infilling) to evaluate performance.

-LogiQA, PIQA, SciQ, WSC, Winogrande, ARC_challenge, ARC_easy, lambada
-On FIM-1.3B, the comparable autoregressive model,
+All 3 models here are trained using the same configuration with differing FIM hyperparameters and/or different positional embeddings. "AR-1.3B" refers to a model trained without FIM and with rotary positional embeddings, "CarperAI/FIM-NeoX-1.3B" refers to this model (trained with a FIM rate of 0.9 in SPM mode according to Bavarian et al. 2022), and "FIM-1.3B-alibi" refers to a model trained with [AliBi](https://arxiv.org/abs/2108.12409) positional embeddings but otherwise the same as this model.

-| Model | HumanEval-Infilling | arc_easy | arc_challenge | lambada | piqa | sciq | wsc | winogrande |
+| Model | HumanEval-infilling | arc\_easy | arc\_challenge | lambada | piqa | sciq | wsc | winogrande |
 |-----------------|---------------------|----------|---------------|---------|--------|-------|--------|------------|
 | AR-1.3B | 0.0029 | 0.5816 | 0.2465 | 7.03 | 0.7116 | 0.85 | 0.3654 | 0.5651 |
-| FIM-1.3B
+| CarperAI/FIM-NeoX-1.3B | 0.0155 | 0.5829 | 0.2457 | 7.08 | 0.7029 | 0.861 | 0.3654 | 0.5390 |
 | FIM-1.3B-alibi | 0.0029 | 0.5589 | 0.25 | 7.49 | 0.6926 | 0.856 | 0.3654 | 0.5406 |

-We also perform preliminary investigation on code generation and infilling capabilities by testing on HumanEval-Infilling [link to github] [Bavarian et al. 2022]
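The SPM-mode fill-in-the-middle transformation the added text refers to (applied at a FIM rate of 0.9, per Bavarian et al. 2022) can be sketched in a few lines. This is a minimal illustration, not the training code: the sentinel strings below are placeholders, and the exact special tokens and splitting strategy used for this model are assumptions.

```python
import random

# Illustrative sentinel strings; the real model uses dedicated special
# tokens, so treat these exact names as placeholders (assumption).
PRE, SUF, MID = "<fim_prefix>", "<fim_suffix>", "<fim_middle>"

def apply_fim_spm(doc, fim_rate=0.9, rng=None):
    """With probability `fim_rate`, cut `doc` at two random points into
    (prefix, middle, suffix) and re-emit it with the middle last, so the
    model learns to generate the middle from context on both sides."""
    rng = rng or random.Random()
    if rng.random() > fim_rate:
        return doc  # keep the document in ordinary left-to-right order
    # Pick two cut points uniformly at random (document-level splitting
    # is an assumption; splitting can also be done in token space).
    i, j = sorted(rng.sample(range(len(doc) + 1), 2))
    prefix, middle, suffix = doc[:i], doc[i:j], doc[j:]
    # SPM ("suffix-prefix-middle") ordering, in the sentinel arrangement
    # Bavarian et al. 2022 describe as compatible with the PSM variant.
    return f"{PRE}{SUF}{suffix}{MID}{prefix}{middle}"
```

At inference time the same layout supports infilling: place the known suffix and prefix before the middle sentinel and sample the missing span from the model.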