**Note: the model was trained with bfloat16 activations. As such, we highly discourage running inference with fp16.**
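
Below is a minimal sketch of loading the model with bfloat16 activations, as the note above recommends. The repository id used here is an assumption for illustration and should be replaced with this model's actual id.

```python
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Assumed repository id, for illustration only; substitute this model's id.
MODEL_ID = "seonghyeonye/flipped_11B"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForSeq2SeqLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,  # keep activations in bfloat16, not fp16
)
```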

# Training procedure

FLIPPED models are based on [T5](https://huggingface.co/google/t5-v1_1-xxl), a Transformer-based encoder-decoder language model pre-trained with a masked language modeling-style objective on [C4](https://huggingface.co/datasets/c4).

At a high level, the input text along with the output label is fed to the encoder, and the instruction text is produced by the decoder. The model is fine-tuned to autoregressively generate this target. We also feed the input text paired with a wrong label, adding an unlikelihood loss so that the model does not produce the proper instruction in that case.
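
The sketch below illustrates this objective; it is not the authors' implementation. It combines a standard cross-entropy (likelihood) term for the instruction given the input and the correct label with an unlikelihood term for the same instruction given the input and a wrong label. The function name, batch fields, and the weighting factor `lam` are all illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def flipped_loss(model, correct_batch, wrong_batch, lam=1.0):
    """Likelihood + unlikelihood objective sketch (names are illustrative)."""
    # Likelihood: maximize log p(instruction | input, correct label).
    ll_loss = model(
        input_ids=correct_batch["input_ids"],
        attention_mask=correct_batch["attention_mask"],
        labels=correct_batch["labels"],  # instruction tokens, pads set to -100
    ).loss

    # Unlikelihood: penalize the probability of the same instruction tokens
    # when the encoder sees the input paired with a wrong label.
    logits = model(
        input_ids=wrong_batch["input_ids"],
        attention_mask=wrong_batch["attention_mask"],
        labels=wrong_batch["labels"],
    ).logits
    log_probs = F.log_softmax(logits, dim=-1)
    labels = wrong_batch["labels"]
    mask = (labels != -100).float()
    token_logp = log_probs.gather(
        -1, labels.clamp(min=0).unsqueeze(-1)
    ).squeeze(-1)
    # -log(1 - p) per instruction token, computed stably from log p.
    p = token_logp.exp().clamp(max=1.0 - 1e-6)
    ul_loss = (-torch.log1p(-p) * mask).sum() / mask.sum()

    return ll_loss + lam * ul_loss
```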

Training details:
- Fine-tuning steps: 5'000