seonghyeonye committed
Commit: d633172
Parent(s): 6d785d6

Update README.md

README.md CHANGED
@@ -1,3 +1,10 @@
+---
+datasets:
+- bigscience/P3
+language: en
+license: apache-2.0
+---
+
 **Official repository**: [seonghyeonye/Flipped-Learning](https://github.com/seonghyeonye/Flipped-Learning)
 # Model Description
 FLIPPED uses a unique meta-learning method to show zero-shot task generalization on classification natural language prompts, outperforming GPT-3 and T0-11B on many tasks with a 4x smaller scale.
@@ -27,7 +34,7 @@ We also provide a quick [Jupyter Notebook](https://github.com/seonghyeonye/Flipp
 **Note: the model was trained with bfloat16 activations. As such, we highly discourage running inference with fp16.**
 
 # Training procedure
-FLIPPED models are based on [T5](https://huggingface.co/google/t5-v1_1-
+FLIPPED models are based on [T5](https://huggingface.co/google/t5-v1_1-xl), a Transformer-based encoder-decoder language model pre-trained with a masked language modeling-style objective on [C4](https://huggingface.co/datasets/c4).
 At a high level, the input text along with the output label is fed to the encoder, and the instruction text is produced by the decoder. The model is fine-tuned to autoregressively generate the target. We also feed the input text together with a wrong label, adding an unlikelihood loss so that the model does not generate the instruction in that case.
 Training details:
 - Fine-tuning steps: 5'000
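The bfloat16 note above matters mainly at inference time. Below is a minimal sketch (not the official example from the linked repository or notebook) of loading a FLIPPED checkpoint in bfloat16 and scoring each candidate label by how likely the instruction is given the input text plus that label, mirroring the flipped setup described in the training procedure. The checkpoint id `seonghyeonye/flipped_11B`, the example review, instruction, and verbalizers are illustrative assumptions.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "seonghyeonye/flipped_11B"  # assumed checkpoint id; substitute the card's actual model
tokenizer = AutoTokenizer.from_pretrained(model_name)
# Load in bfloat16, matching the training-time activations instead of fp16.
model = AutoModelForSeq2SeqLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)
model.eval()

input_text = "Review: The movie was fantastic from start to finish."  # illustrative instance
instruction = "Is this review positive or negative?"                  # illustrative instruction
candidates = ["positive", "negative"]                                  # illustrative verbalizers

scores = []
for label in candidates:
    # Encoder input: the instance text together with a candidate label.
    enc = tokenizer(f"{input_text} {label}", return_tensors="pt")
    # Decoder target: the instruction; `labels` yields its average token loss.
    target = tokenizer(instruction, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(**enc, labels=target).loss
    scores.append(-loss.item())  # lower loss = instruction more likely given (input, label)

print(candidates[scores.index(max(scores))])
```

The actual evaluation code in the repository may format inputs and choose verbalizers differently; this only illustrates bf16 loading and the instruction-scoring idea.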
@@ -43,8 +50,8 @@ Training details:
 We trained different FLIPPED variants with different mixtures of datasets.
 |Model|Training datasets|
 |--|--|
-|FLIPPED|- Multiple-Choice QA: CommonsenseQA, DREAM, QUAIL, QuaRTz, Social IQA, WiQA, Cosmos, QASC, Quarel, SciQ<br>- Sentiment: Amazon, App Reviews, IMDB, Rotten Tomatoes, Yelp<br>- Topic Classification: AG News, DBPedia<br>- Paraphrase Identification: MRPC, PAWS, QQP|
-|FLIPPED_3B|Same as
+|FLIPPED-11B|- Multiple-Choice QA: CommonsenseQA, DREAM, QUAIL, QuaRTz, Social IQA, WiQA, Cosmos, QASC, Quarel, SciQ<br>- Sentiment: Amazon, App Reviews, IMDB, Rotten Tomatoes, Yelp<br>- Topic Classification: AG News, DBPedia<br>- Paraphrase Identification: MRPC, PAWS, QQP|
+|FLIPPED_3B|Same as FLIPPED-11B|
 We only choose prompt examples that have output labels, which can be found on the dataset page.
 
 # Evaluation data
@@ -83,7 +90,7 @@ We evaluate the robustness of models on following datasets with changing the out
 # BibTeX entry and citation info
 ```bibtex
 @article{ye2022guess,
-  title={Guess the Instruction!
+  title={Guess the Instruction! Flipped Learning Makes Language Models Stronger Zero-Shot Learners},
   author={Ye, Seonghyeon and Kim, Doyoung and Jang, Joel and Shin, Joongbo and Seo, Minjoon},
   journal={arXiv preprint arXiv:2210.02969},
   year={2022}
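To make the flipped training objective described in the training-procedure hunk above more concrete, here is a hedged sketch of the combined loss: a standard LM loss on generating the instruction from the input text with its correct label, plus an unlikelihood penalty that discourages generating the same instruction when a wrong label is attached. The function name, the way the encoder input is formatted, and the mixing weight are illustrative assumptions, not the repository's actual implementation.

```python
import torch
import torch.nn.functional as F

def flipped_loss(model, tokenizer, input_text, correct_label, wrong_label,
                 instruction, ul_weight=1.0):
    """Illustrative flipped-learning objective; names and weight are assumptions."""
    target = tokenizer(instruction, return_tensors="pt").input_ids

    # (1) Likelihood term: generate the instruction from input text + correct label.
    pos = tokenizer(f"{input_text} {correct_label}", return_tensors="pt")
    lm_loss = model(**pos, labels=target).loss

    # (2) Unlikelihood term: given the wrong label, push down the probability
    #     the decoder assigns to the instruction tokens.
    neg = tokenizer(f"{input_text} {wrong_label}", return_tensors="pt")
    logits = model(**neg, labels=target).logits              # (1, target_len, vocab)
    log_p = F.log_softmax(logits, dim=-1)
    token_p = log_p.gather(-1, target.unsqueeze(-1)).squeeze(-1).exp()
    ul_loss = -torch.log1p(-token_p.clamp(max=1.0 - 1e-6)).mean()  # -log(1 - p)

    return lm_loss + ul_weight * ul_loss
```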