---
license: apache-2.0
datasets:
- shibing624/sharegpt_gpt4
language:
- en
---
Pythia-70m-deduped finetuned on a cleaned version of the ShareGPT data.

The cleaned dataset was obtained by removing duplicates and paraphrases from the original corpus and keeping only the English instances. The final training set contains 3,507 instances.
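Below is a minimal inference sketch using the 🤗 Transformers library. The repository id is a placeholder, not the actual model location, and the prompt format is only an example; adjust both to match how you host and prompt the model.

```python
# Minimal inference sketch with Hugging Face Transformers.
# NOTE: the repository id below is a placeholder, not the real model path.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-username/pythia-70m-deduped-sharegpt"  # placeholder id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Example prompt; the exact instruction format used during finetuning may differ.
prompt = "What is the capital of France?"
inputs = tokenizer(prompt, return_tensors="pt")

# Generate a short continuation from the finetuned model.
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```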