Commit e7eaab7 (1 parent: 6071aa3) by CaterinaLac: Update README.md
Pythia-70m-deduped fine-tuned on a cleaned version of the ShareGPT data.
The cleaned dataset is obtained by removing duplicates and paraphrases from the original corpus. In addition, every instance that mixes multiple languages or is written in a language other than English is discarded from the training set. The final training set has 3507 instances.
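The cleaning steps described above could be sketched roughly as follows. This is only an illustration, not the procedure actually used for this model: the function names, the token-overlap (Jaccard) test for paraphrases, the 0.8 threshold, and the ASCII-ratio check standing in for real language identification are all assumptions. In particular, the ASCII heuristic only catches non-Latin scripts and would let through unaccented text in other Latin-script languages, so a proper language-ID model would be needed in practice.

```python
import re


def token_set(text: str) -> frozenset:
    """Lowercase the text and return its set of word tokens."""
    return frozenset(re.findall(r"[a-z0-9']+", text.lower()))


def jaccard(a: frozenset, b: frozenset) -> float:
    """Jaccard similarity between two token sets (1.0 means identical sets)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def looks_english(text: str, min_ascii_ratio: float = 0.95) -> bool:
    """Hypothetical stand-in for language ID: share of ASCII letters.

    Rejects text with a noticeable fraction of non-ASCII letters, which
    also drops instances that mix English with a non-Latin script.
    """
    letters = [c for c in text if c.isalpha()]
    if not letters:
        return False
    ascii_letters = sum(c.isascii() for c in letters)
    return ascii_letters / len(letters) >= min_ascii_ratio


def clean(instances, near_dup_threshold: float = 0.8):
    """Keep English-looking instances, dropping exact and near duplicates.

    Exact duplicates have similarity 1.0, so the same check covers them.
    The pairwise comparison is quadratic; fine for a few thousand items.
    """
    kept, kept_tokens = [], []
    for text in instances:
        if not looks_english(text):
            continue
        tokens = token_set(text)
        if any(jaccard(tokens, seen) >= near_dup_threshold for seen in kept_tokens):
            continue
        kept.append(text)
        kept_tokens.append(tokens)
    return kept
```

Calling `clean(list_of_prompts)` returns the surviving instances in their original order, keeping the first occurrence of each group of near duplicates.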