CaterinaLac committed on
Commit
e7eaab7
1 Parent(s): 6071aa3

Update README.md


Pythia-70m-deduped fine-tuned on a cleaned version of the ShareGPT data.
The cleaned dataset was obtained by removing duplicates and paraphrases from the original corpus. In addition, every instance that mixes multiple languages or is written in a language other than English was discarded from the training set. The final training set contains 3507 instances.
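The three cleaning steps described above (duplicate removal, paraphrase removal, English-only filtering) can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure actually used to build the training set: the commit does not say how duplicates, paraphrases, or languages were detected. The normalization rule, the token-level Jaccard threshold of 0.8 used to call two conversations "paraphrases", and the ASCII-letter ratio standing in for a real language identifier are all hypothetical choices.

```python
import re


def normalize(text: str) -> str:
    """Lowercase, replace punctuation with spaces, collapse whitespace (hypothetical rule)."""
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def jaccard(a: set[str], b: set[str]) -> float:
    """Token-set Jaccard similarity, used here as a crude paraphrase signal."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def looks_english(text: str, min_ascii_ratio: float = 0.95) -> bool:
    """Crude stand-in for a language identifier (assumption): share of ASCII letters.

    It rejects text dominated by non-Latin scripts, but it cannot reliably catch
    other Latin-script languages or subtle language mixing.
    """
    letters = [c for c in text if c.isalpha()]
    if not letters:
        return False
    ascii_letters = sum(1 for c in letters if c.isascii())
    return ascii_letters / len(letters) >= min_ascii_ratio


def clean(conversations: list[str], paraphrase_threshold: float = 0.8) -> list[str]:
    """Keep the first occurrence of each conversation that passes all three filters."""
    kept: list[str] = []
    kept_tokens: list[set[str]] = []
    seen: set[str] = set()
    for conv in conversations:
        if not looks_english(conv):
            continue  # non-English or mixed-script instance
        norm = normalize(conv)
        if norm in seen:
            continue  # exact duplicate after normalization
        tokens = set(norm.split())
        if any(jaccard(tokens, t) >= paraphrase_threshold for t in kept_tokens):
            continue  # near-duplicate, treated as a paraphrase
        seen.add(norm)
        kept.append(conv)
        kept_tokens.append(tokens)
    return kept


# Hypothetical toy input, not taken from ShareGPT.
sample = [
    "How do I sort a list in Python?",
    "how do I sort a list in python",             # duplicate after normalization
    "How do I sort a list in Python, please?",    # Jaccard 8/9 with the first entry
    "Как отсортировать список в Python?",         # mostly non-ASCII letters
    "What is a closure in JavaScript?",
]
print(clean(sample))
# ['How do I sort a list in Python?', 'What is a closure in JavaScript?']
```

The pairwise paraphrase check is quadratic in the number of kept conversations, which is manageable at the scale of a few thousand instances; larger corpora are usually deduplicated with an approximate method such as MinHash instead.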

Files changed (1)
  1. README.md +5 -1
README.md CHANGED
@@ -1,3 +1,7 @@
  ---
  license: apache-2.0
- ---
+ datasets:
+ - shibing624/sharegpt_gpt4
+ language:
+ - en
+ ---