CaterinaLac committed on
Commit
a97ff56
1 Parent(s): e7eaab7

Update README.md

Files changed (1)
  1. README.md +5 -1
README.md CHANGED
@@ -4,4 +4,8 @@ datasets:
   - shibing624/sharegpt_gpt4
   language:
   - en
- ---
+ ---
+
+ Pythia-70m-deduped finetuned on a cleaned version of ShareGPT data.
+ The cleaned dataset is obtained by removing duplicates and paraphrases from the original corpus and keeping only the English instances.
+ The final training set contains 3507 instances.
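
The README text added in this commit describes the cleaning only at a high level: drop duplicates and paraphrases, keep English instances. It does not say which tools or thresholds were used. The following is a minimal sketch of that kind of pipeline, not the author's actual method. The function names, the token-overlap (Jaccard) test used as a stand-in for paraphrase detection, the ASCII-ratio test used as a stand-in for language identification, and both threshold values are assumptions made for illustration.

```python
import re


def normalize(text: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace."""
    no_punct = re.sub(r"[^\w\s]", "", text.lower())
    return re.sub(r"\s+", " ", no_punct).strip()


def jaccard(a: str, b: str) -> float:
    """Token-set overlap between two normalized strings, in [0, 1]."""
    ta, tb = set(a.split()), set(b.split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def looks_english(text: str, min_ascii_ratio: float = 0.9) -> bool:
    """Crude, hypothetical language check: share of ASCII characters.

    A real pipeline would more likely use a language-identification model.
    """
    if not text:
        return False
    ascii_chars = sum(1 for ch in text if ord(ch) < 128)
    return ascii_chars / len(text) >= min_ascii_ratio


def clean(instances, near_dup_threshold: float = 0.8):
    """Keep the first occurrence of each instance that passes the language
    check and is not too similar to anything already kept.

    Exact duplicates have overlap 1.0, so the same test removes them too.
    The pairwise comparison is quadratic, which is acceptable only for
    small corpora.
    """
    kept, kept_norm = [], []
    for text in instances:
        if not looks_english(text):
            continue
        norm = normalize(text)
        if any(jaccard(norm, seen) >= near_dup_threshold for seen in kept_norm):
            continue
        kept.append(text)
        kept_norm.append(norm)
    return kept
```

Token overlap only catches paraphrases that reuse most of the same words. Rewordings with different vocabulary would need an embedding-based similarity measure instead.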