Commit a97ff56 by CaterinaLac (parent: e7eaab7)
Update README.md
README.md CHANGED
```diff
@@ -4,4 +4,8 @@ datasets:
 - shibing624/sharegpt_gpt4
 language:
 - en
----
+---
+
+Pythia-70m-deduped finetuned on a cleaned version of ShareGPT data.
+The cleaned dataset is obtained by removing duplicates and paraphrases from the original corpus and keeping only the English instances.
+The final training set contains 3507 instances.
```
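The model card does not say how duplicates and paraphrases were detected or how English instances were selected. The following is a minimal sketch of such a cleaning pass, not the author's pipeline. It rests on several assumptions:

- Each conversation is a list of turn strings.
- `difflib.SequenceMatcher` similarity stands in for paraphrase detection.
- An ASCII-letter ratio stands in for a real language identifier.
- The function names (`normalize`, `looks_english`, `is_paraphrase`, `clean`) and the thresholds `0.85` and `0.9` are hypothetical.

```python
import re
import unicodedata
from difflib import SequenceMatcher
from typing import List

# Hypothetical record shape: one conversation = a list of turn strings.
Conversation = List[str]


def normalize(text: str) -> str:
    """Unicode-normalize, lowercase and collapse whitespace so trivial variants compare equal."""
    text = unicodedata.normalize("NFKC", text).lower()
    return re.sub(r"\s+", " ", text).strip()


def looks_english(text: str, min_ascii_ratio: float = 0.9) -> bool:
    """Crude stand-in for language ID (assumption): most letters must be ASCII."""
    letters = [c for c in text if c.isalpha()]
    if not letters:
        return False
    ascii_letters = sum(1 for c in letters if c.isascii())
    return ascii_letters / len(letters) >= min_ascii_ratio


def is_paraphrase(a: str, b: str, threshold: float = 0.85) -> bool:
    """Stand-in for paraphrase detection (assumption): character-level similarity ratio."""
    return SequenceMatcher(None, a, b).ratio() >= threshold


def clean(conversations: List[Conversation], threshold: float = 0.85) -> List[Conversation]:
    """Keep the first English conversation of each duplicate/near-duplicate group, in input order."""
    kept: List[Conversation] = []
    kept_keys: List[str] = []
    seen_exact = set()
    for conv in conversations:
        key = normalize(" ".join(conv))
        if not key or not looks_english(key):
            continue  # drop empty or non-English conversations
        if key in seen_exact:
            continue  # exact duplicate after normalization (cheap fast path)
        if any(is_paraphrase(key, other, threshold) for other in kept_keys):
            continue  # near-duplicate of a conversation already kept
        seen_exact.add(key)
        kept_keys.append(key)
        kept.append(conv)
    return kept


if __name__ == "__main__":
    sample = [
        ["What is the capital of France?", "Paris."],
        ["what is the capital of France?", "Paris."],  # exact duplicate after normalization
        ["What's the capital of France?", "Paris."],  # near-duplicate (paraphrase stand-in)
        ["Какая столица Франции?", "Париж."],  # non-English, filtered out
        ["Write a haiku about autumn.", "Leaves fall softly down."],
    ]
    for conv in clean(sample):
        print(conv)
```

The pairwise comparison is quadratic in the number of kept conversations. That is tolerable at a few thousand instances, but a production pipeline would more plausibly use embedding similarity or MinHash for paraphrases and a trained language identifier.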