Update README.md
README.md
@@ -44,7 +44,7 @@ Using [qlora-pipe](https://github.com/tdrussell/qlora-pipe) I ran a qlora on Nem
 
 The atypically high dropout rate was chosen after some unreleased experimentation inspired by the Arxiv paper: [Fine-tuning with Very Large Dropout (Jianyu Zhang, Léon Bottou)](https://arxiv.org/abs/2403.00946)
 
-Which prescribes the use of a very high dropout rate (0.9 in their case) as a method of improving out-of-
+Which prescribes the use of a very high dropout rate (0.9 in their case) as a method of improving out-of-distribution performance. Further discussion in various internet spaces regarding high-dropout training led to a recommendation of 0.6 as the ideal dropout rate for optimal fitting during finetuning.
 
 # Merging
 
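For concreteness, a minimal sketch of where a dropout rate like 0.6 would be set in a LoRA fine-tuning setup. This assumes a PEFT-style `LoraConfig` rather than qlora-pipe's own TOML config format, and every value other than `lora_dropout=0.6` is a placeholder, not something stated in this commit:

```python
# A sketch only, not the author's actual training config: it assumes a
# PEFT-style LoraConfig, and all values except lora_dropout are hypothetical.
from peft import LoraConfig

lora_config = LoraConfig(
    r=64,              # hypothetical LoRA rank, not stated in this commit
    lora_alpha=64,     # hypothetical scaling factor
    lora_dropout=0.6,  # the atypically high dropout rate discussed above
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # hypothetical
    task_type="CAUSAL_LM",
)
```

The high rate only regularizes the low-rank update during training; at inference the adapter runs in eval mode with dropout disabled.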