Update README.md
README.md
@@ -44,7 +44,7 @@ Using [qlora-pipe](https://github.com/tdrussell/qlora-pipe) I ran a qlora on Nem
 
 The atypically high dropout rate was chosen after some unreleased experimentation inspired by the Arxiv paper: [Fine-tuning with Very Large Dropout (Jianyu Zhang, Léon Bottou)](https://arxiv.org/abs/2403.00946)
 
-Which prescribes the use of a very high dropout rate (0.9 in their case) as a method of improving out-of-
+Which prescribes the use of a very high dropout rate (0.9 in their case) as a method of improving out-of-distribution performance. Further discussion in various internet spaces regarding high-dropout training led to a recommendation of 0.6 as the ideal dropout rate for optimal fitting during finetuning.
 
 # Merging
 
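For concreteness, a minimal sketch of where a dropout rate like 0.6 would be set in a LoRA fine-tuning setup. This assumes a PEFT-style `LoraConfig` rather than qlora-pipe's own TOML config format, and every value other than `lora_dropout=0.6` is a placeholder, not something stated in this commit:

```python
# A sketch only, not the author's actual training config: it assumes a
# PEFT-style LoraConfig, and all values except lora_dropout are hypothetical.
from peft import LoraConfig

lora_config = LoraConfig(
    r=64,              # hypothetical LoRA rank, not stated in this commit
    lora_alpha=64,     # hypothetical scaling factor
    lora_dropout=0.6,  # the atypically high dropout rate discussed above
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # hypothetical
    task_type="CAUSAL_LM",
)
```

The high rate only regularizes the low-rank update during training; at inference the adapter runs in eval mode with dropout disabled.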