Update README.md
README.md
CHANGED
@@ -26,7 +26,8 @@ Coherent at 32K Context. Not as good as a natively trained 32K model, but much b
 
 Relevant Axolotl Configurations:
 <br>-> Taken from [winglian/Llama-3-8b-64k-PoSE](https://huggingface.co/winglian/Llama-3-8b-64k-PoSE)
-<br>\- I tried to find my own configs,
+<br>\- I tried to find my own configs; after hours of tinkering, the one winglian used worked best, so I stuck with it.
+<br>\- A RoPE theta of 2M gave the best loss during training compared to other values.
 
 ```
 sequence_len: 8192
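
For readers who want to see how these settings fit together, below is a minimal sketch of an Axolotl config fragment consistent with the values above. Only `sequence_len: 8192` and the 2M RoPE theta come from this README; the remaining keys (`base_model`, `use_pose`, `pose_max_context_len`, `overrides_of_model_config`) are assumptions based on Axolotl's PoSE support and may differ from the actual config in the linked repo.

```yaml
# Hedged sketch: key names besides sequence_len are assumptions,
# not copied from the linked winglian/Llama-3-8b-64k-PoSE config.
base_model: meta-llama/Meta-Llama-3-8B  # assumed Llama-3 8B base
sequence_len: 8192                      # training chunk length stated in this README
use_pose: true                          # assumed: enable PoSE positional-id skipping
pose_max_context_len: 32768             # assumed: 32K target context, per the README
overrides_of_model_config:
  rope_theta: 2000000.0                 # the 2M theta the author found gave the best loss
  max_position_embeddings: 32768        # assumed to match the extended context
```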