Doctor-Shotgun committed
Commit: 4488107
Parent(s): 9f6103d
Update README.md

README.md CHANGED
@@ -12,7 +12,7 @@ tags:
 
 Experimental model meant to serve as a long-context speculative decoding model. This one is specifically for models trained on the LimaRP prompt format.
 
-Created using [Doctor-Shotgun/smol_llama-220M-GQA-32k-theta-sft](https://huggingface.co/Doctor-Shotgun/smol_llama-220M-GQA-32k-theta-sft) and finetuning at 32768 context length on
+Created using [Doctor-Shotgun/smol_llama-220M-GQA-32k-theta-sft](https://huggingface.co/Doctor-Shotgun/smol_llama-220M-GQA-32k-theta-sft) and finetuning at 32768 context length on the LimaRP dataset.
 
 This variant uses the rope theta (rope frequency base) method for context extension.
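The README describes the model as a draft model for speculative decoding: a small model proposes tokens cheaply and a large target model verifies them. A toy greedy sketch of that accept/reject loop, with hypothetical stand-in "models" (not this model's actual inference code), might look like:

```python
# Toy sketch of greedy speculative decoding. `draft_next` and
# `target_next` are hypothetical placeholder models, not real LLMs;
# both follow the same rule here, so every draft token is accepted.
def draft_next(ctx):
    # placeholder draft model: next token is last token + 1 (mod 100)
    return (ctx[-1] + 1) % 100

def target_next(ctx):
    # placeholder target model: same rule as the draft
    return (ctx[-1] + 1) % 100

def speculative_decode(ctx, k=4, steps=8):
    ctx = list(ctx)
    for _ in range(steps):
        # 1) the draft model proposes k tokens autoregressively
        proposal, tmp = [], list(ctx)
        for _ in range(k):
            t = draft_next(tmp)
            proposal.append(t)
            tmp.append(t)
        # 2) the target model verifies each proposed token in order;
        #    on the first mismatch it substitutes its own token and stops
        accepted = []
        for t in proposal:
            if target_next(ctx + accepted) == t:
                accepted.append(t)
            else:
                accepted.append(target_next(ctx + accepted))
                break
        ctx.extend(accepted)
    return ctx

print(speculative_decode([1], k=4, steps=2))  # → [1, 2, 3, 4, 5, 6, 7, 8, 9]
```

When draft and target agree (as they do here by construction), all k proposed tokens are accepted per step, which is the speedup a well-matched draft model provides.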
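The "rope theta (rope frequency base)" method mentioned in the README extends the usable context by raising the base of RoPE's inverse-frequency schedule, so positional rotations vary more slowly and remain distinguishable over longer sequences. A minimal sketch of the frequency schedule (illustrative values; the actual training configuration is not shown in this commit):

```python
def rope_inv_freq(dim, theta):
    # RoPE inverse frequencies: theta^(-2i/dim) for i = 0 .. dim/2 - 1
    return [theta ** (-(2 * i) / dim) for i in range(dim // 2)]

# Llama's conventional default base is 10000; context extension via
# the rope-theta method raises it (1e6 here is only an example value).
base = rope_inv_freq(64, 10_000.0)
extended = rope_inv_freq(64, 1_000_000.0)

# The highest-frequency component is unchanged (exponent 0)...
print(base[0], extended[0])  # → 1.0 1.0
# ...while the lowest-frequency components rotate much more slowly,
# stretching the range of positions the embedding can distinguish.
print(extended[-1] < base[-1])  # → True
```

With a larger theta, a given pair of distant positions receives rotation angles that would, under the original base, only occur at much shorter distances, which is why raising the base is a simple way to target 32768-token contexts.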