---
license: apache-2.0
datasets:
- cognitivecomputations/WizardLM_evol_instruct_V2_196k_unfiltered_merged_split
- cognitivecomputations/Code-74k-ShareGPT-Vicuna
- jondurbin/airoboros-3.1
- Norquinal/claude_multiround_chat_30k
- Doctor-Shotgun/no-robots-sharegpt
language:
- en
tags:
- llama
- llama 2
- smol_llama
---
# smol_llama-220M-GQA-32k-theta-sft

Experimental model meant to serve as a long-context speculative decoding (draft) model.
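
As a rough sketch of how a draft model like this is typically wired in (assuming the `transformers` assisted-generation API; the target model name below is only a placeholder, and assisted generation expects the draft and target to share a tokenizer/vocabulary):

```python
# Sketch: using this model as the assistant (draft) model for speculative decoding.
# "meta-llama/Llama-2-7b-hf" is a placeholder target; substitute the larger model you actually serve.
from transformers import AutoModelForCausalLM, AutoTokenizer

target_name = "meta-llama/Llama-2-7b-hf"
draft_name = "Doctor-Shotgun/smol_llama-220M-GQA-32k-theta-sft"

tokenizer = AutoTokenizer.from_pretrained(target_name)
target = AutoModelForCausalLM.from_pretrained(target_name)
draft = AutoModelForCausalLM.from_pretrained(draft_name)

prompt = "### Instruction:\nSummarize the passage.\n\n### Input:\nSpeculative decoding drafts tokens with a small model.\n\n### Response:\n"
inputs = tokenizer(prompt, return_tensors="pt")

# Passing assistant_model switches generate() into assisted generation:
# the draft proposes tokens and the target verifies them in parallel.
outputs = target.generate(**inputs, assistant_model=draft, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```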

Created by finetuning [Doctor-Shotgun/smol_llama-220M-GQA-32k-theta](https://huggingface.co/Doctor-Shotgun/smol_llama-220M-GQA-32k-theta) at 32768 context length on several instruction datasets.

This variant uses the RoPE theta (RoPE frequency base) method for context extension.
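
For reference, the RoPE frequency base is exposed as `rope_theta` on the Llama config in `transformers`; a larger base lowers the rotary frequencies so positional phases advance more slowly and longer contexts stay usable. A minimal sketch of inspecting it (the override value below is purely illustrative, not what this checkpoint ships with):

```python
# Sketch: reading the RoPE frequency base ("rope theta") from the model config.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("Doctor-Shotgun/smol_llama-220M-GQA-32k-theta-sft")
print(config.rope_theta, config.max_position_embeddings)

# Context extension via rope theta: raising the base stretches the rotary period,
# so the same positions map to smaller rotation angles.
config.rope_theta = 1_000_000.0  # illustrative value only; see config.json for the real one
```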

The trained instruction format is Alpaca:
```
### Instruction:
{{instruction}}

### Input:
{{user input}}

### Response:
{{model response}}
```
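
A minimal sketch of filling that template in before tokenization (the helper below and its strings are illustrative, not part of this repo):

```python
# Sketch: building an Alpaca-style prompt for this model. All strings are placeholders.
ALPACA_TEMPLATE = (
    "### Instruction:\n{instruction}\n\n"
    "### Input:\n{user_input}\n\n"
    "### Response:\n"
)

def build_prompt(instruction: str, user_input: str = "") -> str:
    """Fill the Alpaca template, leaving the Response section for the model to complete."""
    if not user_input:
        # Common Alpaca convention: omit the Input block when there is no input.
        return f"### Instruction:\n{instruction}\n\n### Response:\n"
    return ALPACA_TEMPLATE.format(instruction=instruction, user_input=user_input)

print(build_prompt("Summarize the text.", "Speculative decoding drafts tokens with a small model."))
```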