---
datasets:
- cognitivecomputations/WizardLM_evol_instruct_V2_196k_unfiltered_merged_split
- cognitivecomputations/Code-74k-ShareGPT-Vicuna
- jondurbin/airoboros-3.1
- Norquinal/claude_multiround_chat_30k
- Doctor-Shotgun/no-robots-sharegpt
language:
- en
tags:
- llama
- llama 2
- smol_llama
---
# smol_llama-220M-GQA-32k-theta-sft

An experimental model intended to serve as a long-context draft model for speculative decoding.

Created by finetuning [Doctor-Shotgun/smol_llama-220M-GQA-32k-theta](https://huggingface.co/Doctor-Shotgun/smol_llama-220M-GQA-32k-theta) at a context length of 32768 tokens on several instruction datasets.

This variant uses the RoPE theta (RoPE frequency base) method for context extension.
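
As a quick sanity check, the extended context window and RoPE frequency base can be read from the model config. A minimal sketch, assuming the repository id matches this card's title:

```python
from transformers import AutoConfig

# Repository id assumed from this card's title.
config = AutoConfig.from_pretrained("Doctor-Shotgun/smol_llama-220M-GQA-32k-theta-sft")

print(config.max_position_embeddings)  # should reflect the 32768-token training context
print(config.rope_theta)               # the raised RoPE frequency base used for context extension
```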

The trained instruction format is Alpaca:
```
### Instruction:
{{instruction}}

### Input:
{{user input}}

### Response:
{{model response}}
```
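
As a usage sketch, the snippet below builds a prompt in this format and passes the model as the draft (assistant) model for speculative decoding via assisted generation in `transformers`. The target model id is only a placeholder, and assisted generation expects the draft and target to use compatible tokenizers; treat this as an illustration rather than a verified recipe.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

draft_id = "Doctor-Shotgun/smol_llama-220M-GQA-32k-theta-sft"  # repo id assumed from this card
target_id = "meta-llama/Llama-2-7b-hf"  # placeholder target model to be sped up

tokenizer = AutoTokenizer.from_pretrained(target_id)
target = AutoModelForCausalLM.from_pretrained(target_id)
draft = AutoModelForCausalLM.from_pretrained(draft_id)

# Prompt in the trained Alpaca format.
prompt = (
    "### Instruction:\n"
    "Summarize the following text.\n"
    "\n"
    "### Input:\n"
    "Speculative decoding uses a small draft model to propose tokens that a larger model verifies.\n"
    "\n"
    "### Response:\n"
)

inputs = tokenizer(prompt, return_tensors="pt")
# Passing assistant_model enables assisted generation (speculative decoding).
output = target.generate(**inputs, assistant_model=draft, max_new_tokens=128)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```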