---
license: apache-2.0
datasets:
- cognitivecomputations/WizardLM_evol_instruct_V2_196k_unfiltered_merged_split
- cognitivecomputations/Code-74k-ShareGPT-Vicuna
- jondurbin/airoboros-3.1
- Norquinal/claude_multiround_chat_30k
- Doctor-Shotgun/no-robots-sharegpt
language:
- en
tags:
- llama
- llama 2
- smol_llama
---
# smol_llama-220M-GQA-32k-theta-sft

Experimental model meant to serve as a long-context speculative decoding (draft) model.
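
As a rough sketch of how a draft model like this is typically wired in (assuming the `transformers` assisted-generation API; the target model name below is only a placeholder, and assisted generation expects the draft and target to share a tokenizer/vocabulary):

```python
# Sketch: using this model as the assistant (draft) model for speculative decoding.
# "meta-llama/Llama-2-7b-hf" is a placeholder target; substitute the larger model you actually serve.
from transformers import AutoModelForCausalLM, AutoTokenizer

target_name = "meta-llama/Llama-2-7b-hf"
draft_name = "Doctor-Shotgun/smol_llama-220M-GQA-32k-theta-sft"

tokenizer = AutoTokenizer.from_pretrained(target_name)
target = AutoModelForCausalLM.from_pretrained(target_name)
draft = AutoModelForCausalLM.from_pretrained(draft_name)

prompt = "### Instruction:\nSummarize the passage.\n\n### Input:\nSpeculative decoding drafts tokens with a small model.\n\n### Response:\n"
inputs = tokenizer(prompt, return_tensors="pt")

# Passing assistant_model switches generate() into assisted generation:
# the draft proposes tokens and the target verifies them in parallel.
outputs = target.generate(**inputs, assistant_model=draft, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```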

Created by finetuning [Doctor-Shotgun/smol_llama-220M-GQA-32k-theta](https://huggingface.co/Doctor-Shotgun/smol_llama-220M-GQA-32k-theta) at 32768 context length on several instruction datasets.

This variant uses the RoPE theta (RoPE frequency base) method for context extension.
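
For reference, the RoPE frequency base is exposed as `rope_theta` on the Llama config in `transformers`; a larger base lowers the rotary frequencies so positional phases advance more slowly and longer contexts stay usable. A minimal sketch of inspecting it (the override value below is purely illustrative, not what this checkpoint ships with):

```python
# Sketch: reading the RoPE frequency base ("rope theta") from the model config.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("Doctor-Shotgun/smol_llama-220M-GQA-32k-theta-sft")
print(config.rope_theta, config.max_position_embeddings)

# Context extension via rope theta: raising the base stretches the rotary period,
# so the same positions map to smaller rotation angles.
config.rope_theta = 1_000_000.0  # illustrative value only; see config.json for the real one
```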

The trained instruction format is Alpaca:
```
### Instruction:
{{instruction}}

### Input:
{{user input}}

### Response:
{{model response}}
```
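
A minimal sketch of filling that template in before tokenization (the helper below and its strings are illustrative, not part of this repo):

```python
# Sketch: building an Alpaca-style prompt for this model. All strings are placeholders.
ALPACA_TEMPLATE = (
    "### Instruction:\n{instruction}\n\n"
    "### Input:\n{user_input}\n\n"
    "### Response:\n"
)

def build_prompt(instruction: str, user_input: str = "") -> str:
    """Fill the Alpaca template, leaving the Response section for the model to complete."""
    if not user_input:
        # Common Alpaca convention: omit the Input block when there is no input.
        return f"### Instruction:\n{instruction}\n\n### Response:\n"
    return ALPACA_TEMPLATE.format(instruction=instruction, user_input=user_input)

print(build_prompt("Summarize the text.", "Speculative decoding drafts tokens with a small model."))
```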