Update README.md
README.md CHANGED
@@ -71,5 +71,15 @@ A small 220M param (total) decoder model. This is the first version of the model
 - GQA (32 heads, 8 key-value), context length 2048
 - train-from-scratch on one GPU :)
 
+## Links
+
+Here are some fine-tunes we did, but there are many more possibilities out there!
+
+- instruct
+  - openhermes - [link](https://huggingface.co/BEE-spoke-data/smol_llama-220M-openhermes)
+  - open-instruct - [link](https://huggingface.co/BEE-spoke-data/smol_llama-220M-open_instruct)
+- code
+  - python (pypi) - WIP
+
 
 ---
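
For context on the GQA bullet in the diff above: below is a minimal PyTorch sketch of grouped-query attention with the README's numbers, 32 query heads sharing 8 key-value heads. This is not the model's actual code; the tensor shapes, names, and the use of `scaled_dot_product_attention` are illustrative assumptions.

```python
# Illustrative sketch of grouped-query attention (GQA), not the model's code.
# From the README: 32 query heads, 8 key-value heads -> groups of 4.
import torch
import torch.nn.functional as F

batch, seq_len, head_dim = 1, 16, 32      # assumed toy dimensions
n_heads, n_kv_heads = 32, 8               # from the README bullet
group_size = n_heads // n_kv_heads        # 4 query heads per KV head

q = torch.randn(batch, n_heads, seq_len, head_dim)
k = torch.randn(batch, n_kv_heads, seq_len, head_dim)
v = torch.randn(batch, n_kv_heads, seq_len, head_dim)

# Expand each key-value head so that group_size consecutive query heads
# attend to the same keys and values.
k = k.repeat_interleave(group_size, dim=1)  # -> (batch, 32, seq_len, head_dim)
v = v.repeat_interleave(group_size, dim=1)

# Standard causal scaled-dot-product attention over the expanded heads.
out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
print(out.shape)  # torch.Size([1, 32, 16, 32])
```

With 4 query heads per key-value head, the KV cache holds a quarter as many heads as full multi-head attention at the same width, which is part of what makes a model this size practical to train and serve on one GPU.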
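
And a quick way to try one of the fine-tunes linked in the new section, assuming the standard `transformers` API; the prompt and generation settings are placeholders, not recommendations from the model authors.

```python
# Hedged usage sketch: load a linked fine-tune and generate some text.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "BEE-spoke-data/smol_llama-220M-openhermes"  # from the links above
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo)

inputs = tokenizer("Write a haiku about small language models.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```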