michaelfeil committed
Commit 3c5e6ea
Parent(s): 1c075c4
Update readme on quants
README.md CHANGED
@@ -50,9 +50,12 @@ For training data, we generate long contexts by augmenting [SlimPajama](https://
 | GPU Type | NVIDIA L40S | NVIDIA L40S | NVIDIA L40S | NVIDIA L40S |
 | Minutes to Train (Wall)| 202 | 555 | 61 | 87 |
 
-**Quants**:
-- [GGUF](https://huggingface.co/crusoeai/Llama-3-8B-Instruct-1048k-GGUF)
+**Inference / Quants**:
+- [GGUF by Crusoe](https://huggingface.co/crusoeai/Llama-3-8B-Instruct-1048k-GGUF). Note that you need to add 128009 as [special token with llama.cpp](https://huggingface.co/gradientai/Llama-3-8B-Instruct-262k/discussions/13).
 - [MLX-4bit](https://huggingface.co/mlx-community/Llama-3-8B-Instruct-1048k-4bit)
+- [Ollama](https://ollama.com/library/llama3-gradient)
+- vLLM docker image, recommended to load via `--max-model-len 65536`
+
 
 ## The Gradient AI Team
 
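As a rough illustration of the `--max-model-len 65536` recommendation in the added lines (not part of the commit itself): a minimal vLLM sketch, assuming the model is published under a repo id like `gradientai/Llama-3-8B-Instruct-1048k` and using token 128009 (`<|eot_id|>`) as a stop token, in line with the llama.cpp note.

```python
from vllm import LLM, SamplingParams

# Load the long-context model with a capped context window,
# mirroring the README's `--max-model-len 65536` recommendation.
llm = LLM(
    model="gradientai/Llama-3-8B-Instruct-1048k",  # assumed repo id, not stated in this commit
    max_model_len=65536,
)

# 128009 is <|eot_id|>; the GGUF note above says llama.cpp needs it added as a special token,
# so we stop generation on it explicitly here as well.
params = SamplingParams(max_tokens=256, stop_token_ids=[128009])

# Placeholder prompt; replace with your own long-context input.
outputs = llm.generate(["Summarize this document in two sentences: ..."], params)
print(outputs[0].outputs[0].text)
```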