michaelfeil commited on
Commit
3c5e6ea
1 Parent(s): 1c075c4

Update readme on quants

Browse files
Files changed (1) hide show
  1. README.md +5 -2
README.md CHANGED
@@ -50,9 +50,12 @@ For training data, we generate long contexts by augmenting [SlimPajama](https://
50
  | GPU Type | NVIDIA L40S | NVIDIA L40S | NVIDIA L40S | NVIDIA L40S |
51
  | Minutes to Train (Wall)| 202 | 555 | 61 | 87 |
52
 
53
- **Quants**:
54
- - [GGUF](https://huggingface.co/crusoeai/Llama-3-8B-Instruct-1048k-GGUF)
55
  - [MLX-4bit](https://huggingface.co/mlx-community/Llama-3-8B-Instruct-1048k-4bit)
 
 
 
56
 
57
  ## The Gradient AI Team
58
 
 
50
  | GPU Type | NVIDIA L40S | NVIDIA L40S | NVIDIA L40S | NVIDIA L40S |
51
  | Minutes to Train (Wall)| 202 | 555 | 61 | 87 |
52
 
53
+ **Inference / Quants**:
54
+ - [GGUF by Crusoe](https://huggingface.co/crusoeai/Llama-3-8B-Instruct-1048k-GGUF). Note that you need to add 128009 as [special token with llama.cpp](https://huggingface.co/gradientai/Llama-3-8B-Instruct-262k/discussions/13).
55
  - [MLX-4bit](https://huggingface.co/mlx-community/Llama-3-8B-Instruct-1048k-4bit)
56
+ - [Ollama](https://ollama.com/library/llama3-gradient)
57
+ - vLLM docker image, recommended to load via `--max-model-len 65536`
58
+
59
 
60
  ## The Gradient AI Team
61