Text Generation
Transformers
Safetensors
English
llama
nlp
llm
text-generation-inference
Inference Endpoints
omkarenator commited on
Commit
5425cac
1 Parent(s): 7cb16bf

Add instructions for Ollama

Browse files
Files changed (1) hide show
  1. README.md +32 -0
README.md CHANGED
@@ -101,6 +101,38 @@ python3 -m fastchat.serve.cli --model-path LLM360/AmberChat
101
  | **LLM360/AmberChat** | **5.428125** |
102
  | [Nous-Hermes-13B](https://huggingface.co/NousResearch/Nous-Hermes-13b) | 5.51 |
103
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
104
 
105
  # Citation
106
 
 
101
  | **LLM360/AmberChat** | **5.428125** |
102
  | [Nous-Hermes-13B](https://huggingface.co/NousResearch/Nous-Hermes-13b) | 5.51 |
103
 
104
+ # Using Quantized Models with Ollama
105
+
106
+ Please follow these steps to use a quantized version of AmberChat on your personal computer or laptop:
107
+
108
+ 1. First, install Ollama by following the instructions provided [here](https://github.com/jmorganca/ollama/tree/main?tab=readme-ov-file#ollama). Next, download a quantized model checkpoint (such as [amberchat.Q8_0.gguf](https://huggingface.co/TheBloke/AmberChat-GGUF/blob/main/amberchat.Q8_0.gguf) for the 8 bit version) from [TheBloke/AmberChat-GGUF](https://huggingface.co/TheBloke/AmberChat-GGUF/tree/main). Create an Ollama Modelfile locally using the template provided below:
109
+ ```
110
+ FROM amberchat.Q8_0.gguf
111
+
112
+ TEMPLATE """{{ .System }}
113
+ USER: {{ .Prompt }}
114
+ ASSISTANT:
115
+ """
116
+ SYSTEM """A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.
117
+ """
118
+ PARAMETER stop "USER:"
119
+ PARAMETER stop "ASSISTANT:"
120
+ PARAMETER repeat_last_n 0
121
+ PARAMETER num_ctx 2048
122
+ PARAMETER seed 0
123
+ PARAMETER num_predict -1
124
+ ```
125
+ Ensure that the FROM directive points to the downloaded checkpoint file.
126
+
127
+ 2. Now, you can proceed to build the model by running:
128
+ ```bash
129
+ ollama create amberchat -f Modelfile
130
+ ```
131
+ 3. To run the model from the command line, execute the following:
132
+ ```bash
133
+ ollama run amberchat
134
+ ```
135
+ You need to build the model once and can just run it afterwards.
136
 
137
  # Citation
138