amornpan committed (verified)
Commit 299e4d9 · 1 Parent(s): f1479f6

Update README.md

Files changed (1)
  1. README.md +67 -0
README.md CHANGED
@@ -113,3 +113,70 @@ model_path = 'amornpan/openthaigpt-MedChatModelv11'
```python
tokenizer = AutoTokenizer.from_pretrained(model_path)
tokenizer.pad_token = tokenizer.eos_token
```

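The snippets below assume the imports and `model_path` defined earlier in the README; for reference, a minimal set of imports covering everything used in this section would look roughly like this:

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

model_path = 'amornpan/openthaigpt-MedChatModelv11'
```
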
## 3. Prepare Your Input (Custom Prompt)

Create a custom medical prompt that you want the model to respond to:

```python
# Example question (Thai): "Please describe the characteristics of early-stage oral cancer."
custom_prompt = "โปรดอธิบายลักษณะช่องปากที่เป็นมะเร็งในระยะเริ่มต้น"

# Llama-2-style instruction template with a bilingual system message
PROMPT = f'[INST] <<SYS>>You are a question answering assistant. Answer the question as truthfully and helpfully as possible. คุณคือผู้ช่วยตอบคำถาม จงตอบคำถามอย่างถูกต้องและมีประโยชน์ที่สุด<</SYS>>{custom_prompt}[/INST]'

# Tokenize the input prompt
inputs = tokenizer(PROMPT, return_tensors="pt", padding=True, truncation=True)
```

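If you plan to ask several questions, wrapping the template in a small helper keeps the prompts consistent. `build_prompt` below is a hypothetical convenience function, not part of the original README:

```python
def build_prompt(question: str) -> str:
    """Wrap a Thai or English question in the instruction template used above."""
    system = ("You are a question answering assistant. Answer the question as truthfully and helpfully as possible. "
              "คุณคือผู้ช่วยตอบคำถาม จงตอบคำถามอย่างถูกต้องและมีประโยชน์ที่สุด")
    return f"[INST] <<SYS>>{system}<</SYS>>{question}[/INST]"

inputs = tokenizer(build_prompt(custom_prompt), return_tensors="pt", padding=True, truncation=True)
```
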
## 4. Configure the Model for Efficient Loading (4-bit Quantization)

The model is loaded in 4-bit precision (via bitsandbytes) for memory-efficient inference. Here’s how to set up the configuration:

```python
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16
)
```

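bitsandbytes also supports nested ("double") quantization, which squeezes a little more memory out of the 4-bit weights. This variant is optional and only a suggestion, not part of the original configuration:

```python
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,  # optional: also quantize the quantization constants
)
```
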
## 5. Load the Model with Quantization Support

Now, load the model with the 4-bit quantization settings:

```python
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    quantization_config=bnb_config,
    trust_remote_code=True
)
```

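If `accelerate` is installed, you can alternatively let transformers place the quantized weights automatically with `device_map="auto"`. This is a common variant rather than what the README above prescribes, and it makes the manual `.to(device)` call in the next step unnecessary for the model:

```python
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    quantization_config=bnb_config,
    trust_remote_code=True,
    device_map="auto",  # requires accelerate; maps layers onto the available GPU(s)
)
```
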
## 6. Move the Model and Inputs to the GPU (if available)

For faster inference, move the model and input tensors to a GPU, if available:

```python
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
inputs = {k: v.to(device) for k, v in inputs.items()}
```

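With recent transformers/bitsandbytes versions, a model loaded with a 4-bit `quantization_config` is typically already placed on the GPU, and calling `.to()` on it may be rejected. If you hit that case, it is enough to move only the input tensors:

```python
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
inputs = {k: v.to(device) for k, v in inputs.items()}
```
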
## 7. Generate a Response from the Model

Now, generate the medical response by running the model:

```python
outputs = model.generate(**inputs, max_new_tokens=200, do_sample=True)
```

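`generate()` accepts the usual sampling knobs if you want to control diversity and length; the values below are illustrative defaults rather than tuned settings from the model card:

```python
outputs = model.generate(
    **inputs,
    max_new_tokens=200,
    do_sample=True,
    temperature=0.7,         # lower = more deterministic
    top_p=0.9,               # nucleus sampling cutoff
    repetition_penalty=1.1,  # mildly discourage repeated phrases
    pad_token_id=tokenizer.eos_token_id,  # silence the missing-pad-token warning
)
```
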
## 8. Decode the Generated Text

Finally, decode and print the response from the model:

```python
generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_text)
```

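Because the decoded sequence includes the prompt itself, you may prefer to print only the newly generated tokens, for example:

```python
prompt_length = inputs["input_ids"].shape[-1]
answer = tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True)
print(answer)
```
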
## More Information

`amornpan@gmail.com`