Text Generation
Transformers
Safetensors
Thai
English
qwen2
text-generation-inference
sft
trl
4-bit precision
bitsandbytes
LoRA
Fine-Tuning with LoRA
LLM
GenAI
NT GenAI
ntgenai
lahnmah
NT Thai GPT
ntthaigpt
medical
medtech
HealthGPT
หลานม่า
NT Academy
conversational
Inference Endpoints
Update README.md
README.md
CHANGED
@@ -113,3 +113,70 @@ model_path = 'amornpan/openthaigpt-MedChatModelv11'
```python
tokenizer = AutoTokenizer.from_pretrained(model_path)
tokenizer.pad_token = tokenizer.eos_token
```
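The steps below assume the imports and `model_path` defined earlier in the README. For completeness, a minimal sketch of those assumed imports (standard Hugging Face / PyTorch packages; adjust to your environment):

```python
# Assumed setup from the earlier sections of this README.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
```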

## 3. Prepare Your Input (Custom Prompt)

Create a custom medical prompt that you want the model to respond to:

```python
# Thai prompt: "Please describe the characteristics of early-stage oral cancer in the oral cavity."
custom_prompt = "โปรดอธิบายลักษณะช่องปากที่เป็นมะเร็งในระยะเริ่มต้น"
PROMPT = f'[INST] <<SYS>>You are a question answering assistant. Answer the question as truthfully and helpfully as possible. คุณคือผู้ช่วยตอบคำถาม จงตอบคำถามอย่างถูกต้องและมีประโยชน์ที่สุด<</SYS>>{custom_prompt}[/INST]'

# Tokenize the input prompt
inputs = tokenizer(PROMPT, return_tensors="pt", padding=True, truncation=True)
```

## 4. Configure the Model for Efficient Loading (4-bit Quantization)

The model uses 4-bit precision for efficient inference. Here's how to set up the configuration:

```python
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16
)
```

## 5. Load the Model with Quantization Support

Now, load the model with the 4-bit quantization settings:

```python
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    quantization_config=bnb_config,
    trust_remote_code=True
)
```

## 6. Move the Model and Inputs to the GPU (if available)

For faster inference, move the model and input tensors to a GPU, if available:

```python
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
inputs = {k: v.to(device) for k, v in inputs.items()}
```
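Note that, depending on your transformers and bitsandbytes versions, a 4-bit quantized model may already be placed on the GPU at load time and may not support `.to()`. An alternative sketch that lets `accelerate` handle placement instead of steps 5–6 (the `device_map="auto"` argument is an assumption and requires the `accelerate` package):

```python
# Alternative to steps 5-6: let accelerate place the quantized model automatically.
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    quantization_config=bnb_config,
    trust_remote_code=True,
    device_map="auto",  # requires `accelerate`
)
# Move the tokenized inputs to wherever the model was placed.
inputs = {k: v.to(model.device) for k, v in inputs.items()}
```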

## 7. Generate a Response from the Model

Now, generate the medical response by running the model:

```python
outputs = model.generate(**inputs, max_new_tokens=200, do_sample=True)
```
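If you want more control over the sampling behaviour, the usual generation parameters can be passed as well; the values below are illustrative assumptions rather than recommended settings:

```python
# Illustrative sampling settings; tune temperature/top_p for your use case.
outputs = model.generate(
    **inputs,
    max_new_tokens=200,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
    pad_token_id=tokenizer.eos_token_id,
)
```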

## 8. Decode the Generated Text

Finally, decode and print the response from the model:

```python
generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_text)
```
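Because `outputs[0]` contains the prompt tokens followed by the answer, you may want to print only the newly generated part. A small sketch, assuming `inputs` is the dictionary produced by the tokenizer call above:

```python
# Decode only the tokens generated after the prompt.
prompt_length = inputs["input_ids"].shape[1]
answer = tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True)
print(answer)
```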

## More Information

Contact: `amornpan@gmail.com`