amornpan/openthaigpt-MedChatModelv11


amornpan committed on Oct 25, 2024

Commit ce067e9 (verified) · 1 parent: d20adfb

Update README.md


Files changed (1)

README.md +82 -60

README.md CHANGED

@@ -24,6 +24,9 @@ tags:
 - lahnmah
 - NT Thai GPT
 - ntthaigpt
 new_version: Aekanun/openthaigpt-MedChatModelv5.1
 ---
@@ -94,100 +97,119 @@ This model can be used as a foundational model for medical assistance systems, c
 ## How to Get Started with the Model
-Here’s how to load and use the model for generating medical responses in Thai:
-## 1. Install the Required Packages
-First, ensure you have installed the required libraries by running:
-```bash
-pip install torch transformers bitsandbytes
-```
-## 2. Load the Model and Tokenizer
-You can load the model and tokenizer directly from Hugging Face using the following code:
 ```python
 import torch
 from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
-# Define the model path
-model_path = 'amornpan/openthaigpt-MedChatModelv11'
-# Load the tokenizer and model
-tokenizer = AutoTokenizer.from_pretrained(model_path)
-tokenizer.pad_token = tokenizer.eos_token
 ```
-## 3. Prepare Your Input (Custom Prompt)
-Create a custom medical prompt that you want the model to respond to:
 ```python
-custom_prompt = "โปรดอธิบายลักษณะช่องปากที่เป็นมะเร็งในระยะเริ่มต้น"
-PROMPT = f'[INST] <You are a question answering assistant. Answer the question as truthfully and helpfully as possible. คุณคือผู้ช่วยตอบคำถาม จงตอบคำถามอย่างถูกต้องและมีประโยชน์ที่สุด<>{custom_prompt}[/INST]'
-# Tokenize the input prompt
-inputs = tokenizer(PROMPT, return_tensors="pt", padding=True, truncation=True)
 ```
-## 4. Configure the Model for Efficient Loading (4-bit Quantization)
-The model uses 4-bit precision for efficient inference. Here’s how to set up the configuration:
 ```python
 bnb_config = BitsAndBytesConfig(
     load_in_4bit=True,
     bnb_4bit_quant_type="nf4",
-    bnb_4bit_compute_dtype=torch.float16
 )
 ```
-## 5. Load the Model with Quantization Support
-Now, load the model with the 4-bit quantization settings:
 ```python
 model = AutoModelForCausalLM.from_pretrained(
-    model_path,
     quantization_config=bnb_config,
     trust_remote_code=True
 )
 ```
-## 6. Move the Model and Inputs to the GPU (prefer GPU)
-For faster inference, move the model and input tensors to a GPU, if available:
 ```python
-device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
-model.to(device)
-inputs = {k: v.to(device) for k, v in inputs.items()}
 ```
-## 7. Generate a Response from the Model
-Now, generate the medical response by running the model:
 ```python
-outputs = model.generate(**inputs, max_new_tokens=200, do_sample=True)
-```
-## 8. Decode the Generated Text
-Finally, decode and print the response from the model:
-```python
-generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
-print(generated_text)
-```
-## 9. Output
-```text
-[INST] <You are a question answering assistant. Answer the question as truthfully and helpfully as possible. คุณคือผู้ช่วยตอบคำถาม จงตอบคำถามอย่างถูกต้องและมีประโยชน์ที่สุด<>โปรดอธิบายลักษณะช่องปากที่เป็นมะเร็งในระยะเริ่มต้น[/INST] มะเร็งช่องปากเป็นมะเร็งเพียงชนิดเดียวที่ได้รับผลกระทบจากนิโคติน มันคือผู้ชายกลุ่มอายุ 60 – 75 คน คุณจะแสดงอาการและเกิดขึ้นอย่างรวดเร็วหากเกิดมะเร็งช่องปาก คุณจะสังเกตเห็นปื้นแพร่กระจายของเนื้องอก ส่วนใหญ่ในช่องปาก เนื้องอกแสดงว่าเป็นเจ้าแห่ที่กำลังทำลายเยียวยา ค้นหาทั้งภายในและภายนอกลิ้นที่อยู่ติดกางเกงป่อง มะเร็งกระเพาะปัสสาวะหรือมะเร็งกล้ามเนื้อกระเพาะ
 ```
 ### Authors

 - lahnmah
 - NT Thai GPT
 - ntthaigpt
+- medical
+- medtech
+- HealthGPT
 new_version: Aekanun/openthaigpt-MedChatModelv5.1
 ---
 ## How to Get Started with the Model
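+Here’s how you can start using the model in your project. The code assumes `torch`, `transformers`, `bitsandbytes`, and `accelerate` are installed (`device_map="auto"` requires `accelerate`):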
 ```python
 import torch
 from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
 ```
+## Configure the model name
 ```python
+model_name = "amornpan/openthaigpt-MedChatModelv11.1"
 ```
+## Set up 4-bit quantization (optional; reduces GPU memory usage)
 ```python
 bnb_config = BitsAndBytesConfig(
     load_in_4bit=True,
     bnb_4bit_quant_type="nf4",
+    bnb_4bit_compute_dtype=torch.float16,
 )
 ```
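As a rough guide to what 4-bit loading saves: weight storage scales with bits per parameter, so 16-bit weights take 2 bytes each and 4-bit weights take half a byte. The sketch below is a back-of-the-envelope estimate under stated assumptions: `weight_memory_gib` is a hypothetical helper, the 7-billion-parameter size is an illustrative assumption (this README does not state the model size), and only weights are counted, so actual GPU usage will be higher.

```python
# Hypothetical helper: estimates weight storage only. It ignores activations,
# the KV cache, NF4 quantization constants, and layers left unquantized.
BYTES_PER_GIB = 1024 ** 3


def weight_memory_gib(num_params: int, bits_per_param: int) -> float:
    """Approximate storage for num_params weights at bits_per_param bits each, in GiB."""
    return num_params * bits_per_param / 8 / BYTES_PER_GIB


# Assumption: 7 billion parameters, chosen for illustration only
params = 7_000_000_000
fp16 = weight_memory_gib(params, 16)  # float16 weights
nf4 = weight_memory_gib(params, 4)    # 4-bit NF4 weights, as in bnb_config
print(f"fp16: {fp16:.1f} GiB, 4-bit: {nf4:.1f} GiB")  # fp16: 13.0 GiB, 4-bit: 3.3 GiB
```

The 4x ratio holds for the weights themselves; `bnb_4bit_compute_dtype=torch.float16` only sets the dtype used for the matrix multiplications, not the storage format.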
+## Load the model and tokenizer
 ```python
+print("Loading model and tokenizer...")
 model = AutoModelForCausalLM.from_pretrained(
+    model_name,
     quantization_config=bnb_config,
+    trust_remote_code=True,
+    device_map="auto"
+)
+tokenizer = AutoTokenizer.from_pretrained(
+    model_name,
     trust_remote_code=True
 )
+tokenizer.pad_token = tokenizer.eos_token
 ```
+## Define a function to generate responses
 ```python
+def generate_response(prompt, max_new_tokens=256):
+    """Wrap `prompt` in the instruction template and return only the model's answer."""
+    PROMPT = f'[INST] <You are a question answering assistant. Answer the question as truthful and helpful as possible คุณคือผู้ช่วยตอบคำถาม จงตอบคำถามอย่างถูกต้องและมีประโยชน์ที่สุด<>{prompt}[/INST]'
+    inputs = tokenizer(PROMPT, return_tensors="pt").to(model.device)
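+    generation_config = {
+        # do_sample=True is needed for temperature and top_p to take effect;
+        # without it, transformers decodes greedily and emits a UserWarning.
+        "do_sample": True,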
+        "temperature": 0.6,
+        "top_p": 0.95,
+        "repetition_penalty": 1.15,
+        "max_new_tokens": max_new_tokens,
+        "pad_token_id": tokenizer.eos_token_id
+    }
+    with torch.no_grad():
+        generation_output = model.generate(
+            **inputs,
+            **generation_config
+        )
+    response = tokenizer.decode(generation_output[0], skip_special_tokens=True)
+    # Keep only the text after the [/INST] tag, i.e. drop the echoed prompt
+    response = response.split("[/INST]")[-1].strip()
+    return response
 ```
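In transformers, `temperature` and `top_p` only affect sample-based decoding: when `do_sample` is not `True`, `generate` decodes greedily, ignores them, and emits a `UserWarning`. (The previous version of this README passed `do_sample=True` directly to `model.generate`.) A minimal sketch of a pre-flight check, assuming you want to catch this before generating: `ignored_sampling_params` is a hypothetical helper, not a transformers API, and it covers only a subset of the checks transformers performs. `repetition_penalty` is deliberately not flagged because it also applies to greedy decoding.

```python
# Hypothetical helper mirroring a subset of transformers' greedy-mode checks.
# Keys are sampling-only parameters; values are the defaults at which
# transformers stays silent even when do_sample is off.
SAMPLING_ONLY_DEFAULTS = {"temperature": 1.0, "top_p": 1.0, "top_k": 50}


def ignored_sampling_params(config: dict) -> list[str]:
    """Return the sampling-only keys in config that would have no effect."""
    if config.get("do_sample", False):
        return []  # sampling is enabled, so every key takes effect
    return sorted(
        key
        for key, default in SAMPLING_ONLY_DEFAULTS.items()
        if config.get(key) is not None and config[key] != default
    )


# Settings from generate_response, without and with do_sample
config = {"temperature": 0.6, "top_p": 0.95, "repetition_penalty": 1.15, "max_new_tokens": 256}
print(ignored_sampling_params(config))                        # ['temperature', 'top_p']
print(ignored_sampling_params({**config, "do_sample": True}))  # []
```

Note that once sampling is enabled, answers vary between runs, so they will not match a previously recorded output word for word.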
+## Example usage
+```python
+if __name__ == "__main__":
+    questions = [
+        "โปรดอธิบายลักษณะช่องปากที่เป็นมะเร็งในระยะเริ่มต้น",  # Please describe the signs of early-stage oral cancer
+        "อาการของโรคไข้เลือดออกมีอะไรบ้าง"  # What are the symptoms of dengue fever?
+    ]
+    print("Testing question answering:")
+    for question in questions:
+        print("\nQuestion: ", question)
+        answer = generate_response(question)
+        print("Answer: ", answer)
+        print("-" * 50)
+```
+## Output
 ```text
+Question:  โปรดอธิบายลักษณะช่องปากที่เป็นมะเร็งในระยะเริ่มต้น
+Answer:  [ANS] ในระยะเริ่มต้นของมะเร็งช่องปาก อาจไม่มีอาการแสดงที่ชัดเจนหรือมีเพียงเล็กน้อย เช่น การเปลี่ยนแปลงทางกายภาพเล็กๆ บนเนื้อเยื่อภายในช่องปาก ซึ่งอาจรวมถึง:
+1. สีหรือรูปร่างที่เปลี่ยนแปลง: มีการเปลี่ยนแปลงทางกายภาพเช่น เปลี่ยนสี, ขนาด, รูปทรง หรือความหนาแน่นของเนื้อเยื่อ
+2. แผลที่หายยาก: แผลในช่องปากที่เกิดขึ้นแล้วหายยาก หรือไม่หายเลย
+3. การเปลี่ยนแปลงในการเคี้ยวหรือกลืนอาหาร: หากพบว่ามีการเปลี่ยนแปลงในการเคี้ยวหรือกลืนอาหาร ควรไปตรวจโดยแพทย์
+4. การเปลี่ยนแปลงทางเสียง: หากเสียงเปลี่ยนแปลง โดยเฉพาะ
+--------------------------------------------------
+Question:  อาการของโรคไข้เลือดออกมีอะไรบ้าง
+Answer:  [ANS] อาการของโรคไข้เลือดออกอาจแตกต่างกันไปในแต่ละคน แต่โดยทั่วไปแล้ว มักจะมีอาการดังนี้:
+1. ไข้: เป็นอาการที่พบได้มากที่สุด และอาจมีความร้อนสูงเป็นเวลานานหลายวัน
+2. การอ่อนเพลียและเหนื่อยล้า
+3. ปวดศีรษะหรือปวดตามข้อ
+4. สิ่งที่คล้ายกับการแพ้ (เช่น ผื่นหรือลมพิษ)
+5. เยื่อบุใต้ผิวหนังอักเสบ (Hemorrhagic rash) : ซึ่งเริ่มจากจุดแดงๆ บนใบหน้า ลำคอ และแขน แล้วขยายไปทั่วร่างกาย โดยเฉพาะบริเวณขาและแขน
+6. แผลเลือดฝาด (Petechiae): ซึ่งเป็นจุดเล็กๆ ที่เห็นได้ชัดเจนบน
 ```
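`generate_response` builds its prompt with an f-string and keeps only the text after `[/INST]`, yet the sample answers above still start with an `[ANS]` tag. A minimal sketch, assuming you want a plain answer string: `build_prompt` and `extract_answer` are hypothetical helper names, the template text is copied verbatim from `generate_response`, and removing a leading `[ANS]` is an extra step the README's code does not perform. The mock decoded string stands in for real model output.

```python
# Hypothetical helpers; the template text is copied from generate_response.
SYSTEM_TEXT = (
    "You are a question answering assistant. Answer the question as truthful and "
    "helpful as possible คุณคือผู้ช่วยตอบคำถาม จงตอบคำถามอย่างถูกต้องและมีประโยชน์ที่สุด"
)


def build_prompt(question: str) -> str:
    """Wrap a question in the same [INST] ... [/INST] template as generate_response."""
    return f"[INST] <{SYSTEM_TEXT}<>{question}[/INST]"


def extract_answer(decoded: str) -> str:
    """Keep the text after the last [/INST] and drop a leading [ANS] tag if present."""
    answer = decoded.split("[/INST]")[-1].strip()
    if answer.startswith("[ANS]"):
        answer = answer[len("[ANS]"):].lstrip()
    return answer


# Mock decoded output (prompt echoed back, then the answer); not real model output
prompt = build_prompt("อาการของโรคไข้เลือดออกมีอะไรบ้าง")
decoded = prompt + " [ANS] อาการของโรคไข้เลือดออกอาจแตกต่างกันไปในแต่ละคน"
print(extract_answer(decoded))  # อาการของโรคไข้เลือดออกอาจแตกต่างกันไปในแต่ละคน
```

Keeping the template in one place means the prompt used at inference cannot drift from the one the helper strips; whether the model was fine-tuned on exactly this wording is not stated in the README.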
 ### Authors