Text Generation
Transformers
Safetensors
Thai
English
qwen2
text-generation-inference
sft
trl
4-bit precision
bitsandbytes
LoRA
Fine-Tuning with LoRA
LLM
GenAI
NT GenAI
ntgenai
lahnmah
NT Thai GPT
ntthaigpt
medical
medtech
HealthGPT
หลานม่า
NT Academy
conversational
Inference Endpoints
new_version: Aekanun/openthaigpt-MedChatModelv5.1
---

# 🇹🇭 Model Card for openthaigpt1.5-7b-medical-tuned

<!-- Provide a quick summary of what the model is/does. -->

This model is fine-tuned from openthaigpt1.5-7b-instruct using Supervised Fine-Tuning (SFT) on the Thaweewat/thai-med-pack dataset. It is designed for medical question-answering tasks in Thai, specializing in providing accurate, contextual answers based on medical information.

## 👤 **Developed and Fine-tuned by:**

- **Amornpan Phornchaicharoen**
- **Aekanun Thongtae**

## Model Description

This model was fine-tuned using Supervised Fine-Tuning (SFT) to optimize it for medical question answering in Thai. The base model is `openthaigpt1.5-7b-instruct`, and it has been enhanced with domain-specific knowledge using the Thaweewat/thai-med-pack dataset.

- **Model type:** Causal Language Model (AutoModelForCausalLM)
- **Language(s):** Thai
- **License:** Apache License 2.0
- **Fine-tuned from model:** openthaigpt1.5-7b-instruct
- **Dataset used for fine-tuning:** Thaweewat/thai-med-pack
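
For reference, a fine-tuning setup of this kind can be sketched with `peft` and `trl`. This is only a minimal illustration: the LoRA and training hyperparameters below are assumptions, not the values actually used to train this model.

```python
from peft import LoraConfig
from trl import SFTConfig

# Illustrative LoRA adapter settings (assumed, not the actual training values)
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

# Illustrative SFT training arguments (assumed)
training_args = SFTConfig(
    output_dir="openthaigpt-medical-sft",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,
    num_train_epochs=3,
    learning_rate=2e-4,
)
```

These objects would then be passed to `trl`'s `SFTTrainer` together with the base model and the Thaweewat/thai-med-pack dataset.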

### Model Sources

...

## How to Get Started with the Model

Here’s how to load and use the model for generating medical responses in Thai:

## 1. Install the Required Packages

First, ensure you have installed the required libraries by running:

```bash
pip install torch transformers bitsandbytes
```

## 2. Load the Model and Tokenizer

You can load the model and tokenizer directly from Hugging Face using the following code:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Define the model path
model_path = 'amornpan/openthaigpt-MedChatModelv11'

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_path)
tokenizer.pad_token = tokenizer.eos_token
```

## 3. Prepare Your Input (Custom Prompt)

Create a custom medical prompt that you want the model to respond to:

```python
custom_prompt = "โปรดอธิบายลักษณะช่องปากที่เป็นมะเร็งในระยะเริ่มต้น"
PROMPT = f'[INST] <You are a question answering assistant. Answer the question as truthfully and helpfully as possible. คุณคือผู้ช่วยตอบคำถาม จงตอบคำถามอย่างถูกต้องและมีประโยชน์ที่สุด<>{custom_prompt}[/INST]'

# Tokenize the input prompt
inputs = tokenizer(PROMPT, return_tensors="pt", padding=True, truncation=True)
```
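
If you want to reuse this prompt format for other questions, it can be factored into a small helper (the function and variable names are illustrative, not part of the model's API):

```python
# System instruction taken from the prompt template used in this card
SYSTEM_PROMPT = ('You are a question answering assistant. Answer the question '
                 'as truthfully and helpfully as possible. '
                 'คุณคือผู้ช่วยตอบคำถาม จงตอบคำถามอย่างถูกต้องและมีประโยชน์ที่สุด')

def build_prompt(question: str) -> str:
    # Wrap a question in the [INST] template used by this model card.
    return f'[INST] <{SYSTEM_PROMPT}<>{question}[/INST]'

print(build_prompt("อาการของโรคไข้เลือดออกมีอะไรบ้าง"))
```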

## 4. Configure the Model for Efficient Loading (4-bit Quantization)

The model uses 4-bit precision for efficient inference. Here’s how to set up the configuration:

```python
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16
)
```
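
As a rough back-of-the-envelope check of why 4-bit loading helps, here is the approximate weight memory of a 7B-parameter model (this ignores the KV cache, activations, and quantization constants, so real usage is somewhat higher):

```python
PARAMS = 7e9  # ~7B parameters

def weight_gib(bits_per_param: float) -> float:
    # Approximate weight memory in GiB, ignoring all overheads.
    return PARAMS * bits_per_param / 8 / 2**30

print(f"fp16: {weight_gib(16):.1f} GiB")  # → fp16: 13.0 GiB
print(f"nf4:  {weight_gib(4):.1f} GiB")   # → nf4:  3.3 GiB
```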

## 5. Load the Model with Quantization Support

Now, load the model with the 4-bit quantization settings:

```python
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    quantization_config=bnb_config,
    trust_remote_code=True,
    device_map="auto"
)
```

## 6. Move the Inputs to the Model's Device

For inference, the input tensors must be on the same device as the model. A 4-bit quantized model cannot itself be moved with `.to()`, so move the inputs to wherever the model was loaded:

```python
device = model.device
inputs = {k: v.to(device) for k, v in inputs.items()}
```

## 7. Generate a Response from the Model

Now, generate the medical response by running the model:

```python
outputs = model.generate(**inputs, max_new_tokens=200, do_sample=True)
```
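
`generate` also accepts decoding parameters for finer control. The values below are illustrative defaults, not tuned settings; note that `do_sample=True` is required, because `temperature` and `top_p` are ignored in greedy decoding:

```python
# Sampling settings (illustrative defaults)
generation_config = {
    "do_sample": True,
    "temperature": 0.6,
    "top_p": 0.95,
    "repetition_penalty": 1.15,
    "max_new_tokens": 256,
}
# Then pass them through, e.g.:
# outputs = model.generate(**inputs, pad_token_id=tokenizer.eos_token_id, **generation_config)
```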

## 8. Decode the Generated Text

Finally, decode and print the response from the model:

```python
generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_text)
```
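
Because `skip_special_tokens` does not remove the prompt text itself, the decoded string still contains the `[INST]...[/INST]` prefix. A small helper (the name is illustrative) can strip it:

```python
def extract_answer(decoded_text: str) -> str:
    # Keep only the text after the final [/INST] marker and trim whitespace.
    return decoded_text.split("[/INST]")[-1].strip()

print(extract_answer("[INST] <...>คำถาม[/INST] คำตอบจากโมเดล"))
# → คำตอบจากโมเดล
```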

## 9. Output

```
[INST] <You are a question answering assistant. Answer the question as truthfully and helpfully as possible. คุณคือผู้ช่วยตอบคำถาม จงตอบคำถามอย่างถูกต้องและมีประโยชน์ที่สุด<>โปรดอธิบายลักษณะช่องปากที่เป็นมะเร็งในระยะเริ่มต้น[/INST] มะเร็งช่องปากเป็นมะเร็งเพียงชนิดเดียวที่ได้รับผลกระทบจากนิโคติน มันคือผู้ชายกลุ่มอายุ 60 – 75 คน คุณจะแสดงอาการและเกิดขึ้นอย่างรวดเร็วหากเกิดมะเร็งช่องปาก คุณจะสังเกตเห็นปื้นแพร่กระจายของเนื้องอก ส่วนใหญ่ในช่องปาก เนื้องอกแสดงว่าเป็นเจ้าแห่ที่กำลังทำลายเยียวยา ค้นหาทั้งภายในและภายนอกลิ้นที่อยู่ติดกางเกงป่อง มะเร็งกระเพาะปัสสาวะหรือมะเร็งกล้ามเนื้อกระเพาะ
```
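
To try the model on several questions at once, the steps above can be wrapped in a small loop. Here `generate_fn` is an injected stand-in for the tokenize → generate → decode pipeline, so the harness itself runs without loading the model (function names are illustrative):

```python
def answer_questions(questions, generate_fn):
    # generate_fn maps a question string to the model's answer string
    # (in practice: build the prompt, tokenize, generate, decode).
    results = {}
    for question in questions:
        print("\nQuestion:", question)
        answer = generate_fn(question)
        print("Answer:", answer)
        print("-" * 50)
        results[question] = answer
    return results

questions = [
    "โปรดอธิบายลักษณะช่องปากที่เป็นมะเร็งในระยะเริ่มต้น",
    "อาการของโรคไข้เลือดออกมีอะไรบ้าง",
]
# Stubbed model call for demonstration; replace with the real pipeline.
answer_questions(questions, lambda q: "(model answer)")
```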

### Authors

* Amornpan Phornchaicharoen (amornpan@gmail.com)