---
base_model:
- openthaigpt/openthaigpt1.5-7b-instruct
datasets:
- Thaweewat/thai-med-pack
language:
- th
- en
library_name: transformers
license: apache-2.0
pipeline_tag: text-generation
tags:
- text-generation-inference
- sft
- trl
- 4-bit precision
- bitsandbytes
- LoRA
- Fine-Tuning with LoRA
- LLM
- GenAI
- NT GenAI
- ntgenai
- lahnmah
- NT Thai GPT
- ntthaigpt
- medical
- medtech
- HealthGPT
- หลานม่า
- NT Academy
new_version: Aekanun/openthaigpt-MedChatModelv5.1
---
# 🇹🇭 **Model Card for openthaigpt1.5-7b-medical-tuned**
## <font color="blue">ℹ️ This version is optimized for GPU.</font> Please wait for the CPU version, which will be available soon!
This model is fine-tuned from openthaigpt1.5-7b-instruct using Supervised Fine-Tuning (SFT) on the Thaweewat/thai-med-pack dataset. It is designed for medical question-answering tasks in Thai, providing accurate, context-aware answers grounded in medical information.
## Model Description
This model was fine-tuned using Supervised Fine-Tuning (SFT) to optimize it for medical question answering in Thai. The base model is `openthaigpt1.5-7b-instruct`, and it has been enhanced with domain-specific knowledge using the Thaweewat/thai-med-pack dataset.
- **Model type:** Causal Language Model (AutoModelForCausalLM)
- **Language(s):** Thai
- **License:** Apache License 2.0
- **Fine-tuned from model:** openthaigpt1.5-7b-instruct
- **Dataset used for fine-tuning:** Thaweewat/thai-med-pack
### Model Sources
- **Repository:** https://huggingface.co/amornpan
- **Citing Repository:** https://huggingface.co/Aekanun
- **Base Model:** https://huggingface.co/openthaigpt/openthaigpt1.5-7b-instruct
- **Dataset:** https://huggingface.co/datasets/Thaweewat/thai-med-pack
## Uses
### Direct Use
The model can be directly used for generating medical responses in Thai. It has been optimized for:
- Medical question-answering
- Providing clinical information
- Health-related dialogue generation
### Downstream Use
This model can be used as a foundational model for medical assistance systems, chatbots, and applications related to healthcare, specifically in the Thai language.
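As a rough illustration, the sketch below wraps the model in a minimal question-answering helper using the `transformers` pipeline API. The `ask` helper and its prompt format are assumptions adapted from the quick-start steps later in this card, not an official interface:

```python
# Minimal sketch of a downstream QA helper (hypothetical; the prompt
# format mirrors the quick-start example later in this card).
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="amornpan/openthaigpt-MedChatModelv11",
    device_map="auto",
    trust_remote_code=True,
)

def ask(question: str) -> str:
    # Wrap the user question in the instruction format used by this model
    prompt = f"[INST] <<SYS>>You are a question answering assistant.<</SYS>>{question}[/INST]"
    result = generator(prompt, max_new_tokens=256, do_sample=True)
    return result[0]["generated_text"]

print(ask("โปรดอธิบายลักษณะช่องปากที่เป็นมะเร็งในระยะเริ่มต้น"))
```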
### Out-of-Scope Use
- This model should not be used for real-time diagnosis or emergency medical scenarios.
- Avoid using it for critical clinical decisions without human oversight, as the model is not intended to replace professional medical advice.
## Bias, Risks, and Limitations
### Bias
- The model might reflect biases present in the dataset, particularly when addressing underrepresented medical conditions or topics.
### Risks
- Responses may contain inaccuracies due to the inherent limitations of the model and the dataset used for fine-tuning.
- This model should not be used as the sole source of medical advice.
### Limitations
- Limited to the medical domain.
- The model is sensitive to prompts and may generate off-topic responses for non-medical queries.
## Model Training Results
![image/png](https://cdn-uploads.huggingface.co/production/uploads/663ce15f197afc063058dc3a/U0TIiWGdNaxl_9TH90gIx.png)
![image/png](https://cdn-uploads.huggingface.co/production/uploads/663ce15f197afc063058dc3a/mAZBm9Dk7-S-FQ4srj3aG.png)
![image/png](https://cdn-uploads.huggingface.co/production/uploads/663ce15f197afc063058dc3a/PgRsAWRPGw6T2tsF2aJ3W.png)
![image/png](https://cdn-uploads.huggingface.co/production/uploads/663ce15f197afc063058dc3a/lmreg4ibgBllTvzfhMeSU.png)
![image/png](https://cdn-uploads.huggingface.co/production/uploads/663ce15f197afc063058dc3a/cPJ3PWKcqwV2ynNWO1Qrs.png)
![image/png](https://cdn-uploads.huggingface.co/production/uploads/663ce15f197afc063058dc3a/mkM8VavlG9xHhgNlZ9E1X.png)
![image/png](https://cdn-uploads.huggingface.co/production/uploads/663ce15f197afc063058dc3a/MecCnAmLlYdpBjwJjMQFu.png)
![image/png](https://cdn-uploads.huggingface.co/production/uploads/663ce15f197afc063058dc3a/ijHMzw9Zrpm23o89vzsSc.png)
![image/png](https://cdn-uploads.huggingface.co/production/uploads/663ce15f197afc063058dc3a/hOIyuIA_zT7_s8SG-ZDWQ.png)
## How to Get Started with the Model
Here’s how to load and use the model for generating medical responses in Thai:
## Using Google Colab Pro or Pro+ for fine-tuning and inference
![image/png](https://cdn-uploads.huggingface.co/production/uploads/663ce15f197afc063058dc3a/21iWqmOHl48-n8lIhUsp-.png)
## 1. Install the Required Packages
First, ensure you have installed the required libraries by running:
```python
# In Google Colab, prefix shell commands with "!"
!pip install torch transformers bitsandbytes
!pip install --upgrade bitsandbytes transformers huggingface_hub
```
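Before loading the model, it is worth confirming that the Colab runtime actually has a GPU attached. This quick check uses only standard PyTorch calls:

```python
import torch

# Verify that a CUDA-capable GPU is visible to PyTorch
print(torch.cuda.is_available())          # True if a GPU is attached
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))  # e.g. a T4 or A100 on Colab
```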
## 2. Load the Model and Tokenizer
You can load the model and tokenizer directly from Hugging Face using the following code:
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Define the model path
model_path = 'amornpan/openthaigpt-MedChatModelv11'

# Load the tokenizer and set the padding token
tokenizer = AutoTokenizer.from_pretrained(model_path)
tokenizer.pad_token = tokenizer.eos_token
```
## 3. Prepare Your Input (Custom Prompt)
Create a custom medical prompt that you want the model to respond to:
```python
custom_prompt = "โปรดอธิบายลักษณะช่องปากที่เป็นมะเร็งในระยะเริ่มต้น"  # "Please describe the characteristics of early-stage oral cancer"
PROMPT = f'[INST] <<SYS>>You are a question answering assistant. Answer the question as truthfully and helpfully as possible. คุณคือผู้ช่วยตอบคำถาม จงตอบคำถามอย่างถูกต้องและมีประโยชน์ที่สุด<</SYS>>{custom_prompt}[/INST]'
# Tokenize the input prompt
inputs = tokenizer(PROMPT, return_tensors="pt", padding=True, truncation=True)
```
## 4. Configure the Model for Efficient Loading (4-bit Quantization)
The model uses 4-bit precision for efficient inference. Here’s how to set up the configuration:
```python
bnb_config = BitsAndBytesConfig(
load_in_4bit=True,
bnb_4bit_quant_type="nf4",
bnb_4bit_compute_dtype=torch.float16
)
```
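If GPU memory is still tight, `bitsandbytes` also supports nested (double) quantization, which additionally quantizes the quantization constants themselves. The variant below is a sketch using the standard `bnb_4bit_use_double_quant` option; it was not part of the original setup:

```python
# Optional: nested quantization saves a bit more memory at load time.
# This configuration is an assumption, not the authors' tested setup.
bnb_config_dq = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
)
```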
## 5. Load the Model with Quantization Support
Now, load the model with the 4-bit quantization settings:
```python
model = AutoModelForCausalLM.from_pretrained(
model_path,
quantization_config=bnb_config,
trust_remote_code=True
)
```
## 6. Move the Model and Inputs to the GPU (if available)
For faster inference, move the model and input tensors to a GPU, if available:
```python
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
inputs = {k: v.to(device) for k, v in inputs.items()}
```
## 7. Generate a Response from the Model
Now, generate the medical response by running the model:
```python
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=True)
```
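The call above uses default sampling settings. If responses drift off topic, tightening the sampling parameters is a common adjustment; the values below are illustrative, not settings recommended by the authors:

```python
# Illustrative sampling settings (not tuned by the model authors):
# a lower temperature plus nucleus sampling makes answers more focused.
outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
    pad_token_id=tokenizer.eos_token_id,  # avoids the missing-pad-token warning
)
```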
## 8. Decode the Generated Text
Finally, decode and print the response from the model:
```python
generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_text)
```
## 9. Output
```
[INST] <<SYS>>You are a question answering assistant. Answer the question as truthfully and helpfully as possible.
คุณคือผู้ช่วยตอบคำถาม จงตอบคำถามอย่างถูกต้องและมีประโยชน์ที่สุด<</SYS>>โปรดอธิบายลักษณะช่องปากที่เป็นมะเร็งในระยะเริ่มต้น[/INST]
<ช่องปากที่เป็นมะเร็งในระยะเริ่มต้นอาจมีลักษณะต่อไปนี้:
- มีเนื้องอกสีขาวหรือสีเทามีขนาดเล็กอยู่บริเวณเยื่อบุช่องปาก
- มีแผลในช่องปากที่ไม่หายภายในสองสัปดาห์
- มีแผลบริเวณจมูกหรือคอที่มีมานานแต่ไม่หาย
- มีเนื้อ hardness หรือการเปลี่ยนแปลงทางโครงสร้างในบริเวณเยื่อบุของช่องปาก
- มีความผิดปกติในรูปร่าง ขนาด และสีของฟัน
- มีการเปลี่ยนแปลงในการบิดงอของลิ้นหรือมัดกล้ามเนื้อที่รับผิดชอบการบิดงอ
สิ่งจำเป็นคือให้พบแพทย์ผู้เชี่ยวชาญโดยเร็วที่สุดหากมีอาการที่
```
(English gloss: the model lists possible early signs of oral cancer, including small white or gray growths on the oral mucosa, mouth ulcers that do not heal within two weeks, long-standing sores around the nose or throat, hardened tissue or structural changes in the oral lining, abnormal tooth shape, size, or color, and altered tongue movement, and it advises seeing a specialist as soon as possible. The output is truncated by the `max_new_tokens` limit.)
### 👤 **Authors**
* Amornpan Phornchaicharoen (amornpan@gmail.com)
* Aekanun Thongtae (cto@bangkokfirsttech.com)
* Montita Somsoo (montita.fonn@gmail.com)
* Jiranuwat Songpad (jiranuwat.song64@gmail.com)
* Phongsatorn Somjai (ipongdev@gmail.com)
* Benchawan Wangphoomyai (benchaa.27@gmail.com) |