---
library_name: transformers
tags:
- medical
language:
- ja
metrics:
- accuracy
license: cc-by-nc-sa-4.0
---

# JMedLLM-7B-v1

⚠️ Do not use this model for medical purposes. It is intended for research only. Be aware of its bias, risks, and limitations.

⚠️ Under development.

This model is a Japanese medical LLM based on Qwen2-7B-Instruct. Because 7B-class LLMs do not necessarily require NVIDIA A100 GPUs, the model is relatively easy for individual clinical institutes to operate (see the rough memory estimate below).

- The model performs well on medical Q&A benchmarks in both Japanese and English.
- The tokenizer is BPE, inherited from the base model.
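
As a rough, back-of-the-envelope illustration of why an A100 is not strictly required: the fp16 weights of a 7B-parameter model occupy roughly 13 GB, so inference can fit on a single 24 GB GPU. The sketch below only performs this arithmetic and is not part of the model or its training code.

```python
# Rough VRAM estimate for serving a 7B model in fp16/bf16.
# Weights only; KV cache and activation overhead are ignored.
NUM_PARAMS = 7e9      # ~7 billion parameters
BYTES_PER_PARAM = 2   # fp16 / bf16

weights_gb = NUM_PARAMS * BYTES_PER_PARAM / 1024**3
print(f"fp16 weights: ~{weights_gb:.1f} GB")  # ~13.0 GB
```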



## Model Details

### Model Description

<!-- Provide a longer summary of what this model is. -->

This is the model card of a 🤗 transformers model that has been pushed to the Hub.

- **Developed by:** stardust-coder
- **Funded by [optional]:** AIST KAKUSEI(2023)
- **Shared by [optional]:** stardust-coder
- **Language(s) (NLP):** Japanese
- **License:** cc-by-nc-sa-4.0
- **Finetuned from model [optional]:** Qwen2-7B-Instruct

### Model Sources

<!-- Provide the basic links for the model. -->

- **Repository:** [stardust-coder/jmedllm-7b-v1](https://huggingface.co/stardust-coder/jmedllm-7b-v1)
- **Paper:** Coming soon...
- **Demo:** None

## Uses

<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->

### Direct Use

<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->

- Asking benchmark-style medical questions, such as medical licensing exam questions (see the prompt sketch below).
- Further research purposes.
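
As an illustration of the intended question-answering use, the snippet below formats a multiple-choice, exam-style question with the base tokenizer's chat template. The question text, options, and system message are hypothetical examples written for this sketch; they are not taken from any benchmark.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2-7B-Instruct")

# Hypothetical exam-style question (not from any benchmark dataset).
question = (
    "ビタミンB12の吸収に必要な内因子を分泌する細胞はどれか。\n"
    "A. 主細胞\nB. 壁細胞\nC. 杯細胞\nD. パネート細胞"
)
messages = [
    {"role": "system", "content": "あなたは医学の試験問題に回答するアシスタントです。"},
    {"role": "user", "content": question + "\n選択肢の記号のみで答えてください。"},
]

# Build the model-ready prompt string using the Qwen2 chat template.
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)
```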

### Out-of-Scope Use

<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->

Any medical use.

## Bias, Risks, and Limitations

<!-- This section is meant to convey both technical and sociotechnical limitations. -->

This model carries risks with use. 
Evaluation has so far been conducted only on [IgakuQA](https://github.com/jungokasai/IgakuQA) in English and Japanese, which does not, and cannot, cover all scenarios. 
Its potential outputs cannot be predicted in advance, and the model may in some instances produce inaccurate, biased, or otherwise objectionable responses to user prompts. 
This model is not designed for any medical use. 
Those who download this model should perform their own safety testing and tuning before any use.
Users (both direct and downstream) should be aware of the risks, biases, and limitations of the model. 

## How to Get Started with the Model

Use the code below to get started with the model.
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch
import argparse

def get_args():
    parser = argparse.ArgumentParser()
    parser.add_argument("--base_model", type=str, help="base model name or path")
    parser.add_argument("--peft_model", type=str, help="PEFT adapter name or path")
    return parser.parse_args()

def main():
    args = get_args()
    # Load the base model in fp16 and spread it across available devices.
    base_model = AutoModelForCausalLM.from_pretrained(
        args.base_model,
        return_dict=True,
        torch_dtype=torch.float16,
        device_map="auto",
    )
    tokenizer = AutoTokenizer.from_pretrained(args.base_model)
    # Attach the fine-tuned LoRA adapter on top of the base model.
    model = PeftModel.from_pretrained(base_model, args.peft_model, device_map="auto")

    prompt = "hoge"  # placeholder; replace with your own prompt
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)
    with torch.no_grad():
        generated_tokens = model.generate(
            inputs=input_ids,
            do_sample=False,
            max_new_tokens=256,  # raise the default generation budget
        )[0]
    generated_text = tokenizer.decode(generated_tokens, skip_special_tokens=True)
    print(generated_text)

if __name__ == "__main__":
    main()
```
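
Assuming this Hub repository ([stardust-coder/jmedllm-7b-v1](https://huggingface.co/stardust-coder/jmedllm-7b-v1)) hosts the LoRA adapter and the script above is saved as, say, `generate.py` (a hypothetical file name), it could be invoked along the lines of `python generate.py --base_model Qwen/Qwen2-7B-Instruct --peft_model stardust-coder/jmedllm-7b-v1`.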

## Training Details

### Training Data

<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->

1. Naika-Text : collected from a medical journal (not made public)
2. USMLEJP(train split) : translated into Japanese by hand (not made public)

### Training Procedure

<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->

1. Full-parameter fine-tuning, 5 epochs
2. LoRA fine-tuning, 5 epochs


#### Training Hyperparameters

- **Training regime:** dtype = AUTO, LoRA target modules = ALL <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
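
Training was run with MS-SWIFT rather than raw PEFT, but as a rough illustration of a setup where LoRA targets all linear modules, a peft `LoraConfig` along the following lines could be used. The rank, alpha, and dropout values below are placeholders, not the values used for this model, and `target_modules="all-linear"` requires a recent peft release.

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM
import torch

# Load the base model to be adapted.
base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2-7B-Instruct", torch_dtype=torch.bfloat16, device_map="auto"
)

# Placeholder hyperparameters; the actual values for this model are not documented.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules="all-linear",  # apply LoRA to every linear layer
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()
```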

#### Train run time

| Run | train_runtime (s) | Epochs | Global steps |
|-----|-------------------|--------|--------------|
| 1   | 27,214.5 (≈ 7.6 hours)  | 5 | 1,890 |
| 2   | 102,718.0 (≈ 28.5 hours) | 5 | 3,145 |


## Evaluation

Coming soon...

## Technical Specifications [optional]

### Model Architecture

Qwen2-7B

### Compute Infrastructure

1 × G.large node on ABCI (AI Bridging Cloud Infrastructure)

#### Software

[MS-SWIFT](https://github.com/modelscope/swift)


## Acknowledgement

This work was supported by the AIST KAKUSEI project (FY2023). 

## How to cite
```
Coming soon...
```