---
library_name: transformers
tags:
- medical
language:
- ja
metrics:
- accuracy
license: cc-by-nc-sa-4.0
---
# JMedLLM-7B-v1
⚠️ Do not use this model for medical purposes; it is intended for research only. Be aware of its biases, risks, and limitations.
⚠️ Under development.
This model is a Japanese medical LLM based on Qwen2-7B-Instruct. Because 7B-class LLMs do not necessarily require NVIDIA A100 GPUs, they are relatively convenient for individual clinical institutions to operate.
- This model performs well on medical Q&A benchmarks in both Japanese and English.
- As with the base model, the tokenizer is BPE-based.
## Model Details
### Model Description
<!-- Provide a longer summary of what this model is. -->
This is the model card of a 🤗 transformers model that has been pushed to the Hub.
- **Developed by:** stardust-coder
- **Funded by [optional]:** AIST KAKUSEI(2023)
- **Shared by [optional]:** stardust-coder
- **Language(s) (NLP):** Japanese
- **License:** cc-by-nc-sa-4.0
- **Finetuned from model [optional]:** Qwen2-7B-Instruct
### Model Sources
<!-- Provide the basic links for the model. -->
- **Repository:** [stardust-coder/jmedllm-7b-v1](https://huggingface.co/stardust-coder/jmedllm-7b-v1)
- **Paper:** Coming soon...
- **Demo:** None
## Uses
<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
### Direct Use
<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
- Answering benchmark medical questions, such as medical licensing exam questions.
- Other research purposes.
### Out-of-Scope Use
<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
Any medical use.
## Bias, Risks, and Limitations
<!-- This section is meant to convey both technical and sociotechnical limitations. -->
This model carries risks with use.
Evaluation has been conducted only with [IgakuQA](https://github.com/jungokasai/IgakuQA) in English and Japanese; it does not, and could not, cover all scenarios.
The model's potential outputs cannot be predicted in advance, and it may in some instances produce inaccurate, biased, or otherwise objectionable responses to user prompts.
This model is not designed for any medical uses.
Those who download this model should perform safety testing and tuning before any usage.
Users (both direct and downstream) should be aware of the risks, biases and limitations of the model.
## How to Get Started with the Model
Use the code below to get started with the model.
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch
import argparse


def get_args():
    parser = argparse.ArgumentParser()
    parser.add_argument("--base_model", type=str)
    parser.add_argument("--peft_model", type=str)
    return parser.parse_args()


def main():
    args = get_args()
    # Load the base model in fp16 and attach the LoRA adapter.
    base_model = AutoModelForCausalLM.from_pretrained(
        args.base_model,
        return_dict=True,
        torch_dtype=torch.float16,
        device_map="auto",
    )
    tokenizer = AutoTokenizer.from_pretrained(args.base_model)
    model = PeftModel.from_pretrained(base_model, args.peft_model, device_map="auto")

    prompt = "hoge"  # replace with your prompt
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)
    with torch.no_grad():
        generated_tokens = model.generate(
            inputs=input_ids,
            do_sample=False,  # greedy decoding
        )[0]
    generated_text = tokenizer.decode(generated_tokens)
    print(generated_text)


if __name__ == "__main__":
    main()
```
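Since the base model is Qwen2-7B-Instruct, wrapping prompts in Qwen2's ChatML-style chat template may give better results than a raw string. In practice, prefer `tokenizer.apply_chat_template(...)`, which applies the template stored with the tokenizer; the helper below is only a minimal sketch of the layout, under the assumption that the standard Qwen2 template applies.

```python
def build_chatml_prompt(user_message: str,
                        system_message: str = "You are a helpful assistant.") -> str:
    """Format a single-turn prompt in the ChatML style used by Qwen2-Instruct.

    Illustrative only; tokenizer.apply_chat_template(...) is the canonical way
    to build prompts for the actual model.
    """
    return (
        f"<|im_start|>system\n{system_message}<|im_end|>\n"
        f"<|im_start|>user\n{user_message}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )


# Example: a Japanese medical question, as in the intended research use.
prompt = build_chatml_prompt("高血圧の診断基準を教えてください。")
print(prompt)
```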
## Training Details
### Training Data
<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
1. Naika-Text: collected from a medical journal (not made public)
2. USMLEJP (train split): translated into Japanese by hand (not made public)
### Training Procedure
<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
1. Full-parameter fine-tuning, 5 epochs
2. LoRA fine-tuning, 5 epochs
#### Training Hyperparameters
- **Training regime:** dtype = AUTO, LoRA target modules = ALL <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
#### Train run time
1. `train_runtime`: 27214.5 s (≈7.6 h), 5 epochs, 1890 global steps
2. `train_runtime`: 102718.0 s (≈28.5 h), 5 epochs, 3145 global steps
## Evaluation
Coming soon...
## Technical Specifications [optional]
### Model Architecture
Qwen2-7B
### Compute Infrastructure
1 × G.large node on ABCI
#### Software
[MS-SWIFT](https://github.com/modelscope/swift)
## Acknowledgement
This work was supported by AIST KAKUSEI project (FY2023).
## How to cite
```
Coming soon...
```