---
library_name: transformers
tags:
- medical
language:
- ja
metrics:
- accuracy
license: cc-by-nc-sa-4.0
---

# JMedLLM-7B-v1

⚠️ Do not use this model for medical purposes. It is for research purposes only. Be aware of its bias, risks, and limitations. ⚠️

⚠️ Under development.

This model is a Japanese medical LLM based on Qwen2-7B-Instruct. Because 7B-parameter LLMs do not necessarily require NVIDIA A100 GPUs, the model is relatively convenient for individual clinical institutes to operate.

- This model performs well on medical Q&A benchmarks in both Japanese and English.
- The tokenizer is BPE, as in the base model.

## Model Details

### Model Description

This is the model card of a 🤗 transformers model that has been pushed to the Hub.

- **Developed by:** stardust-coder
- **Funded by:** AIST KAKUSEI (2023)
- **Shared by:** stardust-coder
- **Language(s) (NLP):** Japanese
- **License:** cc-by-nc-sa-4.0
- **Finetuned from model:** Qwen2-7B-Instruct

### Model Sources

- **Repository:** [stardust-coder/jmedllm-7b-v1](https://huggingface.co/stardust-coder/jmedllm-7b-v1)
- **Paper:** Coming soon...
- **Demo:** None

## Uses

### Direct Use

- Asking benchmark medical questions, such as medical license exam questions.
- Further research purposes.

### Out-of-Scope Use

Any medical use.

## Bias, Risks, and Limitations

This model carries risks with use. Evaluation has only been conducted with [IgakuQA](https://github.com/jungokasai/IgakuQA) in English and Japanese, which does not, and could not, cover all scenarios. The model's outputs cannot be predicted in advance, and it may in some instances produce inaccurate, biased, or otherwise objectionable responses to user prompts. This model is not designed for any medical use. Those who download this model should perform safety testing and tuning before any usage. Users (both direct and downstream) should be aware of the risks, biases, and limitations of the model.

## How to Get Started with the Model

Use the code below to get started with the model.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch
import argparse


def get_args():
    parser = argparse.ArgumentParser()
    parser.add_argument("--base_model", type=str)
    parser.add_argument("--peft_model", type=str)
    return parser.parse_args()


def main():
    args = get_args()

    # Load the base model in fp16 and spread it across available devices.
    base_model = AutoModelForCausalLM.from_pretrained(
        args.base_model,
        return_dict=True,
        torch_dtype=torch.float16,
        device_map="auto",
    )
    tokenizer = AutoTokenizer.from_pretrained(args.base_model)

    # Attach the LoRA adapter on top of the base model.
    model = PeftModel.from_pretrained(base_model, args.peft_model, device_map="auto")

    prompt = "hoge"  # replace with your own prompt
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)
    with torch.no_grad():
        generated_tokens = model.generate(
            inputs=input_ids,
            do_sample=False,
        )[0]
    generated_text = tokenizer.decode(generated_tokens)
    print(generated_text)


if __name__ == "__main__":
    main()
```

## Training Details

### Training Data

1. Naika-Text: collected from a medical journal (not made public)
2. USMLEJP (train split): translated into Japanese by hand (not made public)

### Training Procedure

1. Full-parameter fine-tuning, 5 epochs
2. LoRA fine-tuning, 5 epochs

#### Training Hyperparameters

- **Training regime:** dtype = AUTO, LoRA target modules = ALL

#### Train run time

1. `train_runtime`: 27214.5232, `epoch`: 5, `global_step`: 1890
2. `train_runtime`: 102718.0035, `epoch`: 5, `global_step`: 3145

## Evaluation

Coming soon...
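In the meantime, below is a minimal sketch of how exact-match accuracy could be computed for IgakuQA-style multiple-choice questions. The answer-extraction rule (first standalone option letter in the generation) and the data format are illustrative assumptions, not the benchmark's official protocol.

```python
import re


def score_multiple_choice(model_answers, gold_answers):
    """Compute exact-match accuracy over multiple-choice answers.

    `model_answers` are raw model generations; `gold_answers` are option
    letters such as "a"-"e". The extraction rule below (first standalone
    option letter in the output) is an assumption, not IgakuQA's official
    scoring procedure.
    """
    correct = 0
    for generated, gold in zip(model_answers, gold_answers):
        match = re.search(r"\b([a-e])\b", generated.lower())
        predicted = match.group(1) if match else None
        correct += int(predicted == gold.lower())
    return correct / len(gold_answers)


# Example usage with dummy data:
print(score_multiple_choice(["The answer is c.", "b"], ["c", "a"]))  # 0.5
```

Note that real exam questions may require selecting multiple options, which this sketch does not handle.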
## Technical Specifications

### Model Architecture

Qwen2-7B

### Compute Infrastructure

G.large x 1 node on ABCI

#### Software

[MS-SWIFT](https://github.com/modelscope/swift)

# Acknowledgement

This work was supported by the AIST KAKUSEI project (FY2023).

# How to cite

```
Coming soon...
```
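# Appendix: Merging the LoRA adapter

Since the model is published as a PEFT adapter on top of Qwen2-7B-Instruct (see the usage script above), it can be convenient to fold the adapter into the base weights for standalone deployment. Below is a hedged sketch using `peft`; it assumes the Hub repository contains the adapter weights, and the output path is a placeholder.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

# Load the base model; adjust the identifier if you use a local copy.
base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2-7B-Instruct",
    torch_dtype=torch.float16,
    device_map="auto",
)

# Attach the adapter (assumes the repo ships PEFT adapter weights).
model = PeftModel.from_pretrained(base, "stardust-coder/jmedllm-7b-v1")

# Fold the LoRA weights into the base model and save the result so it
# can later be loaded with AutoModelForCausalLM alone.
merged = model.merge_and_unload()
merged.save_pretrained("./jmedllm-7b-v1-merged")  # placeholder path

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2-7B-Instruct")
tokenizer.save_pretrained("./jmedllm-7b-v1-merged")
```

After merging, the saved checkpoint loads without a `peft` dependency at inference time.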