---
language:
- en
- fr
- nl
- es
- it
- pl
- ro
- de
license: apache-2.0
library_name: transformers
tags:
- mergekit
- merge
- dare
- medical
- biology
- mlx
datasets:
- health_fact
base_model:
- BioMistral/BioMistral-7B
- mistralai/Mistral-7B-Instruct-v0.1
pipeline_tag: text-generation
---
# abhishek-ch/biomistral-7b-synthetic-ehr

This model was converted to MLX format from [`BioMistral/BioMistral-7B-DARE`](https://huggingface.co/BioMistral/BioMistral-7B-DARE).
Refer to the [original model card](https://huggingface.co/BioMistral/BioMistral-7B-DARE) for more details on the model.
## Use with mlx
```bash
pip install mlx-lm
```
The model was LoRA fine-tuned on [health_facts](https://huggingface.co/datasets/health_fact) and a synthetic EHR dataset inspired by MIMIC-IV, using the prompt format below, for 1000 steps (~1M tokens) with mlx.
```python
def format_prompt(prompt: str, question: str) -> str:
    return """<s>[INST]
## Instructions
{}
## User Question
{}.
[/INST]</s>
""".format(prompt, question)
```
Example system prompt for synthetic EHR diagnosis:
```
You are an expert in providing diagnosis summaries based on clinical notes, inspired by the MIMIC-IV-Note dataset. These notes encompass the chief complaint along with the patient summary and medical admission details.
```
Example system prompt for health-facts checking:
```
You are a Public Health AI Assistant. You can fact-check public health claims.
Each answer is labelled as true, false, unproven, or mixture.
Please provide the reason behind the answer.
```
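
As an illustration, either system prompt can be paired with a user question through `format_prompt`; the claim below is a made-up example, not from the training data:
```python
# Hypothetical usage of format_prompt with the health-facts system prompt.
system_prompt = (
    "You are a Public Health AI Assistant. You can fact-check public health claims.\n"
    "Each answer is labelled as true, false, unproven, or mixture.\n"
    "Please provide the reason behind the answer."
)
question = "Vitamin C megadoses cure the common cold"  # made-up claim for illustration

full_prompt = format_prompt(system_prompt, question)
print(full_prompt)
```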
## Loading the model using `mlx`
```python
from mlx_lm import generate, load

model, tokenizer = load("abhishek-ch/biomistral-7b-synthetic-ehr")

# `prompt` is a system prompt (e.g. one of the examples above) and
# `question` is the user question or clinical note to answer.
response = generate(
    model,
    tokenizer,
    prompt=format_prompt(prompt, question),
    verbose=True,  # set to True to see the prompt and response
    temp=0.0,
    max_tokens=512,
)
```
## Loading the model using `transformers`
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "abhishek-ch/biomistral-7b-synthetic-ehr"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)
model.to("mps")  # Apple Silicon; use "cuda" or "cpu" as appropriate

# `system_prompt` and `question` follow the same format_prompt convention as above.
input_text = format_prompt(system_prompt, question)
input_ids = tokenizer(input_text, return_tensors="pt").to("mps")
outputs = model.generate(
    **input_ids,
    max_new_tokens=512,
)
print(tokenizer.decode(outputs[0]))
```
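
The call above decodes the whole sequence, prompt included. A minimal variant (assuming the same `input_ids` and `outputs` as above) prints only the model's continuation:
```python
# Slice off the prompt tokens and decode only the generated continuation.
prompt_length = input_ids["input_ids"].shape[1]
completion = tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True)
print(completion)
```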