---
language:
- en
- fr
- nl
- es
- it
- pl
- ro
- de
license: apache-2.0
library_name: transformers
tags:
- mergekit
- merge
- dare
- medical
- biology
- mlx
datasets:
- health_fact
base_model:
- BioMistral/BioMistral-7B
- mistralai/Mistral-7B-Instruct-v0.1
pipeline_tag: text-generation
---
# abhishek-ch/biomistral-7b-synthetic-ehr

This model was converted to MLX format from [`BioMistral/BioMistral-7B-DARE`](https://huggingface.co/BioMistral/BioMistral-7B-DARE).
Refer to the [original model card](https://huggingface.co/BioMistral/BioMistral-7B-DARE) for more details on the model.
## Use with mlx
```bash
pip install mlx-lm
```
The model was LoRA fine-tuned on [health_facts](https://huggingface.co/datasets/health_fact) and a synthetic EHR dataset inspired by MIMIC-IV, using the prompt format below, for 1000 steps (~1M tokens) with mlx.
```python
def format_prompt(prompt: str, question: str) -> str:
    return """<s>[INST]
## Instructions
{}
## User Question
{}.
[/INST]</s>
""".format(prompt, question)
```
Example system prompt for synthetic EHR diagnosis:
```
You are an expert in providing diagnosis summaries based on clinical notes, inspired by the MIMIC-IV-Note dataset. These notes encompass the chief complaint along with the patient summary and medical admission details.
```
Example system prompt for health-facts checking:
```
You are a Public Health AI Assistant. You can fact-check public health claims.
Each answer is labelled as true, false, unproven, or mixture.
Please provide the reason behind the answer.
```
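
As an illustration, either system prompt can be paired with a user question through `format_prompt`; the claim below is a made-up example, not from the training data:
```python
# Hypothetical usage of format_prompt with the health-facts system prompt.
system_prompt = (
    "You are a Public Health AI Assistant. You can fact-check public health claims.\n"
    "Each answer is labelled as true, false, unproven, or mixture.\n"
    "Please provide the reason behind the answer."
)
question = "Vitamin C megadoses cure the common cold"  # made-up claim for illustration

full_prompt = format_prompt(system_prompt, question)
print(full_prompt)
```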
## Loading the model using `mlx`
```python
from mlx_lm import generate, load

model, tokenizer = load("abhishek-ch/biomistral-7b-synthetic-ehr")

# `prompt` is a system prompt (e.g. one of the examples above) and
# `question` is the user question or clinical note to answer.
response = generate(
    model,
    tokenizer,
    prompt=format_prompt(prompt, question),
    verbose=True,  # set to True to see the prompt and response
    temp=0.0,
    max_tokens=512,
)
```
## Loading the model using `transformers`
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "abhishek-ch/biomistral-7b-synthetic-ehr"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)
model.to("mps")  # Apple Silicon; use "cuda" or "cpu" as appropriate

# `system_prompt` and `question` follow the same format_prompt convention as above.
input_text = format_prompt(system_prompt, question)
input_ids = tokenizer(input_text, return_tensors="pt").to("mps")
outputs = model.generate(
    **input_ids,
    max_new_tokens=512,
)
print(tokenizer.decode(outputs[0]))
```
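
The call above decodes the whole sequence, prompt included. A minimal variant (assuming the same `input_ids` and `outputs` as above) prints only the model's continuation:
```python
# Slice off the prompt tokens and decode only the generated continuation.
prompt_length = input_ids["input_ids"].shape[1]
completion = tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True)
print(completion)
```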