metadata
license: apache-2.0
tags:
  - merge
  - mergekit
  - epfl-llm/meditron-70b
  - allenai/tulu-2-dpo-70b

Medmerge-tulu-70b

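Medmerge-tulu-70b is a merge of the following models, built with mergekit on the NousResearch/Llama-2-70b-hf base:

- wanglab/ClinicalCamel-70B
- epfl-llm/meditron-70b
- allenai/tulu-2-dpo-70b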

Open LLM Leaderboard


| Model Name | ARC | HellaSwag | MMLU | TruthfulQA | Winogrande | GSM8K |
| --- | --- | --- | --- | --- | --- | --- |
| tulu-2-dpo-70b | 72.1 | 88.99 | 69.84 | 65.78 | 83.27 | 62.62 |
| Medmerge-tulu-70b | 67.81 | 87.46 | 70.1 | 47.89 | 83.43 | 56.56 |
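
To make the comparison easier to read, the sketch below computes each model's mean over the six benchmarks and the per-benchmark difference between the merge and tulu-2-dpo-70b. The numbers are copied from the table above; the variable names are illustrative, and the unweighted mean is an assumption about how a single summary score would be formed.

```python
# Scores copied from the Open LLM Leaderboard table above.
benchmarks = ["ARC", "HellaSwag", "MMLU", "TruthfulQA", "Winogrande", "GSM8K"]
scores = {
    "tulu-2-dpo-70b": [72.1, 88.99, 69.84, 65.78, 83.27, 62.62],
    "Medmerge-tulu-70b": [67.81, 87.46, 70.1, 47.89, 83.43, 56.56],
}

# Unweighted mean over the six benchmarks (assumed summary score).
averages = {name: sum(vals) / len(vals) for name, vals in scores.items()}

# Per-benchmark change of the merge relative to tulu-2-dpo-70b.
deltas = {
    bench: merged - reference
    for bench, merged, reference in zip(
        benchmarks, scores["Medmerge-tulu-70b"], scores["tulu-2-dpo-70b"]
    )
}

for name, avg in averages.items():
    print(f"{name}: {avg:.2f}")
for bench, delta in deltas.items():
    print(f"{bench}: {delta:+.2f}")
```

The merge slightly improves on MMLU and Winogrande and loses the most ground on TruthfulQA.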

Performance

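Medmerge-tulu-70b is competitive with ClinicalCamel-70B on the medical subsets of MMLU, scoring higher on four of the six subsets reported below.

Table: Performance of Medmerge-tulu-70b compared with five-shot results for ClinicalCamel-70B, GPT3.5, GPT4, and Med-PaLM 2 on various medical datasets. Medmerge-tulu-70b scores are reported only for the MMLU subsets.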

| Dataset | Medmerge-tulu-70b | ClinicalCamel-70B | GPT3.5 | GPT4 | Med-PaLM 2 |
| --- | --- | --- | --- | --- | --- |
| MMLU Anatomy | 66.6 | 65.2 | 60.7 | 80.0 | 77.8 |
| MMLU Clinical Knowledge | 72.0 | 72.8 | 68.7 | 86.4 | 88.3 |
| MMLU College Biology | 84.7 | 81.2 | 72.9 | 93.8 | 94.4 |
| MMLU College Medicine | 64.2 | 68.2 | 63.6 | 76.3 | 80.9 |
| MMLU Medical Genetics | 76.0 | 69.0 | 68.0 | 92.0 | 90.0 |
| MMLU Professional Medicine | 75.7 | 75.0 | 69.8 | 93.8 | 95.2 |
| MedMCQA | | 54.2 | 51.0 | 72.4 | 71.3 |
| MedQA (USMLE) | | 60.7 | 53.6 | 81.4 | 79.7 |
| PubMedQA | | 77.9 | 60.2 | 74.4 | 79.2 |
| USMLE Sample Exam | | 64.3 | 58.5 | 86.6 | - |
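
As a quick check on the comparison with ClinicalCamel-70B, the following sketch tallies the six MMLU medical subsets for which both models have scores in the table above. The values are copied from the table; the variable names are illustrative only.

```python
# (Medmerge-tulu-70b, ClinicalCamel-70B) scores copied from the table above.
mmlu_medical = {
    "Anatomy": (66.6, 65.2),
    "Clinical Knowledge": (72.0, 72.8),
    "College Biology": (84.7, 81.2),
    "College Medicine": (64.2, 68.2),
    "Medical Genetics": (76.0, 69.0),
    "Professional Medicine": (75.7, 75.0),
}

# Difference per subset, rounded to the table's one-decimal precision.
diffs = {name: round(m - c, 1) for name, (m, c) in mmlu_medical.items()}

# Subsets where the merge scores higher than ClinicalCamel-70B.
wins = [name for name, d in diffs.items() if d > 0]

mean_medmerge = sum(m for m, _ in mmlu_medical.values()) / len(mmlu_medical)
mean_camel = sum(c for _, c in mmlu_medical.values()) / len(mmlu_medical)

print(f"Subsets won by Medmerge-tulu-70b: {len(wins)} of {len(mmlu_medical)}")
print(f"Mean: Medmerge-tulu-70b {mean_medmerge:.1f}, ClinicalCamel-70B {mean_camel:.1f}")
```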

🧩 Configuration

models:
  - model: NousResearch/Llama-2-70b-hf
    # no parameters necessary for base model
  - model: wanglab/ClinicalCamel-70B
    parameters:
      weight: 0.08
      density: 0.45
  - model: epfl-llm/meditron-70b
    parameters:
      weight: 0.08
      density: 0.45
  - model: allenai/tulu-2-dpo-70b
    parameters:
      weight: 0.08
      density: 0.45
merge_method: dare_ties
base_model: NousResearch/Llama-2-70b-hf
parameters:
  int8_mask: true
dtype: bfloat16
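
For readers unfamiliar with dare_ties, here is a toy sketch of the idea on plain Python lists. It is a simplified illustration, not mergekit's implementation: each model's difference from the base (its "task vector") is randomly sparsified so that roughly `density` of the entries survive and are rescaled by `1 / density` (DARE), scaled by `weight`, and then only the contributions that agree with the dominant sign at each position are summed (TIES sign election). The function names, the toy values, and the absence of weight normalization are assumptions made for the example.

```python
import random


def dare_sparsify(delta, density, rng):
    """Keep each entry with probability `density`; rescale survivors by 1/density."""
    return [d / density if rng.random() < density else 0.0 for d in delta]


def dare_ties_merge(base, finetuned, weights, density, seed=0):
    """Toy DARE-TIES merge of flat parameter lists (illustrative only)."""
    rng = random.Random(seed)
    # Task vectors: what each fine-tuned model changed relative to the base.
    deltas = [[f - b for f, b in zip(model, base)] for model in finetuned]
    # DARE drop-and-rescale, then apply the per-model weight.
    sparse = [
        [w * d for d in dare_sparsify(delta, density, rng)]
        for delta, w in zip(deltas, weights)
    ]
    merged = []
    for i, b in enumerate(base):
        column = [s[i] for s in sparse]
        # TIES sign election: the sign of the summed contributions wins.
        sign = 1.0 if sum(column) >= 0 else -1.0
        # Only contributions that agree with the elected sign are added.
        merged.append(b + sum(v for v in column if v * sign > 0))
    return merged


# Toy example using the weight and density values from the config above.
base = [0.0, 1.0, -1.0, 0.5]
models = [
    [0.2, 1.1, -1.3, 0.5],
    [0.1, 0.8, -1.2, 0.9],
    [-0.3, 1.2, -0.9, 0.6],
]
print(dare_ties_merge(base, models, weights=[0.08, 0.08, 0.08], density=0.45))
```

With `density=1.0` nothing is dropped, so the result reduces to a sign-filtered weighted sum of task vectors, which makes the sign-election step easy to inspect in isolation.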

💻 Usage

!pip install -qU transformers accelerate

from transformers import AutoTokenizer
import transformers
import torch

model = "Technoculture/Medmerge-tulu-70b"
messages = [{"role": "user", "content": "I am feeling sleepy these days"}]

# Format the conversation with the tokenizer's chat template.
tokenizer = AutoTokenizer.from_pretrained(model)
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# Load the model in half precision and spread it across the available devices.
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Sample up to 256 new tokens and print the result.
outputs = pipeline(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])
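
By default the text-generation pipeline returns the prompt followed by the completion in `generated_text`. One option is to pass `return_full_text=False` when calling the pipeline; another is to strip the prompt afterwards. The helper below is a minimal sketch of the second approach. `strip_prompt` is a hypothetical name, not part of transformers, and the example strings are made up for illustration.

```python
def strip_prompt(generated_text: str, prompt: str) -> str:
    """Return only the newly generated part of a pipeline output.

    Hypothetical helper: if the output starts with the prompt, drop that
    prefix and any leading whitespace; otherwise return the text unchanged.
    """
    if generated_text.startswith(prompt):
        return generated_text[len(prompt):].lstrip()
    return generated_text


# Illustrative strings only; a real prompt comes from apply_chat_template.
example_prompt = "User: I am feeling sleepy these days\nAssistant:"
example_output = example_prompt + " How many hours are you sleeping?"
print(strip_prompt(example_output, example_prompt))
```

In the usage code above this would be called as `strip_prompt(outputs[0]["generated_text"], prompt)`.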