# Mathmate-7B-DELLA

Mathmate-7B-DELLA is a merge of the following models using LazyMergekit, with deepseek-ai/deepseek-math-7b-base as the base model:

- [AI-MO/NuminaMath-7B-TIR](https://huggingface.co/AI-MO/NuminaMath-7B-TIR)
- [deepseek-ai/DeepSeek-Prover-V1.5-RL](https://huggingface.co/deepseek-ai/DeepSeek-Prover-V1.5-RL)

## 🧩 Configuration

```yaml
models:
  - model: AI-MO/NuminaMath-7B-TIR
    parameters:
      density: 0.5
      weight: 0.3
  - model: deepseek-ai/DeepSeek-Prover-V1.5-RL
    parameters:
      density: 0.5
      weight: 0.2
merge_method: della
base_model: deepseek-ai/deepseek-math-7b-base
parameters:
  normalize: true
dtype: bfloat16
```
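In the DELLA method, `density` is roughly the fraction of each model's delta parameters kept after magnitude-based pruning, `weight` scales that model's contribution, and `normalize: true` rescales the weights. The merge can be reproduced locally with mergekit (which LazyMergekit wraps). A minimal sketch, assuming the YAML above is saved as `config.yaml` and a recent mergekit is installed (`pip install mergekit`); the output path is illustrative and exact flags may differ by version:

```python
# Sketch: re-run the DELLA merge via the mergekit-yaml CLI.
# Assumes the configuration above is saved as config.yaml.
import subprocess

subprocess.run(
    [
        "mergekit-yaml",
        "config.yaml",          # merge configuration shown above
        "./Mathmate-7B-DELLA",  # output directory for the merged weights (illustrative)
        "--copy-tokenizer",     # copy the base model's tokenizer into the output
    ],
    check=True,
)
```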

## 💻 Usage

```python
!pip install -qU transformers accelerate
```

```python
from transformers import AutoTokenizer
import transformers
import torch

model = "Haleshot/Mathmate-7B-DELLA"
messages = [{"role": "user", "content": "What is a large language model?"}]

# Build the prompt using the model's chat template.
tokenizer = AutoTokenizer.from_pretrained(model)
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# Text-generation pipeline; device_map="auto" places the weights automatically.
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    torch_dtype=torch.float16,
    device_map="auto",
)

outputs = pipeline(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])
```
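For tighter control over generation, the checkpoint can also be loaded directly rather than through the pipeline. A sketch using the standard transformers API; the prompt is illustrative:

```python
# Alternative: load the merged checkpoint directly.
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "Haleshot/Mathmate-7B-DELLA"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the merge dtype above
    device_map="auto",
)

prompt = "Solve: if 3x + 5 = 20, what is x?"  # illustrative math prompt
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```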

## 📊 Evaluation Results

Evaluation results were obtained using LLMAutoeval:

| Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
|---|---:|---:|---:|---:|---:|
| Mathmate-7B-DELLA | 21.95 | 36.5 | 48.08 | 28.89 | 33.86 |
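The Average column is the unweighted mean of the four suite scores, as a quick check confirms:

```python
# Sanity check: the reported Average is the mean of the four suite scores.
scores = {"AGIEval": 21.95, "GPT4All": 36.5, "TruthfulQA": 48.08, "Bigbench": 28.89}
print(sum(scores.values()) / len(scores))  # ~33.855, reported as 33.86
```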

### AGIEval

| Task | Version | Metric | Value | Stderr |
|---|---:|---|---:|---:|
| agieval_aqua_rat | 0 | acc | 21.26 | 2.57 |
| | | acc_norm | 22.05 | 2.61 |
| agieval_logiqa_en | 0 | acc | 20.89 | 1.59 |
| | | acc_norm | 25.65 | 1.71 |
| agieval_lsat_ar | 0 | acc | 21.74 | 2.73 |
| | | acc_norm | 19.57 | 2.62 |
| agieval_lsat_lr | 0 | acc | 13.92 | 1.53 |
| | | acc_norm | 18.82 | 1.73 |
| agieval_lsat_rc | 0 | acc | 21.19 | 2.50 |
| | | acc_norm | 18.96 | 2.39 |
| agieval_sat_en | 0 | acc | 24.76 | 3.01 |
| | | acc_norm | 21.36 | 2.86 |
| agieval_sat_en_without_passage | 0 | acc | 27.18 | 3.11 |
| | | acc_norm | 23.30 | 2.95 |
| agieval_sat_math | 0 | acc | 25.45 | 2.94 |
| | | acc_norm | 25.91 | 2.96 |

Average: 21.95%

### GPT4All

| Task | Version | Metric | Value | Stderr |
|---|---:|---|---:|---:|
| arc_challenge | 0 | acc | 22.61 | 1.22 |
| | | acc_norm | 25.68 | 1.28 |
| arc_easy | 0 | acc | 25.25 | 0.89 |
| | | acc_norm | 25.08 | 0.89 |
| boolq | 1 | acc | 52.02 | 0.87 |
| hellaswag | 0 | acc | 25.77 | 0.44 |
| | | acc_norm | 26.09 | 0.44 |
| openbookqa | 0 | acc | 18.40 | 1.73 |
| | | acc_norm | 28.80 | 2.03 |
| piqa | 0 | acc | 51.31 | 1.17 |
| | | acc_norm | 50.11 | 1.17 |
| winogrande | 0 | acc | 47.75 | 1.40 |

Average: 36.5%

### TruthfulQA

| Task | Version | Metric | Value | Stderr |
|---|---:|---|---:|---:|
| truthfulqa_mc | 1 | mc1 | 22.77 | 1.47 |
| | | mc2 | 48.08 | 1.70 |

Average: 48.08%

### Bigbench

| Task | Version | Metric | Value | Stderr |
|---|---:|---|---:|---:|
| bigbench_causal_judgement | 0 | multiple_choice_grade | 49.47 | 3.64 |
| bigbench_date_understanding | 0 | multiple_choice_grade | 13.55 | 1.78 |
| bigbench_disambiguation_qa | 0 | multiple_choice_grade | 30.23 | 2.86 |
| bigbench_geometric_shapes | 0 | multiple_choice_grade | 10.03 | 1.59 |
| | | exact_str_match | 0.00 | 0.00 |
| bigbench_logical_deduction_five_objects | 0 | multiple_choice_grade | 19.40 | 1.77 |
| bigbench_logical_deduction_seven_objects | 0 | multiple_choice_grade | 14.00 | 1.31 |
| bigbench_logical_deduction_three_objects | 0 | multiple_choice_grade | 36.67 | 2.79 |
| bigbench_movie_recommendation | 0 | multiple_choice_grade | 23.60 | 1.90 |
| bigbench_navigate | 0 | multiple_choice_grade | 47.10 | 1.58 |
| bigbench_reasoning_about_colored_objects | 0 | multiple_choice_grade | 13.05 | 0.75 |
| bigbench_ruin_names | 0 | multiple_choice_grade | 53.79 | 2.36 |
| bigbench_salient_translation_error_detection | 0 | multiple_choice_grade | 15.63 | 1.15 |
| bigbench_snarks | 0 | multiple_choice_grade | 46.96 | 3.72 |
| bigbench_sports_understanding | 0 | multiple_choice_grade | 49.70 | 1.59 |
| bigbench_temporal_sequences | 0 | multiple_choice_grade | 25.80 | 1.38 |
| bigbench_tracking_shuffled_objects_five_objects | 0 | multiple_choice_grade | 19.76 | 1.13 |
| bigbench_tracking_shuffled_objects_seven_objects | 0 | multiple_choice_grade | 14.69 | 0.85 |
| bigbench_tracking_shuffled_objects_three_objects | 0 | multiple_choice_grade | 36.67 | 2.79 |

Average: 28.89%

Average score: 33.86%

Elapsed time: 03:52:09
