---
tags:
  - merge
  - mergekit
  - cstr/Spaetzle-v80-7b
  - cstr/Spaetzle-v79-7b
  - cstr/Spaetzle-v81-7b
  - cstr/Spaetzle-v71-7b
base_model:
  - cstr/Spaetzle-v80-7b
  - cstr/Spaetzle-v79-7b
  - cstr/Spaetzle-v81-7b
  - cstr/Spaetzle-v71-7b
license: cc-by-nc-4.0
language:
  - de
  - en
---

Spaetzle-v85-7b

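Spaetzle-v85-7b is a merge of the following models using LazyMergekit, with cstr/Spaetzle-v84-7b as the base model:

- cstr/Spaetzle-v80-7b
- cstr/Spaetzle-v79-7b
- cstr/Spaetzle-v81-7b
- cstr/Spaetzle-v71-7b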

Evaluation

EQ-Bench (v2_de): 65.32, Parseable: 171.0

| Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
|---|---|---|---|---|---|
| Spaetzle-v85-7b | 44.35 | 75.99 | 67.23 | 46.55 | 58.53 |

From Intel/low_bit_open_llm_leaderboard:

| Metric | Value |
|---|---|
| ARC-c | 62.63 |
| ARC-e | 85.56 |
| Boolq | 87.77 |
| HellaSwag | 66.66 |
| Lambada | 70.35 |
| MMLU | 61.61 |
| Openbookqa | 37.2 |
| Piqa | 82.48 |
| Truthfulqa | 50.43 |
| Winogrande | 78.3 |
| Average | 68.3 |

AGIEval

| Task | Version | Metric | Value | Stderr |
|---|---|---|---|---|
| agieval_aqua_rat | 0 | acc | 23.23 | ± 2.65 |
| | | acc_norm | 22.44 | ± 2.62 |
| agieval_logiqa_en | 0 | acc | 37.33 | ± 1.90 |
| | | acc_norm | 37.94 | ± 1.90 |
| agieval_lsat_ar | 0 | acc | 25.22 | ± 2.87 |
| | | acc_norm | 23.04 | ± 2.78 |
| agieval_lsat_lr | 0 | acc | 49.41 | ± 2.22 |
| | | acc_norm | 50.78 | ± 2.22 |
| agieval_lsat_rc | 0 | acc | 64.68 | ± 2.92 |
| | | acc_norm | 63.20 | ± 2.95 |
| agieval_sat_en | 0 | acc | 77.67 | ± 2.91 |
| | | acc_norm | 78.16 | ± 2.89 |
| agieval_sat_en_without_passage | 0 | acc | 46.12 | ± 3.48 |
| | | acc_norm | 45.15 | ± 3.48 |
| agieval_sat_math | 0 | acc | 35.45 | ± 3.23 |
| | | acc_norm | 34.09 | ± 3.20 |

Average: 44.35%

GPT4All

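| Task | Version | Metric | Value | Stderr |
|---|---|---|---|---|
| arc_challenge | 0 | acc | 63.82 | ± 1.40 |
| | | acc_norm | 64.76 | ± 1.40 |
| arc_easy | 0 | acc | 85.90 | ± 0.71 |
| | | acc_norm | 82.32 | ± 0.78 |
| boolq | 1 | acc | 87.61 | ± 0.58 |
| hellaswag | 0 | acc | 67.39 | ± 0.47 |
| | | acc_norm | 85.36 | ± 0.35 |
| openbookqa | 0 | acc | 38.80 | ± 2.18 |
| | | acc_norm | 48.80 | ± 2.24 |
| piqa | 0 | acc | 83.03 | ± 0.88 |
| | | acc_norm | 84.17 | ± 0.85 |
| winogrande | 0 | acc | 78.93 | ± 1.15 |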

Average: 75.99%
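
The GPT4All average can be reproduced from the table above. The following is a minimal sketch, assuming one score per task with `acc_norm` taken where it is reported and `acc` otherwise; that rule is inferred from the numbers and is not stated in this card, and the names `gpt4all` and `task_score` are illustrative only.

```python
# Per-task scores copied from the GPT4All table above.
gpt4all = {
    "arc_challenge": {"acc": 63.82, "acc_norm": 64.76},
    "arc_easy": {"acc": 85.90, "acc_norm": 82.32},
    "boolq": {"acc": 87.61},
    "hellaswag": {"acc": 67.39, "acc_norm": 85.36},
    "openbookqa": {"acc": 38.80, "acc_norm": 48.80},
    "piqa": {"acc": 83.03, "acc_norm": 84.17},
    "winogrande": {"acc": 78.93},
}


def task_score(metrics):
    """Assumed rule: prefer acc_norm, fall back to acc."""
    return metrics.get("acc_norm", metrics["acc"])


gpt4all_average = sum(task_score(m) for m in gpt4all.values()) / len(gpt4all)
print(f"GPT4All average: {gpt4all_average:.2f}")
```

Under the same assumption, the AGIEval average matches the mean of its eight `acc_norm` values.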

TruthfulQA

| Task | Version | Metric | Value | Stderr |
|---|---|---|---|---|
| truthfulqa_mc | 1 | mc1 | 50.80 | ± 1.75 |
| | | mc2 | 67.23 | ± 1.49 |

Average: 67.23%

Bigbench

| Task | Version | Metric | Value | Stderr |
|---|---|---|---|---|
| bigbench_causal_judgement | 0 | multiple_choice_grade | 54.74 | ± 3.62 |
| bigbench_date_understanding | 0 | multiple_choice_grade | 68.29 | ± 2.43 |
| bigbench_disambiguation_qa | 0 | multiple_choice_grade | 39.53 | ± 3.05 |
| bigbench_geometric_shapes | 0 | multiple_choice_grade | 22.28 | ± 2.20 |
| | | exact_str_match | 12.26 | ± 1.73 |
| bigbench_logical_deduction_five_objects | 0 | multiple_choice_grade | 32.80 | ± 2.10 |
| bigbench_logical_deduction_seven_objects | 0 | multiple_choice_grade | 23.00 | ± 1.59 |
| bigbench_logical_deduction_three_objects | 0 | multiple_choice_grade | 59.00 | ± 2.84 |
| bigbench_movie_recommendation | 0 | multiple_choice_grade | 45.60 | ± 2.23 |
| bigbench_navigate | 0 | multiple_choice_grade | 51.10 | ± 1.58 |
| bigbench_reasoning_about_colored_objects | 0 | multiple_choice_grade | 70.10 | ± 1.02 |
| bigbench_ruin_names | 0 | multiple_choice_grade | 52.68 | ± 2.36 |
| bigbench_salient_translation_error_detection | 0 | multiple_choice_grade | 33.57 | ± 1.50 |
| bigbench_snarks | 0 | multiple_choice_grade | 71.27 | ± 3.37 |
| bigbench_sports_understanding | 0 | multiple_choice_grade | 74.54 | ± 1.39 |
| bigbench_temporal_sequences | 0 | multiple_choice_grade | 40.00 | ± 1.55 |
| bigbench_tracking_shuffled_objects_five_objects | 0 | multiple_choice_grade | 21.52 | ± 1.16 |
| bigbench_tracking_shuffled_objects_seven_objects | 0 | multiple_choice_grade | 18.86 | ± 0.94 |
| bigbench_tracking_shuffled_objects_three_objects | 0 | multiple_choice_grade | 59.00 | ± 2.84 |

Average: 46.55%

Average score: 58.53%
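
As a quick check, the overall score is the unweighted mean of the four suite averages reported above. A minimal sketch (the variable names are illustrative):

```python
# Suite averages as reported in this card.
suite_averages = {
    "AGIEval": 44.35,
    "GPT4All": 75.99,
    "TruthfulQA": 67.23,
    "Bigbench": 46.55,
}

# Unweighted mean across the four suites.
overall_average = sum(suite_averages.values()) / len(suite_averages)
print(f"Average score: {overall_average:.2f}%")
```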

🧩 Configuration

```yaml
models:
  - model: cstr/Spaetzle-v84-7b
    # no parameters necessary for base model
  - model: cstr/Spaetzle-v80-7b
    parameters:
      density: 0.65
      weight: 0.2
  - model: cstr/Spaetzle-v79-7b
    parameters:
      density: 0.65
      weight: 0.2
  - model: cstr/Spaetzle-v81-7b
    parameters:
      density: 0.65
      weight: 0.2
  - model: cstr/Spaetzle-v71-7b
    parameters:
      density: 0.65
      weight: 0.2
merge_method: dare_ties
base_model: cstr/Spaetzle-v84-7b
parameters:
  int8_mask: true
dtype: bfloat16
random_seed: 0
tokenizer_source: base
```
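
To illustrate what `density` and `weight` mean under `dare_ties`, here is a toy sketch of the general idea on flat lists of floats. It is not mergekit's implementation: the function name `dare_ties_merge` is hypothetical, it ignores `int8_mask`, `dtype`, and any weight normalization, and it draws random masks differently than mergekit does. Each model's difference from the base (its delta) is kept with probability `density` and rescaled by `1 / density` (the DARE step), then scaled by `weight`; for each parameter, only the deltas whose sign agrees with the sign of the summed deltas are added back to the base (the TIES sign election).

```python
import random


def sign(x):
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def dare_ties_merge(base, finetuned, weights, density, seed=0):
    """Toy DARE-TIES merge over flat parameter lists (illustrative only)."""
    rng = random.Random(seed)
    weighted_deltas = []
    for params, weight in zip(finetuned, weights):
        row = []
        for b, p in zip(base, params):
            # DARE: keep each delta with probability `density`,
            # rescaling survivors by 1 / density.
            keep = rng.random() < density
            row.append(weight * (p - b) / density if keep else 0.0)
        weighted_deltas.append(row)

    merged = []
    for i, b in enumerate(base):
        column = [row[i] for row in weighted_deltas]
        # TIES: elect the sign of the summed deltas, then sum only
        # the deltas that agree with it.
        elected = sign(sum(column))
        merged.append(b + sum(d for d in column if sign(d) == elected))
    return merged


# Four toy "models" merged with the density and weight from the config above.
base = [0.0, 1.0, -1.0]
models = [[0.1, 1.2, -1.0], [0.2, 0.9, -0.8], [0.1, 1.1, -1.1], [-0.3, 1.0, -0.9]]
print(dare_ties_merge(base, models, weights=[0.2] * 4, density=0.65, seed=0))
```

With `density=1.0` nothing is dropped, so the sketch reduces to a sign-filtered weighted sum of deltas.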

💻 Usage

```python
!pip install -qU transformers accelerate

from transformers import AutoTokenizer
import transformers
import torch

model = "cstr/Spaetzle-v85-7b"
messages = [{"role": "user", "content": "What is a large language model?"}]

tokenizer = AutoTokenizer.from_pretrained(model)
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    torch_dtype=torch.float16,
    device_map="auto",
)

outputs = pipeline(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])
```