---
language:
- en
license: llama3
model-index:
- name: L3-8B-Lunaris-v1
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: IFEval (0-Shot)
      type: HuggingFaceH4/ifeval
      args:
        num_few_shot: 0
    metrics:
    - type: inst_level_strict_acc and prompt_level_strict_acc
      value: 68.95
      name: strict accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Sao10K/L3-8B-Lunaris-v1
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: BBH (3-Shot)
      type: BBH
      args:
        num_few_shot: 3
    metrics:
    - type: acc_norm
      value: 32.11
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Sao10K/L3-8B-Lunaris-v1
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MATH Lvl 5 (4-Shot)
      type: hendrycks/competition_math
      args:
        num_few_shot: 4
    metrics:
    - type: exact_match
      value: 8.46
      name: exact match
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Sao10K/L3-8B-Lunaris-v1
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GPQA (0-shot)
      type: Idavidrein/gpqa
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 6.82
      name: acc_norm
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Sao10K/L3-8B-Lunaris-v1
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MuSR (0-shot)
      type: TAUR-Lab/MuSR
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 5.55
      name: acc_norm
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Sao10K/L3-8B-Lunaris-v1
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU-PRO (5-shot)
      type: TIGER-Lab/MMLU-Pro
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 30.97
      name: accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Sao10K/L3-8B-Lunaris-v1
      name: Open LLM Leaderboard
---

A generalist / roleplaying model merge based on Llama 3. Models were selected based on my personal experience using them.

I personally think this is an improvement over Stheno v3.2: the other models balance out its creativity while improving its logic.

Settings:
```
Instruct // Context Template: Llama-3-Instruct
Temperature: 1.4
min_p: 0.1
```
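For convenience, here is a minimal sketch of running the model with these sampler settings via `transformers` (the prompt is illustrative; `min_p` sampling requires a reasonably recent transformers release, and the chat template is assumed to be the standard Llama-3-Instruct one):

```python
# Sketch: generate with the recommended sampler settings above.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Sao10K/L3-8B-Lunaris-v1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Llama-3-Instruct prompt format via the tokenizer's chat template.
messages = [{"role": "user", "content": "Write a short scene set in a moonlit forest."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(
    input_ids,
    max_new_tokens=512,
    do_sample=True,
    temperature=1.4,  # recommended setting
    min_p=0.1,        # recommended setting; needs a recent transformers (>= 4.39)
)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```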
---

Merging still feels like black-box magic, though. In my personal experience, merging multiple models trained on different datasets works better than training a single model on all of that data combined.

*The values chosen come from long-running personal experimentation going back to the Llama-2 merging era, tweaked to fit this recipe.*

Mergekit Config (a sketch for reproducing this merge follows at the end of this card):
```
models:
  - model: meta-llama/Meta-Llama-3-8B-Instruct
  - model: crestf411/L3-8B-sunfall-v0.1 # Another RP Model trained on... stuff
    parameters:
      density: 0.4
      weight: 0.25
  - model: Hastagaras/Jamet-8B-L3-MK1 # Another RP / Storytelling Model
    parameters:
      density: 0.5
      weight: 0.3
  - model: maldv/badger-iota-llama-3-8b # Megamerge - Helps with General Knowledge
    parameters:
      density: 0.6
      weight: 0.35
  - model: Sao10K/Stheno-3.2-Beta # This is Stheno v3.2's Initial Name
    parameters:
      density: 0.7
      weight: 0.4
merge_method: ties
base_model: meta-llama/Meta-Llama-3-8B-Instruct
parameters:
  int8_mask: true
  rescale: true
  normalize: false
dtype: bfloat16
```

# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_Sao10K__L3-8B-Lunaris-v1).

|             Metric |Value|
|--------------------|----:|
|Avg.                |25.48|
|IFEval (0-Shot)     |68.95|
|BBH (3-Shot)        |32.11|
|MATH Lvl 5 (4-Shot) | 8.46|
|GPQA (0-shot)       | 6.82|
|MuSR (0-shot)       | 5.55|
|MMLU-PRO (5-shot)   |30.97|
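To reproduce the merge, the config above can be run through the `mergekit-yaml` CLI (e.g. `mergekit-yaml lunaris-v1.yml ./L3-8B-Lunaris-v1 --cuda`) or through mergekit's Python API. Below is a minimal sketch of the latter, mirroring mergekit's documented usage; the config filename and output path are illustrative, and the exact options may vary across mergekit versions:

```python
# Sketch: run the TIES merge config above with mergekit's Python API.
import yaml
from mergekit.config import MergeConfiguration
from mergekit.merge import MergeOptions, run_merge

# "lunaris-v1.yml" is a hypothetical filename holding the config above.
with open("lunaris-v1.yml", "r", encoding="utf-8") as f:
    merge_config = MergeConfiguration.model_validate(yaml.safe_load(f))

run_merge(
    merge_config,
    "./L3-8B-Lunaris-v1",    # output directory (illustrative)
    options=MergeOptions(
        cuda=True,           # set False to merge on CPU
        copy_tokenizer=True, # copy the base model's tokenizer into the output
        lazy_unpickle=True,  # lower peak memory while loading shards
    ),
)
```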