---
language:
- en
license: cc-by-nc-4.0
model-index:
- name: MN-12B-Lyra-v3
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: IFEval (0-Shot)
      type: HuggingFaceH4/ifeval
      args:
        num_few_shot: 0
    metrics:
    - type: inst_level_strict_acc and prompt_level_strict_acc
      value: 44.86
      name: strict accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Sao10K/MN-12B-Lyra-v3
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: BBH (3-Shot)
      type: BBH
      args:
        num_few_shot: 3
    metrics:
    - type: acc_norm
      value: 25.87
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Sao10K/MN-12B-Lyra-v3
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MATH Lvl 5 (4-Shot)
      type: hendrycks/competition_math
      args:
        num_few_shot: 4
    metrics:
    - type: exact_match
      value: 7.18
      name: exact match
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Sao10K/MN-12B-Lyra-v3
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GPQA (0-shot)
      type: Idavidrein/gpqa
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 3.69
      name: acc_norm
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Sao10K/MN-12B-Lyra-v3
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MuSR (0-shot)
      type: TAUR-Lab/MuSR
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 9.04
      name: acc_norm
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Sao10K/MN-12B-Lyra-v3
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU-PRO (5-shot)
      type: TIGER-Lab/MMLU-Pro
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 24.99
      name: accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Sao10K/MN-12B-Lyra-v3
      name: Open LLM Leaderboard
---

![Lyra](https://huggingface.co/Sao10K/MN-12B-Lyra-v3/resolve/main/Lyra.png)

### Ungated. Thanks for the patience!

---

Mistral-NeMo-12B-Lyra-v3 is built on top of [Lyra-v2a2](https://huggingface.co/Sao10K/MN-12B-Lyra-v2a2), which itself was built upon [Lyra-v2a1](https://huggingface.co/Sao10K/MN-12B-Lyra-v2a1).

# Model Versioning

```
Lyra-v1 [Merge of Custom Roleplay & Instruct Trains, on Different Formats]
    |
    | [Additional SFT on 10% of Previous Data, Mixed]
    v
Lyra-v2a1
    |
    | [Low Rank SFT Step + Tokenizer Diddling]
    v
Lyra-v2a2
    |
    | [RL Step Performed on Multiturn Sets, Magpie-style Responses by Lyra-v2a2 for Rejected Data]
    v
Lyra-v3
```

# This uses a custom ChatML-style prompting Format!

\-> **What can go wrong?**

```
[INST]system
This is the system prompt.[/INST]
[INST]user
Instructions placed here.[/INST]
[INST]assistant
The model's response will be here.[/INST]
```

`Why this? I had used the wrong configs by accident. The format was meant for an 8B pruned NeMo train; instead, it went into this one. Oops.`

# Recommended Samplers:

```
Temperature: 0.7 - 1.2
min_p: 0.1 - 0.2 # Crucial for NeMo
```

# Recommended Stopping Strings:

```
<|im_end|>
```

`Blame the messed-up Training Configs, oops?`
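# Example Usage Sketch:

A minimal, unofficial sketch of assembling the format above and sampling in the recommended ranges with Hugging Face `transformers`. The `build_prompt` helper and the open-ended trailing `[INST]assistant` turn are illustrative assumptions, not something shipped with the model, and `min_p` in `generate()` assumes a reasonably recent `transformers` release.

```python
# Sketch only: build_prompt is a hypothetical helper, not part of this repo.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "Sao10K/MN-12B-Lyra-v3"

def build_prompt(system: str, turns: list[tuple[str, str]]) -> str:
    """Render (role, text) turns in the card's [INST]role ... [/INST] format."""
    parts = [f"[INST]system\n{system}[/INST]"]
    for role, text in turns:  # role is "user" or "assistant"
        parts.append(f"[INST]{role}\n{text}[/INST]")
    # Open an assistant turn for the model to complete (assumption).
    parts.append("[INST]assistant\n")
    return "\n".join(parts)

tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL, device_map="auto")

prompt = build_prompt(
    "You are a helpful roleplay assistant.",
    [("user", "Describe the city gates at dusk.")],
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(
    **inputs,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.9,  # within the recommended 0.7 - 1.2
    min_p=0.15,       # within the recommended 0.1 - 0.2; crucial for NeMo
)
text = tokenizer.decode(out[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True)
# Trim at the recommended stopping string if the model emits it.
text = text.split("<|im_end|>")[0]
print(text)
```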
# Training Metrics:

\- Trained on 4xH100 SXM for 6 Hours.
\- Trained for 2 Epochs.
\- Effective Global Batch Size: 128.
\- Dataset Used: A custom, cleaned mix of Stheno-v3.4's Dataset, focused mainly on multiturn.

---

# Extras

Image Source: AI-Generated with FLUX.1 Dev.

have a nice day.

# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)

Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_Sao10K__MN-12B-Lyra-v3)

| Metric             |Value|
|--------------------|----:|
|Avg.                |19.27|
|IFEval (0-Shot)     |44.86|
|BBH (3-Shot)        |25.87|
|MATH Lvl 5 (4-Shot) | 7.18|
|GPQA (0-shot)       | 3.69|
|MuSR (0-shot)       | 9.04|
|MMLU-PRO (5-shot)   |24.99|
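For what it's worth, the Avg. row matches the simple mean of the six task scores; a quick check:

```python
# Sanity check (not from the leaderboard itself): mean of the six scores.
scores = [44.86, 25.87, 7.18, 3.69, 9.04, 24.99]
print(round(sum(scores) / len(scores), 2))  # 19.27
```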