---
language:
- en
license: llama2
tags:
- Xwin
- Euryale 1.3
- frankenmerge
- 90b
pipeline_tag: conversational
model-index:
- name: BigWeave-v6-90b
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: AI2 Reasoning Challenge (25-Shot)
      type: ai2_arc
      config: ARC-Challenge
      split: test
      args:
        num_few_shot: 25
    metrics:
    - type: acc_norm
      value: 65.36
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=llmixer/BigWeave-v6-90b
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: HellaSwag (10-Shot)
      type: hellaswag
      split: validation
      args:
        num_few_shot: 10
    metrics:
    - type: acc_norm
      value: 87.21
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=llmixer/BigWeave-v6-90b
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU (5-Shot)
      type: cais/mmlu
      config: all
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 68.04
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=llmixer/BigWeave-v6-90b
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: TruthfulQA (0-shot)
      type: truthful_qa
      config: multiple_choice
      split: validation
      args:
        num_few_shot: 0
    metrics:
    - type: mc2
      value: 57.96
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=llmixer/BigWeave-v6-90b
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: Winogrande (5-shot)
      type: winogrande
      config: winogrande_xl
      split: validation
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 81.69
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=llmixer/BigWeave-v6-90b
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GSM8k (5-shot)
      type: gsm8k
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 44.58
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=llmixer/BigWeave-v6-90b
      name: Open LLM Leaderboard
---
# BigWeave v6 90B

A Goliath-120b style frankenmerge of Xwin-LM-70b-v0.1 and Euryale-1.3-70b. The goal is to find other merge combinations that work well.

The version number is only there for me to keep track of the merges. Only results that seem to work reasonably well are kept and published.

# Prompting Format
Vicuna and Alpaca.

# Merge process
The models used in the merge are [Xwin-LM-70b-v0.1](https://huggingface.co/Xwin-LM/Xwin-LM-70B-V0.1) and [Euryale-1.3-70b](https://huggingface.co/Sao10K/Euryale-1.3-L2-70B).

The layer mix:
```yaml
- range 0, 12
  Xwin
- range 9, 14
  Euryale
- range 12, 62
  Xwin
- range 54, 71
  Euryale
- range 62, 80
  Xwin
```

# Acknowledgements
[@Xwin-LM](https://huggingface.co/Xwin-LM) For creating Xwin

[@Sao10K](https://huggingface.co/Sao10K) For creating Euryale

[@alpindale](https://huggingface.co/alpindale) For creating the original Goliath

[@chargoddard](https://huggingface.co/chargoddard) For developing [mergekit](https://github.com/cg123/mergekit).

# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_llmixer__BigWeave-v6-90b)

|             Metric              |Value|
|---------------------------------|----:|
|Avg.                             |67.47|
|AI2 Reasoning Challenge (25-Shot)|65.36|
|HellaSwag (10-Shot)              |87.21|
|MMLU (5-Shot)                    |68.04|
|TruthfulQA (0-shot)              |57.96|
|Winogrande (5-shot)              |81.69|
|GSM8k (5-shot)                   |44.58|
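
As a rough sanity check on the layer mix listed under "Merge process", the sketch below adds up the slices and scales the nominal 70B size by the resulting layer count. It rests on two assumptions that the card does not state: that each range is half-open (`[start, end)`, as mergekit's `layer_range` is), and that parameter count scales linearly with the number of decoder layers, which ignores embeddings and the output head. The function and constant names are illustrative only.

```python
# Layer slices copied from the merge recipe: (start, end, source model).
# Assumption: ranges are half-open [start, end), so "0, 12" means layers 0..11.
LAYER_MIX = [
    (0, 12, "Xwin"),
    (9, 14, "Euryale"),
    (12, 62, "Xwin"),
    (54, 71, "Euryale"),
    (62, 80, "Xwin"),
]

BASE_LAYERS = 80      # decoder layers in a Llama 2 70B model
BASE_PARAMS_B = 70.0  # nominal base size in billions (rough figure)


def total_layers(mix):
    """Total number of decoder layers in the merged stack."""
    return sum(end - start for start, end, _ in mix)


def layers_per_source(mix):
    """Number of layers each source model contributes."""
    counts = {}
    for start, end, source in mix:
        counts[source] = counts.get(source, 0) + (end - start)
    return counts


def estimated_params_b(mix):
    """Crude size estimate: scale the base size by the layer ratio."""
    return BASE_PARAMS_B * total_layers(mix) / BASE_LAYERS


if __name__ == "__main__":
    print("total layers:", total_layers(LAYER_MIX))
    print("per source:", layers_per_source(LAYER_MIX))
    print("estimated size (B):", estimated_params_b(LAYER_MIX))
```

Under these assumptions the merged stack has 102 layers: all 80 Xwin layers, with 22 Euryale layers interleaved in two places. The linear estimate comes out near 89B, consistent with the "90b" in the model name.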