---
license: llama2
tags:
- mergekit
- merge
model-index:
- name: WinterGoddess-1.4x-70b-32k
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: AI2 Reasoning Challenge (25-Shot)
      type: ai2_arc
      config: ARC-Challenge
      split: test
      args:
        num_few_shot: 25
    metrics:
    - type: acc_norm
      value: 71.16
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=ChuckMcSneed/WinterGoddess-1.4x-70b-32k
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: HellaSwag (10-Shot)
      type: hellaswag
      split: validation
      args:
        num_few_shot: 10
    metrics:
    - type: acc_norm
      value: 89.12
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=ChuckMcSneed/WinterGoddess-1.4x-70b-32k
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU (5-Shot)
      type: cais/mmlu
      config: all
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 66.42
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=ChuckMcSneed/WinterGoddess-1.4x-70b-32k
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: TruthfulQA (0-shot)
      type: truthful_qa
      config: multiple_choice
      split: validation
      args:
        num_few_shot: 0
    metrics:
    - type: mc2
      value: 63.87
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=ChuckMcSneed/WinterGoddess-1.4x-70b-32k
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: Winogrande (5-shot)
      type: winogrande
      config: winogrande_xl
      split: validation
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 82.56
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=ChuckMcSneed/WinterGoddess-1.4x-70b-32k
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GSM8k (5-shot)
      type: gsm8k
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 43.29
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=ChuckMcSneed/WinterGoddess-1.4x-70b-32k
      name: Open LLM Leaderboard
---
This is a 32k-context version of Sao10K/WinterGoddess-1.4x-70B-L2, extended using the method discussed [here](https://huggingface.co/grimulkan/aurelian-v0.5-70b-rope8-32K-fp16/discussions/2).

# Quants
Thanks to [@Nexesenex](https://huggingface.co/Nexesenex) for the GGUF quants!
- [GGUF](https://huggingface.co/Nexesenex/ChuckMcSneed_WinterGoddess-1.4x-70b-32k-iMat.GGUF)

# Benchmarks
### NeoEvalPlusN_benchmark
[My meme benchmark.](https://huggingface.co/datasets/ChuckMcSneed/NeoEvalPlusN_benchmark)

| Test name | WinterGoddess | WinterGoddess-32k |
| --------- | ------------- | ----------------- |
| B         | 2             | 2.5               |
| C         | 1.5           | 2                 |
| D         | 3             | 0                 |
| S         | 2.75          | 1.5               |
| P         | 5.5           | 2.25              |
| Total     | 14.75         | 8.25              |

### [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
[Leaderboard on Hugging Face](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)

|Model                                  |Average|ARC  |HellaSwag|MMLU |TruthfulQA|Winogrande|GSM8K|
|---------------------------------------|-------|-----|---------|-----|----------|----------|-----|
|Sao10K/WinterGoddess-1.4x-70B-L2       |73.23  |72.78|90.11    |71.12|65.76     |85        |54.59|
|ChuckMcSneed/WinterGoddess-1.4x-70b-32k|69.4   |71.16|89.12    |66.42|63.87     |82.56     |43.29|
|Difference                             |3.83   |1.62 |0.99     |4.7  |1.89      |2.44      |11.3 |

Here the losses seem far less brutal than on my bench. The leaderboard average drops by only 3.83 points, while the NeoEvalPlusN total drops by 6.5 of 14.75 points. Still, it seems that extending with LongLoRA hits MMLU (4.7 points) and GSM8K (11.3 points) the hardest.

Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_ChuckMcSneed__WinterGoddess-1.4x-70b-32k).

| Metric                          |Value|
|---------------------------------|----:|
|Avg.                             |69.40|
|AI2 Reasoning Challenge (25-Shot)|71.16|
|HellaSwag (10-Shot)              |89.12|
|MMLU (5-Shot)                    |66.42|
|TruthfulQA (0-shot)              |63.87|
|Winogrande (5-shot)              |82.56|
|GSM8k (5-shot)                   |43.29|
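The comparison between the two benchmarks above can be checked with a few lines of arithmetic. The following is a small Python sketch, not part of any tooling used to produce this card. The scores are transcribed by hand from the tables, and the variable and function names are illustrative only. It assumes the leaderboard "Average" is the unweighted mean of the six scores, which does reproduce the values shown. The sketch recomputes the averages and the Difference row, sums the NeoEvalPlusN totals, and expresses each drop as a percentage of the original model's score.

```python
# Scores transcribed by hand from the Open LLM Leaderboard table.
BENCHMARKS = ["ARC", "HellaSwag", "MMLU", "TruthfulQA", "Winogrande", "GSM8K"]
ORIGINAL = [72.78, 90.11, 71.12, 65.76, 85.00, 54.59]  # Sao10K/WinterGoddess-1.4x-70B-L2
EXTENDED = [71.16, 89.12, 66.42, 63.87, 82.56, 43.29]  # ChuckMcSneed/WinterGoddess-1.4x-70b-32k

# NeoEvalPlusN scores for tests B, C, D, S, P.
NEO_ORIGINAL = [2, 1.5, 3, 2.75, 5.5]
NEO_EXTENDED = [2.5, 2, 0, 1.5, 2.25]


def average(scores):
    """Unweighted mean (assumed to match the leaderboard's Average column)."""
    return sum(scores) / len(scores)


def relative_loss(before, after):
    """Drop from `before` to `after`, as a percentage of `before`."""
    return (before - after) / before * 100


avg_original = round(average(ORIGINAL), 2)
avg_extended = round(average(EXTENDED), 2)

# Per-benchmark drop (original minus 32k), i.e. the Difference row.
diffs = {name: round(o - e, 2) for name, o, e in zip(BENCHMARKS, ORIGINAL, EXTENDED)}
# The table's average difference is taken between the already-rounded averages.
avg_diff = round(avg_original - avg_extended, 2)

neo_total_original = sum(NEO_ORIGINAL)
neo_total_extended = sum(NEO_EXTENDED)

leaderboard_loss = relative_loss(avg_original, avg_extended)
neo_loss = relative_loss(neo_total_original, neo_total_extended)

print(f"Leaderboard average: {avg_original} -> {avg_extended} "
      f"(-{avg_diff}, {leaderboard_loss:.1f}% lost)")
print(f"NeoEvalPlusN total: {neo_total_original} -> {neo_total_extended} "
      f"({neo_loss:.1f}% lost)")
for name, d in diffs.items():
    print(f"{name:>10}: -{d}")
```

On these numbers, the 32k model keeps roughly 95% of its leaderboard average but only about 56% of its NeoEvalPlusN total.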