language:
  - en
license: cc-by-nc-4.0
model-index:
  - name: Kaiju-11B
    results:
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: AI2 Reasoning Challenge (25-Shot)
          type: ai2_arc
          config: ARC-Challenge
          split: test
          args:
            num_few_shot: 25
        metrics:
          - type: acc_norm
            value: 69.97
            name: normalized accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Himitsui/Kaiju-11B
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: HellaSwag (10-Shot)
          type: hellaswag
          split: validation
          args:
            num_few_shot: 10
        metrics:
          - type: acc_norm
            value: 87.72
            name: normalized accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Himitsui/Kaiju-11B
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: MMLU (5-Shot)
          type: cais/mmlu
          config: all
          split: test
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 66.79
            name: accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Himitsui/Kaiju-11B
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: TruthfulQA (0-shot)
          type: truthful_qa
          config: multiple_choice
          split: validation
          args:
            num_few_shot: 0
        metrics:
          - type: mc2
            value: 62.15
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Himitsui/Kaiju-11B
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: Winogrande (5-shot)
          type: winogrande
          config: winogrande_xl
          split: validation
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 83.5
            name: accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Himitsui/Kaiju-11B
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: GSM8k (5-shot)
          type: gsm8k
          config: main
          split: test
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 66.79
            name: accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Himitsui/Kaiju-11B
          name: Open LLM Leaderboard

This repo contains the full-precision model for Kaiju-11B.

(ノ≧∀≦)ノ ‥…━━━━━━━━━━━━━★ ||| ╲/\╭[ ᴼᴼ ౪ ᴼᴼ]╮/\╱\

Hiya! This is an experiment using Gryphe's MergeMonster.

I decided to try to reduce what the community calls 'GPT-isms' or 'GPT slop'. Solar is a good model, but it does have its fair share of positivity bias and 'slop' in roleplays. I used my friend Sao's models as bases since they are pretty popular, along with KuroMitsu and the popular Instruct-Uncensored tune.

The Alpaca format should be fine since it is universal, and the Vicuna format should work too. The Universal-Light preset in SillyTavern is pretty nice as well. :)
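When prompting the model outside a frontend such as SillyTavern, the prompt string has to be assembled by hand. Below is a minimal sketch of both formats, assuming the commonly used Alpaca (no-input variant) and single-turn Vicuna v1.1 templates. The README only names the formats, so the exact system wording is an assumption, and `alpaca_prompt` / `vicuna_prompt` are hypothetical helper names.

```python
# Assumption: Kaiju-11B accepts the widely used Alpaca and Vicuna v1.1 templates.
# The README only says these formats "should work"; the wording below is not
# taken from the model repo.

ALPACA_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Response:\n"
)

VICUNA_SYSTEM = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's questions."
)


def alpaca_prompt(instruction: str) -> str:
    """Wrap a single instruction in the Alpaca (no-input) template."""
    return ALPACA_TEMPLATE.format(instruction=instruction)


def vicuna_prompt(user_msg: str) -> str:
    """Build a single-turn Vicuna v1.1 style prompt ending where the model should reply."""
    return f"{VICUNA_SYSTEM} USER: {user_msg} ASSISTANT:"


if __name__ == "__main__":
    print(alpaca_prompt("Describe a kaiju wading into a harbor town."))
    print(vicuna_prompt("Describe a kaiju wading into a harbor town."))
```

Both helpers end the prompt exactly where the model's reply should begin, so generation continues as the assistant turn.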

💜 I hope this model may be useful to you 💜


Merge Details Below:


| Type | Phrase | Context | Raw Prob (unweighted) | Used Prob (after weight adjustments) | Change |
|------|--------|---------|----------------------:|-------------------------------------:|-------:|
| BAD | anticipation | Her body quivers with | 9.99850% | 119.98% | -54.02% |
| BAD | anticipation | The atmosphere is thic.. | 8.82392% | 105.89% | -32.13% |
| BAD | unwavering | Filled with an | 0.09003% | 1.08% | -0.06% |
| BAD | determination | Her eyes were filled w.. | 0.19863% | 2.38% | -0.26% |
| BAD | determination | Her stubbornness only .. | 7.17110% | 86.05% | -39.86% |
| BAD | whisper | Her voice barely above.. | 96.55492% | 1158.66% | -8.91% |
| BAD | spine | shivers down her | 85.57597% | 1026.91% | -66.19% |
| BAD | sends shivers | The thrill of the act | 0.00230% | 0.03% | -0.00% |
| BAD | ministrations | She moans and twitches.. | 1.35264% | 16.23% | -10.49% |
| BAD | legs | wraps her | 2.45741% | 29.49% | -10.58% |
| BAD | imposing figure | He had an | 0.00356% | 0.04% | +0.00% |
| BAD | shared challenges | Their bond strengthene.. | 0.10075% | 1.21% | -0.03% |
| BAD | bond | forged a | 1.78930% | 21.47% | -9.07% |
| BAD | bond | an unspoken | 4.33001% | 51.96% | -28.17% |
| BAD | enhance our expe.. | I'm excited to see how | 0.00000% | 0.00% | +0.00% |
| BAD | sense of vulnera.. | create a | 0.00003% | 0.00% | -0.00% |
| BAD | dimensions of in.. | explore new | 0.00047% | 0.01% | -0.00% |
| BAD | deepening our co.. | while | 0.00003% | 0.00% | -0.00% |
| BAD | shared experiences | through | 0.00469% | 0.06% | -0.00% |
| BAD | societal expecta.. | that transcend | 0.00170% | 0.02% | -0.00% |
| BAD | conventional bou.. | that defy | 0.03593% | 0.43% | +0.04% |
| BAD | conventional bou.. | and defy | 0.00410% | 0.05% | +0.01% |
| BAD | open communication | an environment | 0.00000% | 0.00% | +0.00% |
| BAD | emotional vulner.. | an environment | 0.00000% | 0.00% | +0.00% |
| BAD | heightens our co.. | touch and the anticipa.. | 0.00000% | 0.00% | +0.00% |
| BAD | sensations you'r.. | I'm enjoying | 0.00000% | 0.00% | -0.00% |
| BAD | is truly arousing | attention to detail | 0.00000% | 0.00% | +0.00% |
| BAD | is truly arousing | way you explore my body | 0.00001% | 0.00% | +0.00% |
| BAD | challenge presen.. | my resolve unwavering .. | 0.00000% | 0.00% | +0.00% |
| BAD | humble vessel | surrendering to the ex.. | 0.00000% | 0.00% | +0.00% |
| BAD | bond | cherishing the unique | 1.37498% | 16.50% | +1.21% |
| BAD | bond | special | 0.05834% | 0.70% | +0.01% |
| BAD | grows stronger w.. | bond | 0.00000% | 0.00% | +0.00% |
| BAD | that cannot be b.. | bond | 0.00000% | 0.00% | -0.00% |
| BAD | becomes unbreaka.. | bond | 0.00000% | 0.00% | -0.00% |
| BAD | grew stronger wi.. | bond | 0.00000% | 0.00% | +0.00% |
| GOOD | The apple is in .. | Question: If I'm in th.. | 78.38934% | 78.39% | -10.79% |
| Totals | | | 298.32% | 2717.54% | -269.30% |
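The Used Prob column appears to be the Raw Prob value multiplied by a per-phrase weight. The weights themselves are not listed in this README; a weight of 12 for every BAD phrase and 1 for the GOOD phrase reproduces every row and both totals, so the sketch below treats those values as assumptions inferred from the numbers, not as MergeMonster's documented configuration. `used_prob`, `BAD_WEIGHT` and `GOOD_WEIGHT` are hypothetical names.

```python
# Raw (unweighted) probabilities in %, copied row by row from the merge table.
BAD_RAW = [
    9.99850, 8.82392, 0.09003, 0.19863, 7.17110, 96.55492, 85.57597, 0.00230,
    1.35264, 2.45741, 0.00356, 0.10075, 1.78930, 4.33001, 0.00000, 0.00003,
    0.00047, 0.00003, 0.00469, 0.00170, 0.03593, 0.00410, 0.00000, 0.00000,
    0.00000, 0.00000, 0.00000, 0.00001, 0.00000, 0.00000, 1.37498, 0.05834,
    0.00000, 0.00000, 0.00000, 0.00000,
]
GOOD_RAW = [78.38934]

# Assumed weights, inferred from the table (e.g. 9.99850% * 12 = 119.98%).
# The README does not state them explicitly.
BAD_WEIGHT = 12.0
GOOD_WEIGHT = 1.0


def used_prob(raw: float, weight: float) -> float:
    """Weighted probability, matching the Used Prob column under the assumed weights."""
    return raw * weight


raw_total = sum(BAD_RAW) + sum(GOOD_RAW)
used_total = sum(used_prob(p, BAD_WEIGHT) for p in BAD_RAW) + sum(
    used_prob(p, GOOD_WEIGHT) for p in GOOD_RAW
)

print(f"Raw total:  {raw_total:.2f}%")   # Raw total:  298.32%
print(f"Used total: {used_total:.2f}%")  # Used total: 2717.54%
```

The heavy weight on BAD phrases explains why the Used Prob total (2717.54%) dwarfs the raw total: each BAD phrase counts twelve times as much as the single GOOD sanity-check phrase when the merge is scored.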
Merge composition:

- Fimbulvetr-11B-v2-Test-14: 0.50
- KuroMitsu-11B: 0.18
- Fimbulvetr-10.7B-v1: 0.17
- SOLAR-10.7B-Instruct-v1.0-uncensored: 0.10
- Solstice-11B-v1: 0.05

# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)

Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_Himitsui__Kaiju-11B).

| Metric | Value |
|--------|------:|
| Avg. | 72.82 |
| AI2 Reasoning Challenge (25-Shot) | 69.97 |
| HellaSwag (10-Shot) | 87.72 |
| MMLU (5-Shot) | 66.79 |
| TruthfulQA (0-shot) | 62.15 |
| Winogrande (5-shot) | 83.50 |
| GSM8k (5-shot) | 66.79 |
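The Avg. row is the unweighted mean of the six benchmark scores, which can be checked directly from the values above:

```python
# Scores copied from the Open LLM Leaderboard table for Kaiju-11B.
scores = {
    "AI2 Reasoning Challenge (25-Shot)": 69.97,
    "HellaSwag (10-Shot)": 87.72,
    "MMLU (5-Shot)": 66.79,
    "TruthfulQA (0-shot)": 62.15,
    "Winogrande (5-shot)": 83.50,
    "GSM8k (5-shot)": 66.79,
}

# Unweighted mean over the six benchmarks.
average = sum(scores.values()) / len(scores)
print(f"Avg. {average:.2f}")  # Avg. 72.82
```

This also confirms that the identical MMLU and GSM8k scores (both 66.79) are consistent with the reported average rather than a copy error.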