Fimbulvetr-11B-v2 / README.md
metadata
language:
  - en
license: cc-by-nc-4.0
model-index:
  - name: Fimbulvetr-11B-v2
    results:
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: IFEval (0-Shot)
          type: HuggingFaceH4/ifeval
          args:
            num_few_shot: 0
        metrics:
          - type: inst_level_strict_acc and prompt_level_strict_acc
            value: 51
            name: strict accuracy
        source:
          url: >-
            https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Sao10K/Fimbulvetr-11B-v2
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: BBH (3-Shot)
          type: BBH
          args:
            num_few_shot: 3
        metrics:
          - type: acc_norm
            value: 22.66
            name: normalized accuracy
        source:
          url: >-
            https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Sao10K/Fimbulvetr-11B-v2
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: MATH Lvl 5 (4-Shot)
          type: hendrycks/competition_math
          args:
            num_few_shot: 4
        metrics:
          - type: exact_match
            value: 0.45
            name: exact match
        source:
          url: >-
            https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Sao10K/Fimbulvetr-11B-v2
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: GPQA (0-shot)
          type: Idavidrein/gpqa
          args:
            num_few_shot: 0
        metrics:
          - type: acc_norm
            value: 5.59
            name: acc_norm
        source:
          url: >-
            https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Sao10K/Fimbulvetr-11B-v2
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: MuSR (0-shot)
          type: TAUR-Lab/MuSR
          args:
            num_few_shot: 0
        metrics:
          - type: acc_norm
            value: 14.92
            name: acc_norm
        source:
          url: >-
            https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Sao10K/Fimbulvetr-11B-v2
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: MMLU-PRO (5-shot)
          type: TIGER-Lab/MMLU-Pro
          config: main
          split: test
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 25.57
            name: accuracy
        source:
          url: >-
            https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Sao10K/Fimbulvetr-11B-v2
          name: Open LLM Leaderboard

[Image: Fox1]

Cute girl to catch your attention.

https://huggingface.co/Sao10K/Fimbulvetr-11B-v2-GGUF <------ GGUF

Fimbulvetr-v2 - A Solar-Based Model


4/4 Status Update:

Got a few requests from people wanting to support me: https://ko-fi.com/sao10k

Anyway, status on v3: halted for the time being, I'm mainly working on the dataset. It's a pain, to be honest. The data I have isn't up to my standard for now. It's good, just not good enough.


Prompt Formats - Alpaca or Vicuna. Either one works fine.

Recommended SillyTavern Presets - Universal Light

Alpaca:

### Instruction:
<Prompt>
### Input:
<Insert Context Here>
### Response:

Vicuna:

System: <Prompt>

User: <Input>

Assistant:
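
For anyone building these prompts by hand rather than through a frontend, here is a minimal Python sketch of the two templates above. The function names (`alpaca_prompt`, `vicuna_prompt`) are made up for illustration. Dropping the Input block when no context is given is an assumption borrowed from common Alpaca usage, not something this card specifies.

```python
def alpaca_prompt(instruction: str, context: str = "") -> str:
    """Build an Alpaca-style prompt following the template in this card.

    Assumption: the '### Input:' block is skipped when context is empty.
    """
    parts = [f"### Instruction:\n{instruction}"]
    if context:
        parts.append(f"### Input:\n{context}")
    # End on the response header so the model continues from there.
    parts.append("### Response:\n")
    return "\n".join(parts)


def vicuna_prompt(system: str, user: str) -> str:
    """Build a Vicuna-style prompt following the template in this card."""
    return f"System: {system}\n\nUser: {user}\n\nAssistant:"


if __name__ == "__main__":
    print(alpaca_prompt("Continue the story.", "It was a long winter."))
    print(vicuna_prompt("You are a storyteller.", "Continue the story."))
```

Either string can then be passed as the raw prompt to whatever backend is loading the model. Stop sequences and sampler settings are left to the frontend or preset.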

Changelogs:

25/2 - repo renamed to drop "test" from the name, model card redone. Model's officially out.
15/2 - Heavy testing complete. Good feedback.


Rant - Kept For Historical Reasons

Ramble to meet minimum length requirements:

Tbh I wonder if this shit is even worth doing. Like, I'm just some broke guy lmao, and I've spent so much. And for what? I guess creds. Feels good when a model gets good feedback, but it seems like I'm invisible sometimes. I should probably be advertising myself and my models in other places, but I rarely have the time to. Probably just internal jealousy sparking up here and now. Whatever, I guess.

Anyway, the EMT vocation I'm doing is cool, except it pays peanuts. Damn bruh, 1.1k per month lmao. Government's too broke to pay for shit. Pays the bills, I suppose.

Anyway cool beans, I'm either going to continue the Solar Train or go to Mixtral / Yi when I get paid.

You still here?


Open LLM Leaderboard Evaluation Results

Detailed results can be found here

| Metric              | Value |
|---------------------|------:|
| Avg.                | 20.03 |
| IFEval (0-Shot)     | 51.00 |
| BBH (3-Shot)        | 22.66 |
| MATH Lvl 5 (4-Shot) |  0.45 |
| GPQA (0-shot)       |  5.59 |
| MuSR (0-shot)       | 14.92 |
| MMLU-PRO (5-shot)   | 25.57 |
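
As a quick sanity check on the Avg. row, the sketch below assumes it is the plain unweighted mean of the six benchmark scores. The card does not say so, but the listed values are consistent with it. The `scores` dictionary just restates the numbers from the results above.

```python
# Benchmark scores as listed in this card's evaluation results.
scores = {
    "IFEval (0-Shot)": 51.00,
    "BBH (3-Shot)": 22.66,
    "MATH Lvl 5 (4-Shot)": 0.45,
    "GPQA (0-shot)": 5.59,
    "MuSR (0-shot)": 14.92,
    "MMLU-PRO (5-shot)": 25.57,
}

# Assumption: "Avg." is the unweighted mean, rounded to two decimals.
average = round(sum(scores.values()) / len(scores), 2)
print(f"Avg. {average:.2f}")
```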