license: apache-2.0
tags:
  - text-generation
base_model: Locutusque/TinyMistral-248M
datasets:
  - OpenAssistant/oasst_top1_2023-08-25
widget:
  - messages:
      - role: user
        content: >-
          Invited some friends to come home today. Give me some ideas for games
          to play with them!
  - messages:
      - role: user
        content: >-
          How do meteorologists predict how much air pollution will be produced
          in the next year?
  - messages:
      - role: user
        content: Who is Mona Lisa?
  - messages:
      - role: user
        content: Heya!
      - role: assistant
        content: Hi! How may I help you today?
      - role: user
        content: >-
          I need to build a simple website. Where should I start learning about
          web development?
  - messages:
      - role: user
        content: What are some potential applications for quantum computing?
  - messages:
      - role: user
        content: Got a question for you!
      - role: assistant
        content: Sure! What's it?
      - role: user
        content: Why do you love cats so much!? 🐈
  - messages:
      - role: user
        content: Tell me about the pros and cons of social media.
  - messages:
      - role: user
        content: 'Question: What is a dog?'
      - role: assistant
        content: >-
          A dog is a four-legged, domesticated animal that is a member of the
          class Mammalia, which includes all mammals. Dogs are known for their
          loyalty, playfulness, and ability to be trained for various tasks.
          They are also used for hunting, herding, and as service animals.
      - role: user
        content: 'Question: What is the capital of France?'
      - role: assistant
        content: >-
          The capital of France is Paris. Paris is located in the north-central
          region of France and is known for its famous landmarks, such as the
          Eiffel Tower, the Louvre Museum, and Notre-Dame Cathedral.
      - role: user
        content: 'Question: What is the color of an apple?'
inference: false
model-index:
  - name: TinyMistral-248M-SFT-v3
    results:
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: AI2 Reasoning Challenge (25-Shot)
          type: ai2_arc
          config: ARC-Challenge
          split: test
          args:
            num_few_shot: 25
        metrics:
          - type: acc_norm
            value: 21.93
            name: normalized accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Felladrin/TinyMistral-248M-SFT-v3
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: HellaSwag (10-Shot)
          type: hellaswag
          split: validation
          args:
            num_few_shot: 10
        metrics:
          - type: acc_norm
            value: 28.26
            name: normalized accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Felladrin/TinyMistral-248M-SFT-v3
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: MMLU (5-Shot)
          type: cais/mmlu
          config: all
          split: test
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 22.91
            name: accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Felladrin/TinyMistral-248M-SFT-v3
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: TruthfulQA (0-shot)
          type: truthful_qa
          config: multiple_choice
          split: validation
          args:
            num_few_shot: 0
        metrics:
          - type: mc2
            value: 40.03
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Felladrin/TinyMistral-248M-SFT-v3
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: Winogrande (5-shot)
          type: winogrande
          config: winogrande_xl
          split: validation
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 51.54
            name: accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Felladrin/TinyMistral-248M-SFT-v3
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: GSM8k (5-shot)
          type: gsm8k
          config: main
          split: test
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 0
            name: accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Felladrin/TinyMistral-248M-SFT-v3
          name: Open LLM Leaderboard

Locutusque's TinyMistral-248M trained on OpenAssistant TOP-1 Conversation Threads

Where to try out this model

The inference widget from HuggingFace was not working properly for this model, so it was temporarily disabled.

To try out this model online, please visit this HuggingFace Space: Felladrin/ModelsPlayground

Recommended Prompt Format

<|im_start|>user
{message}<|im_end|>
<|im_start|>assistant
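
As a minimal sketch, the template above can be filled in programmatically. The helper name `build_prompt`, the multi-turn handling, and the trailing newline after the final `assistant` tag are assumptions for illustration; the card itself only shows the single-turn form.

```python
from typing import Dict, List


def build_prompt(messages: List[Dict[str, str]]) -> str:
    """Render chat messages in the <|im_start|>/<|im_end|> format shown above.

    Each message is a dict with "role" and "content" keys. Extending the
    single-turn template to several turns is an assumption of this sketch.
    """
    parts = []
    for message in messages:
        parts.append(
            f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        )
    # Leave the assistant turn open so the model writes the reply.
    parts.append("<|im_start|>assistant\n")
    return "".join(parts)


prompt = build_prompt([{"role": "user", "content": "Who is Mona Lisa?"}])
print(prompt)
```

Passing a longer list of alternating `user` and `assistant` messages yields one block per turn, followed by the open assistant tag.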

Recommended Inference Parameters

penalty_alpha: 0.5
top_k: 5
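
In the `transformers` library, passing `penalty_alpha` together with `top_k` to `generate` selects contrastive search. The sketch below shows one way these values might be applied locally. The function name `generate_reply` and the `max_new_tokens` default are assumptions, the model ID is taken from the leaderboard URL in the metadata, and depending on the installed `transformers` version contrastive search may need additional setup.

```python
# Values from the model card; together they request contrastive search.
GENERATION_KWARGS = {"penalty_alpha": 0.5, "top_k": 5}

# Model ID as it appears in the leaderboard URL in the card's metadata.
MODEL_ID = "Felladrin/TinyMistral-248M-SFT-v3"


def generate_reply(prompt: str, max_new_tokens: int = 128) -> str:
    """Generate a completion for an already formatted prompt.

    The import is done lazily so that this module can be loaded without
    transformers installed; max_new_tokens=128 is an arbitrary choice.
    """
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        **GENERATION_KWARGS,
    )
    # Drop the prompt tokens so only the newly generated text is returned.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)


if __name__ == "__main__":
    example = "<|im_start|>user\nWho is Mona Lisa?<|im_end|>\n<|im_start|>assistant\n"
    print(generate_reply(example))
```

Running the file downloads the model weights on first use, so it needs network access and PyTorch installed.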

Open LLM Leaderboard Evaluation Results

Detailed results can be found here

| Metric                            | Value |
|-----------------------------------|------:|
| Avg.                              | 27.45 |
| AI2 Reasoning Challenge (25-Shot) | 21.93 |
| HellaSwag (10-Shot)               | 28.26 |
| MMLU (5-Shot)                     | 22.91 |
| TruthfulQA (0-shot)               | 40.03 |
| Winogrande (5-shot)               | 51.54 |
| GSM8k (5-shot)                    |  0.00 |
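
The reported average appears to be the unweighted mean of the six benchmark scores. That reading is an assumption based on the numbers in the table, and the short check below recomputes it from the listed values.

```python
# Scores copied from the table above.
scores = {
    "AI2 Reasoning Challenge (25-Shot)": 21.93,
    "HellaSwag (10-Shot)": 28.26,
    "MMLU (5-Shot)": 22.91,
    "TruthfulQA (0-shot)": 40.03,
    "Winogrande (5-shot)": 51.54,
    "GSM8k (5-shot)": 0.00,
}

# Unweighted mean across all six benchmarks.
average = sum(scores.values()) / len(scores)
print(f"Average over {len(scores)} benchmarks: {average:.3f}")
```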