---
license: apache-2.0
tags:
- text-generation
base_model: Locutusque/TinyMistral-248M
datasets:
- OpenAssistant/oasst_top1_2023-08-25
widget:
- messages:
  - role: user
    content: Invited some friends to come home today. Give me some ideas for games
      to play with them!
- messages:
  - role: user
    content: How do meteorologists predict how much air pollution will be produced
      in the next year?
- messages:
  - role: user
    content: Who is Mona Lisa?
- messages:
  - role: user
    content: Heya!
  - role: assistant
    content: Hi! How may I help you today?
  - role: user
    content: I need to build a simple website. Where should I start learning about
      web development?
- messages:
  - role: user
    content: What are some potential applications for quantum computing?
- messages:
  - role: user
    content: Got a question for you!
  - role: assistant
    content: Sure! What's it?
  - role: user
    content: Why do you love cats so much!? 🐈
- messages:
  - role: user
    content: Tell me about the pros and cons of social media.
- messages:
  - role: user
    content: 'Question: What is a dog?'
  - role: assistant
    content: A dog is a four-legged, domesticated animal that is a member of the class
      Mammalia, which includes all mammals. Dogs are known for their loyalty, playfulness,
      and ability to be trained for various tasks. They are also used for hunting,
      herding, and as service animals.
  - role: user
    content: 'Question: What is the capital of France?'
  - role: assistant
    content: The capital of France is Paris. Paris is located in the north-central
      region of France and is known for its famous landmarks, such as the Eiffel Tower,
      the Louvre Museum, and Notre-Dame Cathedral.
  - role: user
    content: 'Question: What is the color of an apple?'
inference: false
model-index:
- name: TinyMistral-248M-SFT-v3
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: AI2 Reasoning Challenge (25-Shot)
      type: ai2_arc
      config: ARC-Challenge
      split: test
      args:
        num_few_shot: 25
    metrics:
    - type: acc_norm
      value: 21.93
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Felladrin/TinyMistral-248M-SFT-v3
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: HellaSwag (10-Shot)
      type: hellaswag
      split: validation
      args:
        num_few_shot: 10
    metrics:
    - type: acc_norm
      value: 28.26
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Felladrin/TinyMistral-248M-SFT-v3
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU (5-Shot)
      type: cais/mmlu
      config: all
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 22.91
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Felladrin/TinyMistral-248M-SFT-v3
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: TruthfulQA (0-shot)
      type: truthful_qa
      config: multiple_choice
      split: validation
      args:
        num_few_shot: 0
    metrics:
    - type: mc2
      value: 40.03
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Felladrin/TinyMistral-248M-SFT-v3
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: Winogrande (5-shot)
      type: winogrande
      config: winogrande_xl
      split: validation
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 51.54
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Felladrin/TinyMistral-248M-SFT-v3
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GSM8k (5-shot)
      type: gsm8k
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 0.0
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Felladrin/TinyMistral-248M-SFT-v3
      name: Open LLM Leaderboard
---
# Locutusque's TinyMistral-248M trained on OpenAssistant TOP-1 Conversation Threads
- Base model: [Locutusque/TinyMistral-248M](https://huggingface.co/Locutusque/TinyMistral-248M)
- Dataset: [OpenAssistant/oasst_top1_2023-08-25](https://huggingface.co/datasets/OpenAssistant/oasst_top1_2023-08-25)
- Availability in other ML formats:
  - GGUF: [Felladrin/gguf-TinyMistral-248M-SFT-v4](https://huggingface.co/Felladrin/gguf-TinyMistral-248M-SFT-v4)
  - ONNX: [Felladrin/onnx-TinyMistral-248M-SFT-v4](https://huggingface.co/Felladrin/onnx-TinyMistral-248M-SFT-v4)
## Where to try out this model
The [inference widget from HuggingFace was not working properly for this model](https://discuss.huggingface.co/t/api-endpoint-not-working-as-expected/69457), so it was temporarily disabled.
To try out this model online, please visit this HuggingFace Space: [Felladrin/ModelsPlayground](https://huggingface.co/spaces/Felladrin/ModelsPlayground)
## Recommended Prompt Format
```
<|im_start|>user
{message}<|im_end|>
<|im_start|>assistant
```
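
The template above can be filled in programmatically. The following is a minimal sketch, not the author's own tooling: the helper name `build_chatml_prompt` is hypothetical, the trailing newline after the final `assistant` tag is an assumption, and rendering earlier assistant turns with the same tags is an assumption borrowed from the usual ChatML convention (the card itself only shows a single user turn).

```python
def build_chatml_prompt(messages):
    """Render a list of {"role", "content"} dicts in the recommended format.

    Hypothetical helper: every turn is wrapped in <|im_start|>/<|im_end|> tags,
    and the prompt ends with an open assistant turn for the model to complete.
    """
    parts = []
    for message in messages:
        parts.append(
            f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        )
    # Assumption: a newline follows the final assistant tag.
    parts.append("<|im_start|>assistant\n")
    return "".join(parts)


if __name__ == "__main__":
    print(build_chatml_prompt([{"role": "user", "content": "Who is Mona Lisa?"}]))
```

For a single user message this reproduces the template shown above, with `{message}` replaced by the user's text.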
## Recommended Inference Parameters
```yml
penalty_alpha: 0.5
top_k: 5
```
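
In the `transformers` library, passing `penalty_alpha` together with `top_k` to `generate` selects contrastive search. The sketch below shows one way these two values might be applied; it is an illustration under assumptions, not the author's script. The model ID `Felladrin/TinyMistral-248M-SFT-v3` is inferred from the leaderboard links in this card, `max_new_tokens=128` is an arbitrary choice, the function name `generate_reply` is hypothetical, and depending on the installed `transformers` version contrastive search may need extra setup.

```python
# Values taken from the recommended inference parameters above.
GENERATION_KWARGS = {"penalty_alpha": 0.5, "top_k": 5}


def generate_reply(
    user_message,
    model_id="Felladrin/TinyMistral-248M-SFT-v3",  # assumption: inferred from the card's links
    max_new_tokens=128,  # assumption: not specified in the card
):
    """Generate one assistant reply with the recommended parameters."""
    # Imported lazily so this module loads even without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)

    # Single-turn prompt in the recommended format.
    prompt = f"<|im_start|>user\n{user_message}<|im_end|>\n<|im_start|>assistant\n"
    inputs = tokenizer(prompt, return_tensors="pt")

    output_ids = model.generate(
        **inputs, max_new_tokens=max_new_tokens, **GENERATION_KWARGS
    )
    # Decode only the newly generated tokens, skipping the prompt.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Calling `generate_reply("Who is Mona Lisa?")` downloads the model on first use, so it needs network access and the `transformers` and `torch` packages.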
# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_Felladrin__TinyMistral-248M-SFT-v3).
| Metric |Value|
|---------------------------------|----:|
|Avg. |27.45|
|AI2 Reasoning Challenge (25-Shot)|21.93|
|HellaSwag (10-Shot) |28.26|
|MMLU (5-Shot) |22.91|
|TruthfulQA (0-shot) |40.03|
|Winogrande (5-shot) |51.54|
|GSM8k (5-shot) | 0.00|
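
The `Avg.` row is consistent with an unweighted mean of the six benchmark scores. The check below is a small sketch that assumes exactly that, plus half-up rounding to two decimals; the leaderboard's actual rounding rule is not stated in this card.

```python
from decimal import Decimal, ROUND_HALF_UP

# Scores copied from the table above, kept as decimal strings to avoid float error.
scores = {
    "AI2 Reasoning Challenge (25-Shot)": Decimal("21.93"),
    "HellaSwag (10-Shot)": Decimal("28.26"),
    "MMLU (5-Shot)": Decimal("22.91"),
    "TruthfulQA (0-shot)": Decimal("40.03"),
    "Winogrande (5-shot)": Decimal("51.54"),
    "GSM8k (5-shot)": Decimal("0.00"),
}

# Unweighted mean, rounded half-up to two decimals (assumed rounding rule).
average = (sum(scores.values()) / len(scores)).quantize(
    Decimal("0.01"), rounding=ROUND_HALF_UP
)
print(average)  # 27.45
```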