--- license: apache-2.0 library_name: transformers datasets: - andysalerno/ansalern-nectar-inputoutput base_model: mistralai/Mistral-7B-v0.1 model-index: - name: mistral-sft-v3 results: - task: type: text-generation name: Text Generation dataset: name: AI2 Reasoning Challenge (25-Shot) type: ai2_arc config: ARC-Challenge split: test args: num_few_shot: 25 metrics: - type: acc_norm value: 61.35 name: normalized accuracy source: url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=andysalerno/mistral-sft-v3 name: Open LLM Leaderboard - task: type: text-generation name: Text Generation dataset: name: HellaSwag (10-Shot) type: hellaswag split: validation args: num_few_shot: 10 metrics: - type: acc_norm value: 82.23 name: normalized accuracy source: url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=andysalerno/mistral-sft-v3 name: Open LLM Leaderboard - task: type: text-generation name: Text Generation dataset: name: MMLU (5-Shot) type: cais/mmlu config: all split: test args: num_few_shot: 5 metrics: - type: acc value: 63.4 name: accuracy source: url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=andysalerno/mistral-sft-v3 name: Open LLM Leaderboard - task: type: text-generation name: Text Generation dataset: name: TruthfulQA (0-shot) type: truthful_qa config: multiple_choice split: validation args: num_few_shot: 0 metrics: - type: mc2 value: 48.49 source: url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=andysalerno/mistral-sft-v3 name: Open LLM Leaderboard - task: type: text-generation name: Text Generation dataset: name: Winogrande (5-shot) type: winogrande config: winogrande_xl split: validation args: num_few_shot: 5 metrics: - type: acc value: 77.66 name: accuracy source: url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=andysalerno/mistral-sft-v3 name: Open LLM Leaderboard - task: type: text-generation name: Text Generation dataset: name: GSM8k (5-shot) type: gsm8k config: main split: test args: num_few_shot: 5 metrics: - type: acc value: 32.45 name: accuracy source: url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=andysalerno/mistral-sft-v3 name: Open LLM Leaderboard --- This is [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1), but with the special tokens added for ChatML, and then lightly finetuned with sft using a ChatML formatted dataset: [andysalerno/ansalern-nectar-inputoutput](https://huggingface.co/datasets/andysalerno/ansalern-nectar-inputoutput) The training was very light, so while this model correctly follows ChatML formatting, it is not intended to be a chat model. Rather, it is intended to be a base for further fine-tuning models that will use ChatML. # [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard) Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_andysalerno__mistral-sft-v3) | Metric |Value| |---------------------------------|----:| |Avg. |60.93| |AI2 Reasoning Challenge (25-Shot)|61.35| |HellaSwag (10-Shot) |82.23| |MMLU (5-Shot) |63.40| |TruthfulQA (0-shot) |48.49| |Winogrande (5-shot) |77.66| |GSM8k (5-shot) |32.45|