Fine-tune of CultriX/MistralTrix-v1 on Symbolic Logic content from Lewis Carroll. The learning rate was very low because the dataset is very small. This is just an experiment, and I have no idea whether it was effective at changing the model's output.

Open LLM Leaderboard Evaluation Results

Detailed results can be found here

Metric	Value
Avg.	73.33
AI2 Reasoning Challenge (25-Shot)	72.53
HellaSwag (10-Shot)	88.34
MMLU (5-Shot)	65.26
TruthfulQA (0-shot)	70.93
Winogrande (5-shot)	80.66
GSM8k (5-shot)	62.24
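
As a quick sanity check on the table above, the "Avg." row matches the unweighted mean of the six benchmark scores. The sketch below assumes that is how the average is derived; the `SCORES` dictionary and the `leaderboard_average` helper are illustrative names, not part of any leaderboard tooling.

```python
# Benchmark scores copied from the table above.
SCORES = {
    "AI2 Reasoning Challenge (25-Shot)": 72.53,
    "HellaSwag (10-Shot)": 88.34,
    "MMLU (5-Shot)": 65.26,
    "TruthfulQA (0-shot)": 70.93,
    "Winogrande (5-shot)": 80.66,
    "GSM8k (5-shot)": 62.24,
}


def leaderboard_average(scores: dict[str, float]) -> float:
    """Unweighted mean of the scores, rounded to two decimals.

    Assumption: the "Avg." row is a plain arithmetic mean.
    """
    return round(sum(scores.values()) / len(scores), 2)


average = leaderboard_average(SCORES)
print(average)  # 73.33
```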


Weights format: Safetensors
Model size: 8.99B params
Tensor type: FP16
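
Since the weights are published in FP16, one plausible way to try the model locally is through the Hugging Face `transformers` library. This is a minimal sketch, not a usage recipe confirmed by the author: it assumes `torch` and `transformers` are installed, that the repository loads with the standard `AutoModelForCausalLM` class, and that plain-text prompting is acceptable (no prompt template is documented here). The `generate` helper is a hypothetical name introduced for illustration.

```python
MODEL_ID = "ryandt/MusingCaterpillar"


def generate(prompt: str, max_new_tokens: int = 128) -> str:
    """Load the model in FP16 and return a completion for `prompt`.

    Imports are done lazily so that defining this helper does not
    require torch/transformers or trigger a download of the weights.
    """
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    # Assumption: half precision matches the published tensor type.
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype=torch.float16
    )

    # Prefer a GPU when available; FP16 on CPU may be slow.
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model.to(device)

    inputs = tokenizer(prompt, return_tensors="pt").to(device)
    with torch.no_grad():
        output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate("All cats are mammals. Tom is a cat. Therefore,")` would download the weights on first use, so expect a large download for a model of this size.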


Evaluation results

Benchmark	Metric	Split	Value
AI2 Reasoning Challenge (25-Shot)	normalized accuracy	test	72.53
HellaSwag (10-Shot)	normalized accuracy	validation	88.34
MMLU (5-Shot)	accuracy	test	65.26
TruthfulQA (0-shot)	mc2	validation	70.93
Winogrande (5-shot)	accuracy	validation	80.66
GSM8k (5-shot)	accuracy	test	62.24

Source: Open LLM Leaderboard
