Training

This model was trained on the two datasets listed below; a minimal loading sketch is included after the list.

  • Skylion007/openwebtext: 1,000,000 examples at a batch size of 32-4096 (1 epoch)
  • Locutusque/TM-DATA: all examples at a batch size of 12288 (3 epochs)

Training took approximately 500 GPU hours on a single Titan V.
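For reference, here is a minimal sketch of loading these two datasets with the Hugging Face `datasets` library. This is not the training script used for this model; the example-count slicing simply mirrors the figures listed above.

```python
# Sketch only: load the two training datasets named above.
from datasets import load_dataset

# Only the first 1,000,000 OpenWebText examples were used (per the list above).
openwebtext = load_dataset("Skylion007/openwebtext", split="train[:1000000]")

# TM-DATA was used in full.
tm_data = load_dataset("Locutusque/TM-DATA", split="train")

print(len(openwebtext), len(tm_data))
```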

Metrics

You can look at the training metrics here: https://wandb.ai/locutusque/TinyMistral-V2/runs/g0rvw6wc
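If you prefer to pull the logged metrics programmatically rather than browse the web UI, a sketch using the public `wandb` API is below; the entity, project, and run id are taken from the URL above.

```python
# Sketch: fetch the logged training metrics via the wandb public API.
# Requires `pip install wandb` and an authenticated wandb account.
import wandb

api = wandb.Api()
# Entity/project/run id from the run URL above.
run = api.run("locutusque/TinyMistral-V2/g0rvw6wc")

# history() returns the logged metrics (e.g. loss per step) as a DataFrame.
history = run.history()
print(history.columns.tolist())
print(history.tail())
```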

🔥 This model performed excellently on TruthfulQA, outperforming models more than 720x its size. These models include: mistralai/Mixtral-8x7B-v0.1, tiiuae/falcon-180B, berkeley-nest/Starling-LM-7B-alpha, upstage/SOLAR-10.7B-v1.0, and more. 🔥
