mistral-7b-distilabel-truthy-dpo

mistral-7b-distilabel-truthy-dpo is mistralai/Mistral-7B-v0.1 fine-tuned with Direct Preference Optimization (DPO) on the mlabonne/distilabel-truthy-dpo-v0.1 dataset.

LoRA

  • r: 16
  • LoRA alpha: 16
  • LoRA dropout: 0.05
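
As a rough illustration of what these LoRA values imply, the sketch below computes the adapter scaling factor (alpha / r) and the number of parameters LoRA adds to a single weight matrix. The card does not say which modules were adapted, so the 4096 x 4096 projection is only an assumed example (4096 being the hidden size of Mistral-7B), and the helper names `lora_scaling` and `lora_params` are hypothetical.

```python
def lora_scaling(r: int, alpha: int) -> float:
    """Scaling applied to the low-rank update: delta_W = (alpha / r) * B @ A."""
    return alpha / r


def lora_params(d_in: int, d_out: int, r: int) -> int:
    """Parameters LoRA adds to one d_out x d_in matrix: A is r x d_in, B is d_out x r."""
    return r * (d_in + d_out)


R, ALPHA = 16, 16   # values listed on this card
HIDDEN = 4096       # assumption: a square 4096 x 4096 projection as the adapted matrix

scale = lora_scaling(R, ALPHA)
added = lora_params(HIDDEN, HIDDEN, R)
full = HIDDEN * HIDDEN

print(f"scaling factor: {scale}")
print(f"adapter params per matrix: {added} ({added / full:.2%} of the full matrix)")
```

With r equal to alpha, the low-rank update is applied unscaled. The dropout of 0.05 affects only the adapter input during training and does not change these counts.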

Training arguments

  • Batch size: 4
  • Gradient accumulation steps: 4
  • Optimizer: paged_adamw_32bit
  • Max steps: 100
  • Learning rate: 5e-05
  • Learning rate scheduler type: cosine
  • Beta: 0.1
  • Max prompt length: 1024
  • Max length: 1536
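
To make the arguments above concrete, here is a minimal sketch in plain Python of three quantities they determine: the effective batch size, the cosine learning-rate curve, and the per-pair DPO loss weighted by beta. It rests on assumptions the card does not state: a single device, no warmup, and the standard sigmoid form of the DPO objective. The function names are hypothetical and this is not the training script used for the model.

```python
import math

# Values listed under "Training arguments"
BATCH_SIZE = 4
GRAD_ACCUM_STEPS = 4
MAX_STEPS = 100
PEAK_LR = 5e-5
BETA = 0.1


def effective_batch_size(batch_size: int, grad_accum_steps: int, num_devices: int = 1) -> int:
    """Preference pairs consumed per optimizer step (num_devices=1 is an assumption)."""
    return batch_size * grad_accum_steps * num_devices


def cosine_lr(step: int, max_steps: int, peak_lr: float) -> float:
    """Cosine decay from peak_lr to 0 over max_steps, assuming no warmup."""
    progress = min(step, max_steps) / max_steps
    return 0.5 * peak_lr * (1.0 + math.cos(math.pi * progress))


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = BETA) -> float:
    """Standard DPO loss for one pair: -log(sigmoid(beta * margin))."""
    margin = ((policy_chosen_logp - ref_chosen_logp)
              - (policy_rejected_logp - ref_rejected_logp))
    x = beta * margin
    # Numerically stable softplus(-x), which equals -log(sigmoid(x))
    return max(-x, 0.0) + math.log1p(math.exp(-abs(x)))


pairs_per_step = effective_batch_size(BATCH_SIZE, GRAD_ACCUM_STEPS)
print("pairs per optimizer step:", pairs_per_step)
print("pairs seen in training:", pairs_per_step * MAX_STEPS)
for step in (0, 50, 100):
    print(f"lr at step {step}: {cosine_lr(step, MAX_STEPS, PEAK_LR):.2e}")
print("loss when policy equals reference:", dpo_loss(0.0, 0.0, 0.0, 0.0))
```

Under these assumptions, each optimizer step covers 16 preference pairs, so 100 steps see at most 1,600 pairs. A small beta such as 0.1 penalizes deviation from the reference model only weakly, so the loss falls slowly as the chosen-versus-rejected margin grows.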

Model details

  • Format: Safetensors
  • Model size: 7.24B params
  • Tensor type: FP16