S4nto/lora-dpo-finetuned-model-beta-0.1-rate-1e5-stage2-iter40000-sft • Text Generation • Updated May 16 • 11
S4nto/lora-dpo-finetuned-model-beta-0.5-rate-2e6-stage2-iter40000-sft • Text Generation • Updated May 15 • 9
S4nto/lora-dpo-finetuned-model-beta-0.1-rate-1e6-stage2-iter40000-sft • Text Generation • Updated May 15 • 7
S4nto/lora-dpo-finetuned-model-beta-0.5-rate-1e6-stage2-iter40000-sft • Text Generation • Updated May 15 • 9