This is an ORPO fine-tune of [google/gemma-2b](https://huggingface.co/google/gemma-2b) with
[`alvarobartt/dpo-mix-7k-simplified`](https://huggingface.co/datasets/alvarobartt/dpo-mix-7k-simplified).

**⚡ Quantized version (GGUF)**: https://huggingface.co/anakin87/gemma-2b-orpo-GGUF

## ORPO
[ORPO (Odds Ratio Preference Optimization)](https://arxiv.org/abs/2403.07691) is a new training paradigm that combines the usually separated phases of SFT (Supervised Fine-Tuning) and Preference Alignment (usually performed with RLHF or simpler methods like DPO).
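To make the objective concrete, here is a minimal sketch of the ORPO loss for scalar sequence probabilities (the function names, the `lam` weight, and the use of plain floats instead of token-level log-probs are illustrative assumptions, not the paper's implementation):

```python
import math

def odds(p):
    # Odds of assigning probability p to a response: p / (1 - p).
    return p / (1.0 - p)

def orpo_penalty(p_chosen, p_rejected):
    # Odds-ratio term: -log sigmoid(log(odds(chosen) / odds(rejected))).
    # Small when the model strongly prefers the chosen response,
    # large when it prefers the rejected one.
    log_odds_ratio = math.log(odds(p_chosen) / odds(p_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-log_odds_ratio)))

def orpo_loss(nll_chosen, p_chosen, p_rejected, lam=0.1):
    # Single combined objective: the usual SFT negative log-likelihood
    # on the chosen response, plus a weighted odds-ratio penalty.
    # This is what lets ORPO skip a separate alignment phase.
    return nll_chosen + lam * orpo_penalty(p_chosen, p_rejected)
```

In practice you would not compute this by hand: the TRL library ships an `ORPOTrainer` that applies this objective directly to a causal LM over a chosen/rejected preference dataset.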