Update README.md
Browse files
README.md
CHANGED
@@ -20,7 +20,7 @@ Alpha-Instruct is our latest language model, developed using 'Evolutionary Model
|
|
20 |
- [Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) (Instruct)
|
21 |
- [Llama-3-Open-Ko-8B](beomi/Llama-3-Open-Ko-8B) (Continual Pretrained)
|
22 |
|
23 |
-
To refine and enhance Alpha-Instruct, we utilized a carefully curated high-quality datasets aimed at 'healing' the model's output, significantly boosting its human preference scores. We use [ORPO]
|
24 |
- [Korean-Human-Judgements](https://huggingface.co/datasets/HAERAE-HUB/Korean-Human-Judgements)
|
25 |
- [Orca-Math](https://huggingface.co/datasets/kuotient/orca-math-word-problems-193k-korean)
|
26 |
- [dpo-mix-7k](https://huggingface.co/datasets/argilla/dpo-mix-7k)
|
|
|
20 |
- [Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) (Instruct)
|
21 |
- [Llama-3-Open-Ko-8B](beomi/Llama-3-Open-Ko-8B) (Continual Pretrained)
|
22 |
|
23 |
+
To refine and enhance Alpha-Instruct, we utilized a carefully curated high-quality datasets aimed at 'healing' the model's output, significantly boosting its human preference scores. We use [ORPO](https://arxiv.org/abs/2403.07691) specifically for this "healing" (RLHF) phase. The datasets* used include:
|
24 |
- [Korean-Human-Judgements](https://huggingface.co/datasets/HAERAE-HUB/Korean-Human-Judgements)
|
25 |
- [Orca-Math](https://huggingface.co/datasets/kuotient/orca-math-word-problems-193k-korean)
|
26 |
- [dpo-mix-7k](https://huggingface.co/datasets/argilla/dpo-mix-7k)
|