allganize/Llama-3-Alpha-Ko-8B-Instruct


kuotient committed on May 24, 2024

Commit 256003e · verified · 1 Parent(s): 82185d8

Update README.md

Files changed (1)

README.md +1 -1

@@ -20,7 +20,7 @@ Alpha-Instruct is our latest language model, developed using 'Evolutionary Model
 - [Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) (Instruct)
 - [Llama-3-Open-Ko-8B](beomi/Llama-3-Open-Ko-8B) (Continual Pretrained)
-To refine and enhance Alpha-Instruct, we utilized a carefully curated high-quality datasets aimed at 'healing' the model's output, significantly boosting its human preference scores. We use [ORPO] (https://arxiv.org/abs/2403.07691) specifically for this "healing" (RLHF) phase. The datasets* used include:
+To refine and enhance Alpha-Instruct, we utilized a carefully curated high-quality datasets aimed at 'healing' the model's output, significantly boosting its human preference scores. We use [ORPO](https://arxiv.org/abs/2403.07691) specifically for this "healing" (RLHF) phase. The datasets* used include:
 - [Korean-Human-Judgements](https://huggingface.co/datasets/HAERAE-HUB/Korean-Human-Judgements)
 - [Orca-Math](https://huggingface.co/datasets/kuotient/orca-math-word-problems-193k-korean)
 - [dpo-mix-7k](https://huggingface.co/datasets/argilla/dpo-mix-7k)