denial07/Qwen2-72B-Instruct-kor-dpo

denial07 committed on Jul 31

Commit 003911b • 1 Parent(s): f88e96b

Update README.md

Files changed (1)

README.md +9 -7

@@ -8,13 +8,15 @@ This model is an improved version for Korean, based on the [Qwen2-72B-Instruct](
 ### LogicKor Benchmark (24.07.31)
 * The following benchmark ranks are based on 1-shot evaluation.
-| Rank | Model | Reasoning | Math  | Writing | Coding | Understanding | Grammar | Singleturn | Multiturn | Total |
-|------|-------|-----------|-------|--------|--------|-------|---------|-----------|-----------|-------|
-| 1    | openai/gpt-4o-2024-05-13 | 9.21 | 8.71 | 9.64 | 9.78 | 9.64 | 9.50 | 9.33 | 9.50 | 9.41 |
-| 2    | anthropic/claude-3-5-sonnet-20240620 | 8.64 | 8.42 | 9.85 | 9.78 | 9.92 | 9.21 | 9.26 | 9.35 | 9.30 |
-| 7    | denial07/Qwen2-72B-Instruct-kor-dpo | 8.85 | 8.21 | 9.14 | 9.71 | 9.64 | 7.21 | 8.88 | 8.71 | 8.79 |
-| 8    | Qwen/Qwen2-72B-Instruct | 8.00 | 8.14 | 9.07 | 9.85 | 9.78 | 7.28 | 8.61 | 8.76 | 8.69 |
-| 9    | google/gemini-1.5-pro-001 | 7.00 | 8.00 | 9.57 | 8.85 | 9.35 | 8.64 | 8.61 | 8.52 | 8.57 |
+| Rank | Model | Reasoning | Math  | Writing | Coding | Understanding | Grammar | Singleturn | Multiturn | Total | Parameters |
+|------|-------|-----------|-------|--------|--------|-------|---------|-----------|-----------|-------|---------|
+| 1    | openai/gpt-4o-2024-05-13 | 9.21 | 8.71 | 9.64 | 9.78 | 9.64 | 9.50 | 9.33 | 9.50 | 9.41 | ? |
+| 2    | anthropic/claude-3-5-sonnet-20240620 | 8.64 | 8.42 | 9.85 | 9.78 | 9.92 | 9.21 | 9.26 | 9.35 | 9.30 | ? |
+| 4    | mistralai/Mistral-Large-Instruct-2407 | 9.71 | 9.07 | 9.57 | 9.92 | 9.92 | 6.78 | 9.19 | 9.14 | 9.16 | 123B |
+| 8    | meta-llama/Meta-Llama-3.1-405B-Instruct-FP8 | 8.78 | 7.14 | 9.28 | 9.64 | 9.64| 8.57 | 8.97 | 8.71 | 8.84 | 405B |
+| 9    | denial07/Qwen2-72B-Instruct-kor-dpo | 8.85 | 8.21 | 9.14 | 9.71 | 9.64 | 7.21 | 8.88 | 8.71 | 8.79 | 72B |
+| 10    | Qwen/Qwen2-72B-Instruct | 8.00 | 8.14 | 9.07 | 9.85 | 9.78 | 7.28 | 8.61 | 8.76 | 8.69 | 72B |
+| 11    | google/gemini-1.5-pro-001 | 7.00 | 8.00 | 9.57 | 8.85 | 9.35 | 8.64 | 8.61 | 8.52 | 8.57 | ? |
 ### KMMLU Benchmark (in progress)
 * [HAERAE-HUB/KMMLU](https://huggingface.co/datasets/HAERAE-HUB/KMMLU) benchmark score.