RLHFlow/LLaMA3-SFT

weqweasdas committed on Nov 3

Commit 1bedbf1 • 1 Parent(s): 465c80f

Update README.md

Files changed (1)

README.md +3 -2

README.md CHANGED

@@ -14,14 +14,15 @@ The model is trained from [meta-llama/Meta-Llama-3-8B](https://huggingface.co/me
 ## Academic Benchmarks
 | **Model**                  | **Size** | **Method**      | **LC AlpacaEval** | **MT-Bench** | **GSM-8K** | **MMLU** | **HumanEval** | **TruthfulQA** | **ARC** | **MBPP** |
 |----------------------------|----------|-----------------|------------|------------|------------|----------|---------------|----------------|---------|----------|
 | LLaMA-3-8B-it              | 8B       | RS+DPO+PPO      |22.9|8.16| 79.6       | 66.0     | 61.6          | 43.9           | 59.5    | 61.1     |
-| Ours (SFT baseline)        | 8B       | SFT             |10.2|7.69| 74.2       | 64.7     | 65.2          | 53.4           | 61.4    | 62.3     |
 | Ours (Iterative RLHF)      | 8B       | Iterative DPO   |37.2|8.46| 80.7       | 65.3     | 64.6          | 60.4           | 64.3    | 60.8     |
 ## Citation
 Please cite our technical report if you find our model useful for your research or product.
 ```

 ## Academic Benchmarks
+We use the ToRA script to evaluate GSM8K and MATH, EvalPlus for HumanEval, and lm-evaluation-harness for the other benchmarks. The model is evaluated in a zero-shot setting, so the results here may differ slightly from those reported in the technical report.
 | **Model**                  | **Size** | **Method**      | **LC AlpacaEval** | **MT-Bench** | **GSM-8K** | **MMLU** | **HumanEval** | **TruthfulQA** | **ARC** | **MBPP** |
 |----------------------------|----------|-----------------|------------|------------|------------|----------|---------------|----------------|---------|----------|
 | LLaMA-3-8B-it              | 8B       | RS+DPO+PPO      |22.9|8.16| 79.6       | 66.0     | 61.6          | 43.9           | 59.5    | 61.1     |
+| Ours (SFT baseline)        | 8B       | SFT             |10.2|7.69| 74.2       | 30.0     | 64.6          | 63.4           | 53.5    | 58.6     |
 | Ours (Iterative RLHF)      | 8B       | Iterative DPO   |37.2|8.46| 80.7       | 65.3     | 64.6          | 60.4           | 64.3    | 60.8     |
 ## Citation
 Please cite our technical report if you find our model useful for your research or product.
 ```
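
To read the benchmark table in this diff more easily, the per-benchmark gaps between two rows can be computed directly from the markdown. The sketch below is only an illustration: `split_row`, `parse_table`, and `score_deltas` are hypothetical helper names, not part of the README or of any evaluation tool, and the two rows are copied verbatim from the unchanged context lines of the diff. It assumes the first three columns (model, size, method) are non-numeric and every remaining cell is a plain number.

```python
# Header and rows copied from the unchanged context lines of the diff above.
HEADER = (
    "| **Model** | **Size** | **Method** | **LC AlpacaEval** | **MT-Bench** "
    "| **GSM-8K** | **MMLU** | **HumanEval** | **TruthfulQA** | **ARC** | **MBPP** |"
)
ROWS = [
    "| LLaMA-3-8B-it              | 8B       | RS+DPO+PPO      |22.9|8.16| 79.6       | 66.0     | 61.6          | 43.9           | 59.5    | 61.1     |",
    "| Ours (Iterative RLHF)      | 8B       | Iterative DPO   |37.2|8.46| 80.7       | 65.3     | 64.6          | 60.4           | 64.3    | 60.8     |",
]


def split_row(line):
    """Split one markdown table row into cells, dropping padding and bold markers."""
    return [cell.strip().strip("*") for cell in line.strip().strip("|").split("|")]


def parse_table(header, rows):
    """Map each model name to a {benchmark: score} dict (hypothetical helper)."""
    columns = split_row(header)
    table = {}
    for line in rows:
        cells = split_row(line)
        if len(cells) != len(columns):
            raise ValueError(f"expected {len(columns)} cells, got {len(cells)}")
        record = dict(zip(columns, cells))
        # Assumption: the first three columns are Model, Size, and Method.
        table[record["Model"]] = {name: float(record[name]) for name in columns[3:]}
    return table


def score_deltas(table, model, baseline):
    """Return model-minus-baseline score differences, rounded to two decimals."""
    return {
        name: round(table[model][name] - table[baseline][name], 2)
        for name in table[model]
    }


table = parse_table(HEADER, ROWS)
deltas = score_deltas(table, "Ours (Iterative RLHF)", "LLaMA-3-8B-it")
for name, delta in deltas.items():
    print(f"{name:>14}: {delta:+.2f}")
```

The column-count check in `parse_table` is deliberate: a row whose cells have shifted relative to the header raises an error only if the count changes, so a row with the right number of cells but values under the wrong headings still needs to be checked by eye.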