Bohr committed on
Commit 7bce9e1 · verified · 1 Parent(s): 747e42f

Update README.md

Files changed (1)
  1. README.md +1 -2
README.md CHANGED
@@ -47,8 +47,7 @@ response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
 
  ## 🔍 Evaluation
 
- We used single-turn instructions from MT-Bench as input for Qwen2-1.5B-Instruct and Qwen2-7B-Instruct. GPT4-turbo is used to evaluate the changes in the level of detail and truthfulness of responses to our model's revised instructions.
-
+ We evaluated our model on instruction-following leaderboards such as AlpacaEval, MT-Bench and IFEval.
 
  | Model | AlpacaEval 2.0 (length-controlled) | MT-Bench | MT-Bench (single) | IFEval (instruction-loose) | IFEval (strict-prompt) |
  |------|-----------------------------------|----------|-------------------|---------------------------|------------------------|