jieliu committed on
Commit
26cce27
1 Parent(s): 1d7cdbc

Update README.md

Files changed (1)
  1. README.md +10 -0
README.md CHANGED
@@ -81,6 +81,16 @@ response_text = generate_response(input_prompt)
  print("Response:", response_text)
  ```
 
+ ## Scripts
+ You can reproduce our results on AlpacaEval 2.0 using the script provided below.
+ ```bash
+ git clone https://github.com/tatsu-lab/alpaca_eval.git
+ cd alpaca_eval
+ pip install -e .
+ export OPENAI_API_KEY=<your_api_key>
+ alpaca_eval evaluate_from_model --model_configs 'Storm-7B'
+ ```
+
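As a sketch that is not part of this commit, the steps added above could be guarded by a small pre-flight check before the evaluation is launched. The helper names `build_eval_command` and `preflight` are hypothetical; only the `alpaca_eval evaluate_from_model --model_configs 'Storm-7B'` command and the `OPENAI_API_KEY` variable come from the README change itself.

```python
import os
import shutil
from typing import List, Mapping, Optional


def build_eval_command(model_config: str = "Storm-7B") -> List[str]:
    """Return the evaluation command from the Scripts section as an argv list."""
    return ["alpaca_eval", "evaluate_from_model", "--model_configs", model_config]


def preflight(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Hypothetical helper: list the problems that would block the run."""
    env = os.environ if env is None else env
    problems = []
    # The README exports OPENAI_API_KEY before running the evaluation.
    if not env.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set")
    # `pip install -e .` in the cloned repo is what puts alpaca_eval on PATH.
    if shutil.which("alpaca_eval") is None:
        problems.append("alpaca_eval is not on PATH (run `pip install -e .` first)")
    return problems


if __name__ == "__main__":
    issues = preflight()
    if issues:
        for issue in issues:
            print("Cannot run evaluation:", issue)
    else:
        print("Ready to run:", " ".join(build_eval_command()))
```

The check only reports problems and never launches the evaluation itself, so it can be run safely before committing API credits.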
  ## Limitations
 
  Storm-7B is a quick demonstration that a language model, fine-tuned with AI feedback, can easily surpass or match state-of-the-art models, as assessed by the same AI feedback. However, this improvement on the automatic leaderboard may not necessarily indicate better alignment with human intentions. Our model therefore represents a critical, preliminary reevaluation of the RLAIF paradigm, questioning how much learning from and being evaluated by AI feedback aligns with actual human preferences.