pavlichenko committed on
Commit 3677ce0 • 1 Parent(s): 8ebb2ea

Update app.py

Files changed (1)
  1. app.py +2 -2
app.py CHANGED
@@ -39,7 +39,7 @@ Distribution of prompts by categories:
 We report win rates only on categories where the number of prompts is large enough to make a comparison fair.
 
 
-#### How Did We Set Up Human Evaluation
+#### How Did We Set Up Human Evaluation?
 
 Annotators on Toloka crowdsourcing platform are given a prompt and responses to this prompt from two different models: the reference model and a model that we evaluate. Annotators then choose the best response according to harmlessness, truthfulness, and helpfulness. In simple words, we follow the Alpaca Eval scheme but instead of GPT-4, we use real humans as annotators.
 """
@@ -102,7 +102,7 @@ row = [reference_model_name] + [50.0] * len(pretty_categories)
 table = pd.concat([table, pd.DataFrame([pd.Series(row, index=table.columns)])], ignore_index=True)
 table = table.sort_values(by=['Total'], ascending=False)
 
-table.index = range(1, len(table) + 1)
+table.index = ["🥇 1", "🥈 2", "🥉 3"] + list(range(4, len(table) + 1))
 
 for category in pretty_category_names.values():
     table[category] = table[category].map('{:,.2f}%'.format)
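
Review note on the second hunk: the new index always starts with three medal labels, so it appears to assume the table has at least three rows. With fewer rows, the label list would be longer than the table and the index assignment would likely fail with a length mismatch. A minimal sketch of a length-safe variant follows. `rank_labels` and `MEDALS` are hypothetical names introduced here for illustration; they are not part of `app.py`.

```python
# Hypothetical helper, not part of app.py: builds rank labels for any row count.
MEDALS = ["🥇", "🥈", "🥉"]


def rank_labels(n_rows):
    """Return labels for ranks 1..n_rows, prefixing the top three with medals."""
    labels = []
    for rank in range(1, n_rows + 1):
        if rank <= len(MEDALS):
            # Top three ranks get a medal plus the rank number, as in the commit.
            labels.append(f"{MEDALS[rank - 1]} {rank}")
        else:
            # Remaining ranks stay plain integers, matching list(range(4, ...)).
            labels.append(rank)
    return labels


if __name__ == "__main__":
    print(rank_labels(2))
    print(rank_labels(5))
```

Under this assumption, the changed line could become `table.index = rank_labels(len(table))`, which yields the same labels as the committed code whenever the table has three or more rows.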