Ali-C137 commited on
Commit
64a51d7
1 Parent(s): 95c28dd

Update src/about.py

Browse files
Files changed (1) hide show
  1. src/about.py +1 -0
src/about.py CHANGED
@@ -93,6 +93,7 @@ And here find all the translated benchmarks provided by the Language evaluation
93
 
94
 
95
  To ensure a fair and unbiased assessment of the models' true capabilities, all evaluations are conducted in zero-shot settings `0-shots`. This approach eliminates any potential advantage from task-specific fine-tuning, providing a clear indication of how well the models can generalize to new tasks.
 
96
  Also, given the nature of the tasks, which include multiple-choice and yes/no questions, the leaderboard primarily uses normalized log likelihood accuracy `loglikelihood_acc_norm` for all tasks. This metric was chosen for its ability to provide a clear and fair measurement of model performance across different types of questions.
97
 
98
 
 
93
 
94
 
95
  To ensure a fair and unbiased assessment of the models' true capabilities, all evaluations are conducted in zero-shot settings `0-shots`. This approach eliminates any potential advantage from task-specific fine-tuning, providing a clear indication of how well the models can generalize to new tasks.
96
+
97
  Also, given the nature of the tasks, which include multiple-choice and yes/no questions, the leaderboard primarily uses normalized log likelihood accuracy `loglikelihood_acc_norm` for all tasks. This metric was chosen for its ability to provide a clear and fair measurement of model performance across different types of questions.
98
 
99