Availability of LLM evaluations like those in the HF blog post
Hello, I just saw this blog post https://huggingface.co/blog/zero-shot-eval-on-the-hub, which I am really excited about! I wanted to ask when the example in the blog post will be available in the HF leaderboards.
Hi @sjrlee ! Great feature request idea - gently pinging @Tristan to add this task to the leaderboards so people can view the scores from the LLM evaluations on https://huggingface.co/spaces/autoevaluate/leaderboards?dataset=-any-
Hi @sjrlee, you can find the evaluations coming from the zero-shot pipeline under the text-generation task on the leaderboards, e.g. https://huggingface.co/spaces/autoevaluate/leaderboards?dataset=mathemakitten%2Fwinobias_antistereotype_test
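In case it helps, here is a minimal sketch (assuming only the URL format visible in the links above) of how to build a leaderboards link filtered to a particular dataset via the `dataset` query parameter:

```python
from urllib.parse import urlencode

# Base URL of the autoevaluate leaderboards Space (from the links above).
LEADERBOARDS_URL = "https://huggingface.co/spaces/autoevaluate/leaderboards"

def leaderboard_link(dataset: str) -> str:
    """Build a leaderboards link filtered to a single dataset.

    The `dataset` query parameter is the same one used in the example
    links above; pass "-any-" to view scores across all datasets.
    """
    return f"{LEADERBOARDS_URL}?{urlencode({'dataset': dataset})}"

# e.g. the WinoBias anti-stereotype test set from the zero-shot blog post
print(leaderboard_link("mathemakitten/winobias_antistereotype_test"))
```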
Feel free to close this issue if that addresses your query
Thanks, @lewtun! That is great to see. Before I close this, I wanted to ask whether there are plans to add the results for all of the model sizes shown in the blog post?