occiglot/euro-llm-leaderboard-requests
Open Source Language Models for Europe
scandeval
) to run your own benchmarks with. As part of project "Leesplank" (with Michiel Buisman and Maarten Lens-FitzGerald), we recently added GPT-4-1106-preview scores to give the leaderboard a good "target".
load_dataset("BramVanroy/hplt_mono_v1_2", "nl_cleaned")
). There were some useful brainstorms in that thread. I think the dataset is relatively easy for the model, so it quickly overfits when beta is very small, since a small beta allows the model to step further away from its initial outputs.
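
To make the beta intuition a bit more concrete, here is a minimal sketch. It assumes that "beta" refers to the coefficient in a DPO-style preference loss, which the text above does not state explicitly, so treat it as an illustration and not as the actual training setup. The function names `dpo_loss` and `margin_for_loss` are hypothetical helpers written for this example only.

```python
import math


def dpo_loss(beta: float, chosen_logratio: float, rejected_logratio: float) -> float:
    """DPO-style loss for one preference pair (assumed form, see lead-in).

    Each *_logratio is log pi(y|x) - log pi_ref(y|x), i.e. how far the policy
    has moved away from the reference model on that completion.
    """
    margin = chosen_logratio - rejected_logratio
    # -log(sigmoid(z)) == log(1 + exp(-z))
    return math.log1p(math.exp(-beta * margin))


def margin_for_loss(beta: float, target_loss: float) -> float:
    """Log-ratio margin the policy must reach to hit a given loss value."""
    # Invert log(1 + exp(-beta * m)) = target_loss for m.
    return -math.log(math.expm1(target_loss)) / beta


if __name__ == "__main__":
    # For the same target loss, a smaller beta demands a much larger margin,
    # meaning the policy has to drift further from its reference outputs.
    for beta in (0.5, 0.1, 0.01):
        print(f"beta={beta}: margin needed = {margin_for_loss(beta, 0.1):.1f}")
```

Under this assumption, reaching the same loss with a tiny beta requires a far larger gap between policy and reference log-probabilities, which is one way to read "stepping away further from its initial outputs".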