Align task name and type with Hub taxonomy
This PR aligns the task name and type of the self-reported evaluation with the Hub taxonomy (i.e. the high-level tasks listed at hf.co/models).
The self-reported results will then become visible on this PwC leaderboard: https://paperswithcode.com/sota/summarization-on-samsum
cc @julien-c
Why don't you just group all the metrics into the same (task, dataset) tuple, then? Would be cleaner, no?
Yes, it would be cleaner that way, but self-reported evaluations rarely specify the dataset config / split that was used. This means you can't group the verified and self-reported metrics under a single `dataset` field.
A unique grouping would be something like (task, dataset_id, dataset_config, dataset_split). I'll double-check whether the `metadata_update()` function from `huggingface_hub` that we use automatically groups along those fields.
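
To make the grouping idea concrete, here is a rough sketch of keying results on (task, dataset_id, dataset_config, dataset_split). It is plain Python and does not call `huggingface_hub`; the helper names `group_key` and `group_metrics` are hypothetical, the entries assume the usual model-index layout (`task`, `dataset`, `metrics`), and the metric values are placeholders, not real scores.

```python
from collections import defaultdict


def group_key(result):
    """Build the (task, dataset_id, dataset_config, dataset_split) key.

    Missing config / split become None, which is the typical case for
    self-reported results.
    """
    dataset = result["dataset"]
    return (
        result["task"]["type"],
        dataset["type"],
        dataset.get("config"),
        dataset.get("split"),
    )


def group_metrics(results):
    """Collect metrics from all results that share the same key."""
    grouped = defaultdict(list)
    for result in results:
        grouped[group_key(result)].extend(result["metrics"])
    return dict(grouped)


# Placeholder entries: one verified result with config/split, one
# self-reported result without them. Values are made up for illustration.
results = [
    {
        "task": {"type": "summarization"},
        "dataset": {"type": "samsum", "config": "samsum", "split": "test"},
        "metrics": [{"type": "rouge", "value": 0.5, "verified": True}],
    },
    {
        "task": {"type": "summarization"},
        "dataset": {"type": "samsum"},
        "metrics": [{"type": "rouge", "value": 0.4}],
    },
]

grouped = group_metrics(results)
for key, metrics in grouped.items():
    print(key, metrics)
```

Because the self-reported entry has no config or split, it ends up under a different key from the verified one, which is why the two cannot share a single `dataset` field.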