huggingface_hub
jsonlines
pandas
gradio
datasets
tqdm

# CMG metrics
evaluate
bert-score
sacrebleu
rouge-score