CTranslate2 BLEU comparison to MarianMT fine-tuned model

#1
by Adeptschneider - opened

Hello @gaudi I see you have noted in your model card description that CTranslate2 models have a BLEU score equally as good as their equivalent Opus models loaded in PyTorch. In my case, my model's BLEU score drops when I load it with CTranslate2, and I'd appreciate feedback on that. The model achieves a BLEU score of 9.23. I'm working on a machine translation task for Dyula to French; Dyula is a low-resource language. I also see you're building quite a few machine translation models with CTranslate2. Are you looking to put these models into production? What are you cooking?

Owner

Hey @Adeptschneider, I hope all is well on your end!

BLEU scores do tend to degrade when the checkpoint is quantized. In the CTranslate2 conversion command, float16 quantization is applied via the "--quantization float16" flag. That said, the degradation should typically only be about 1.0 BLEU point or so (based on some past issues raised in CTranslate2's GitHub repo). If you're seeing greater degradation, I may have to look at the config files generated by CTranslate2 and see if there is anything to tweak there. The original model checkpoint can also be reconverted with different flags to maintain precision and potentially a better score. The command I used to convert the original checkpoint is in the README (it was a command I found in one of michaelfeil's repos); that may be a good starting point for reconverting the original checkpoint.
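For reference, a rough sketch of reconverting the checkpoint with and without quantization via CTranslate2's Python converter API (the model ID below is just a placeholder for your fine-tuned Dyula-French checkpoint, not a real repo):

```python
# Sketch: convert a MarianMT checkpoint to CTranslate2 format,
# once at full precision and once with float16 quantization.
import ctranslate2

model_id = "your-username/marianmt-dyu-fr"  # placeholder for your checkpoint

# Full-precision conversion (no quantization applied):
converter = ctranslate2.converters.TransformersConverter(model_id)
converter.convert("ct2-dyu-fr-float32", force=True)

# float16 conversion, equivalent to the "--quantization float16" flag:
converter = ctranslate2.converters.TransformersConverter(model_id)
converter.convert("ct2-dyu-fr-float16", quantization="float16", force=True)
```

Comparing the two converted models on the same test set should tell you how much of the drop is actually due to quantization.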

The BLEU scores in the README are the generic scores listed in CTranslate2's GitHub repository; unfortunately, they're not specific to this model. Since Dyula is a low-resource language, the BLEU scores may be much lower than what is posted there to start with. Do you know what the original checkpoint's BLEU score is on the data you're benchmarking with?
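If it helps, here's a minimal sketch of how you could score the original MarianMT checkpoint and the converted model on the same test set with sacrebleu (the model ID, directory, and test sentences are placeholders):

```python
# Sketch: compare BLEU of the original MarianMT checkpoint vs. its CTranslate2 conversion.
import ctranslate2
import sacrebleu
from transformers import MarianMTModel, MarianTokenizer

model_id = "your-username/marianmt-dyu-fr"   # placeholder: original checkpoint
ct2_dir = "ct2-dyu-fr-float16"               # placeholder: converted model directory

tokenizer = MarianTokenizer.from_pretrained(model_id)
sources = ["..."]      # Dyula test sentences
references = ["..."]   # French reference translations

# Baseline: original checkpoint loaded in PyTorch.
model = MarianMTModel.from_pretrained(model_id)
inputs = tokenizer(sources, return_tensors="pt", padding=True)
outputs = model.generate(**inputs)
hyp_pt = tokenizer.batch_decode(outputs, skip_special_tokens=True)

# Converted checkpoint loaded with CTranslate2.
translator = ctranslate2.Translator(ct2_dir, device="cpu")
tokenized = [tokenizer.convert_ids_to_tokens(tokenizer.encode(s)) for s in sources]
results = translator.translate_batch(tokenized)
hyp_ct2 = [
    tokenizer.decode(tokenizer.convert_tokens_to_ids(r.hypotheses[0]), skip_special_tokens=True)
    for r in results
]

print("MarianMT BLEU:   ", sacrebleu.corpus_bleu(hyp_pt, [references]).score)
print("CTranslate2 BLEU:", sacrebleu.corpus_bleu(hyp_ct2, [references]).score)
```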

We aren't currently using these models in production, but we are experimenting with several of them for production use cases. Our challenge is scale (the volume of translation requests): we're trying to identify the machine translation solution that balances fast inference with translation quality (to some degree). I set up a pipeline that automatically pulls down the Opus models, converts them to CTranslate2 models, and then pushes them back up as new HF repos. I was originally pushing these as private repos, but I figured they may be helpful for others to leverage as well; hence the volume of repos. :)
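As a rough sketch, that kind of convert-and-republish pipeline looks something like this with the huggingface_hub API (the repo names below are placeholders, not the actual repos):

```python
# Sketch: convert an Opus checkpoint to CTranslate2 and republish it as its own HF repo.
import ctranslate2
from huggingface_hub import HfApi

source_model = "Helsinki-NLP/opus-mt-en-fr"      # example Opus checkpoint
target_repo = "your-username/opus-mt-en-fr-ct2"  # placeholder destination repo
output_dir = "opus-mt-en-fr-ct2"

# Convert the Opus checkpoint to CTranslate2 format with float16 quantization.
converter = ctranslate2.converters.TransformersConverter(source_model)
converter.convert(output_dir, quantization="float16", force=True)

# Push the converted model up as a new Hugging Face repo.
api = HfApi()
api.create_repo(target_repo, exist_ok=True)   # pass private=True for a private repo
api.upload_folder(folder_path=output_dir, repo_id=target_repo)
```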

I hope this provides at least some help! When I get the chance, I can pull down this checkpoint as well and see what I can identify! Hopefully others in the HF community can provide some insight here as well!
