# Model
This is the mMiniLM-L12xH384 XLM-R model proposed in [MiniLMv2: Multi-Head Self-Attention Relation Distillation for Compressing Pretrained Transformers](https://arxiv.org/abs/2012.15828), which we fine-tuned on the direct assessment annotations collected at the Workshop on Statistical Machine Translation (WMT) from 2015 to 2020.
This model is much more lightweight than the standard XLM-RoBERTa base and large models.
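
To give a rough sense of the size difference, the sketch below estimates encoder parameter counts from the layer count and hidden size alone. It is a back-of-the-envelope calculation, not a measurement of the released checkpoints. It assumes a standard BERT-style layer layout, a feed-forward width of 4x the hidden size, the shared XLM-R vocabulary of 250,002 tokens, 514 position embeddings, and the usual XLM-RoBERTa shapes (12 layers x 768 for base, 24 layers x 1024 for large). The pooler and any task-specific head are left out, and the function name `estimate_encoder_params` is purely illustrative.

```python
def estimate_encoder_params(layers, hidden, vocab=250_002, max_pos=514, ffn_mult=4):
    """Rough parameter count for a BERT-style encoder (assumed layout, no pooler/head)."""
    # Embeddings: token + position + one token-type row + LayerNorm (weight and bias).
    embeddings = vocab * hidden + max_pos * hidden + hidden + 2 * hidden

    # Self-attention: Q, K, V and output projections, each hidden x hidden plus bias.
    attention = 4 * (hidden * hidden + hidden)

    # Feed-forward: hidden -> ffn -> hidden, with biases.
    ffn = ffn_mult * hidden
    feed_forward = (hidden * ffn + ffn) + (ffn * hidden + hidden)

    # Two LayerNorms per layer (after attention and after the feed-forward block).
    layer_norms = 2 * (2 * hidden)

    per_layer = attention + feed_forward + layer_norms
    return embeddings + layers * per_layer


# Assumed configurations: (number of layers, hidden size).
configs = {
    "mMiniLM-L12xH384": (12, 384),
    "XLM-RoBERTa base": (12, 768),
    "XLM-RoBERTa large": (24, 1024),
}

estimates = {name: estimate_encoder_params(*cfg) for name, cfg in configs.items()}

for name, count in estimates.items():
    print(f"{name}: ~{count / 1e6:.0f}M parameters")
```

Under these assumptions the estimate is roughly 117M parameters for mMiniLM-L12xH384, against roughly 277M for XLM-RoBERTa base and 559M for large. Most of the small model's parameters sit in the shared multilingual embedding table, so the saving in transformer layers is larger than the totals suggest.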