Questions about model & architecture
Hello!
I just stumbled upon this when looking at recent Sentence Transformer models, and I think it's quite interesting to see the custom architecture (although I haven't yet figured out what's new about it compared to e.g. RoBERTa). Would you like to share some information about it?
I also wanted to let you know that Sentence Transformers recently had a big v3.0 update, which refactored training. Old training scripts should mostly still work, but training can now also be done with a SentenceTransformerTrainer that resembles the transformers Trainer, in case you're familiar with that one. Notably, it's now much easier to track your model's performance during training, thanks to the Weights & Biases and TensorBoard integrations and better callbacks. I think it might be quite useful for you. The updated training documentation can be found here: https://sbert.net/docs/sentence_transformer/training_overview.html
The generated model cards are also much more informative; see, e.g., other recent Sentence Transformer models like cristuf/bge-base-financial-matryoshka.
Also, once your model is ready for people to use, feel free to reach out and I can spread the word on social media.
cc @dangvantuan
- Tom Aarsen
Hi @tomaarsen
I am using the XLMRoberta architecture, but I am training it only on French and English, so I have customized it into a bilingual model for these two languages. The model is still at the experimental stage. I am currently training it on NLI data and will share it with you soon. I am also experimenting with Sentence Transformers v3.0.
Tuan
"so I have customized it into a Bilingual model for these languages."
Out of curiosity, have you trained a custom tokenizer on English/French data? The default XLM-R tokenizer has a lot of tokens that you won't end up using, which will 1) slow down inference and 2) potentially reduce your performance.
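Training such a tokenizer could look roughly like this minimal sketch with the Hugging Face tokenizers library. It assumes a Unigram model, the same model family as XLM-R's SentencePiece tokenizer, and reuses XLM-R's special tokens. The tiny corpus and the vocabulary size are hypothetical placeholders; a real run would stream a large English/French corpus.

```python
from tokenizers import Tokenizer, decoders, models, normalizers, pre_tokenizers, processors, trainers

# Hypothetical placeholder corpus; in practice, stream millions of EN/FR sentences.
corpus = [
    "The weather is nice today.",
    "Il fait beau aujourd'hui.",
    "A man is playing a guitar in the park.",
    "Un homme joue de la guitare dans le parc.",
    "Sentence embeddings are useful for semantic search.",
    "Les plongements de phrases sont utiles pour la recherche sémantique.",
    "She is reading a book about history.",
    "Elle lit un livre sur l'histoire.",
]


def batch_iterator(texts, batch_size=2):
    # Yield small batches so large corpora never need to sit in memory at once.
    for i in range(0, len(texts), batch_size):
        yield texts[i : i + batch_size]


# XLM-R's special tokens, listed in XLM-R's order (<s>, <pad>, </s>, <unk>).
special_tokens = ["<s>", "<pad>", "</s>", "<unk>", "<mask>"]

tokenizer = Tokenizer(models.Unigram())
tokenizer.normalizer = normalizers.NFKC()  # NFKC, close to SentencePiece's default normalization
tokenizer.pre_tokenizer = pre_tokenizers.Metaspace()  # marks word starts with "▁" like SentencePiece
tokenizer.decoder = decoders.Metaspace()

trainer = trainers.UnigramTrainer(
    vocab_size=300,  # toy size for this tiny corpus; a real EN/FR vocab would be far larger
    special_tokens=special_tokens,
    unk_token="<unk>",
)
tokenizer.train_from_iterator(batch_iterator(corpus), trainer=trainer)

# Wrap every sequence in <s> ... </s>, mirroring RoBERTa-style inputs.
bos_id = tokenizer.token_to_id("<s>")
eos_id = tokenizer.token_to_id("</s>")
tokenizer.post_processor = processors.TemplateProcessing(
    single="<s> $A </s>",
    pair="<s> $A </s> </s> $B </s>",
    special_tokens=[("<s>", bos_id), ("</s>", eos_id)],
)

encoding = tokenizer.encode("Il fait beau dans le parc.")
print(encoding.tokens)
```

The result can be wrapped in transformers' PreTrainedTokenizerFast (via tokenizer_object=tokenizer plus the special-token arguments) for use with a model. Note that a new vocabulary no longer lines up with XLM-R's pretrained embedding rows, so the model would need to be pretrained again with it.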
I'm glad that you've discovered Sentence Transformers v3.0; I like to think that it can help make your life a bit easier.
I'll happily follow along with your progress.
- Tom Aarsen