Tokenizer Choice For LLM Training: Negligible or Crucial? Paper • 2310.08754 • Published Oct 12, 2023 • 2
Neighborhood Contrastive Learning for Scientific Document Representations with Citation Embeddings Paper • 2202.06671 • Published Feb 14, 2022 • 2
Specialized Document Embeddings for Aspect-based Similarity of Research Papers Paper • 2203.14541 • Published Mar 28, 2022
Efficient Language Model Training through Cross-Lingual and Progressive Transfer Learning Paper • 2301.09626 • Published Jan 23, 2023 • 2
LowREm: A Repository of Word Embeddings for 87 Low-Resource Languages Enhanced with Multilingual Graph Knowledge Paper • 2409.18193 • Published Sep 26, 2024
Small Models, Big Impact: Efficient Corpus and Graph-Based Adaptation of Small Multilingual Language Models for Low-Resource Languages Paper • 2502.10140 • Published 23 days ago • 9
LowREm: A Repository of Word Embeddings for 87 Low-Resource Languages Enhanced with Multilingual Graph Knowledge Paper • 2409.18193 • Published Sep 26, 2024
Small Models, Big Impact: Efficient Corpus and Graph-Based Adaptation of Small Multilingual Language Models for Low-Resource Languages Paper • 2502.10140 • Published 23 days ago • 9
MSTS: A Multimodal Safety Test Suite for Vision-Language Models Paper • 2501.10057 • Published Jan 17 • 8
LLMs Lost in Translation: M-ALERT uncovers Cross-Linguistic Safety Gaps Paper • 2412.15035 • Published Dec 19, 2024 • 4
LLMs Lost in Translation: M-ALERT uncovers Cross-Linguistic Safety Gaps Paper • 2412.15035 • Published Dec 19, 2024 • 4