Carlo Moro

cnmoro

https://www.linkedin.com/in/carlo-moro-4a20a7132/

AI & ML interests

I like small & fast models

cnmoro's activity

liked a model about 13 hours ago

CohereForAI/c4ai-command-a-03-2025

Text Generation • Updated about 6 hours ago • 162

upvoted a collection 1 day ago

Gemma 3 Release

Collection

9 items • Updated about 5 hours ago • 221

liked a model 1 day ago

google/gemma-3-1b-it

Text Generation • Updated 1 day ago • 9.78k • 123

liked a model 2 days ago

RekaAI/reka-flash-3

Updated about 13 hours ago • 1.3k • 228

updated a model 2 days ago

cnmoro/TangledLlama33m-Reranker-EnPt-ONNX

Text Classification • Updated 2 days ago • 2

published a model 3 days ago

cnmoro/TangledLlama33m-Reranker-EnPt-ONNX

Text Classification • Updated 2 days ago • 2

updated a model 3 days ago

cnmoro/TangledLlama33m-Reranker-EnPt

Text Classification • Updated 3 days ago • 29 • 1

updated a dataset 3 days ago

cnmoro/smoltalk-555k-ptbr

Viewer • Updated 3 days ago • 556k • 74 • 1

reacted to tomaarsen's post with ❤️ 3 days ago

Post

An assembly of 18 European companies, labs, and universities has banded together to launch 🇪🇺 EuroBERT! It's a state-of-the-art multilingual encoder for 15 European and widely spoken global languages, designed to be finetuned for retrieval, classification, etc.

🇪🇺 15 Languages: English, French, German, Spanish, Chinese, Italian, Russian, Polish, Portuguese, Japanese, Vietnamese, Dutch, Arabic, Turkish, Hindi
3️⃣ 3 model sizes: 210M, 610M, and 2.1B parameters - very very useful sizes in my opinion
➡️ Sequence length of 8192 tokens! Nice to see these higher sequence lengths for encoders becoming more common.
⚙️ Architecture based on Llama, but with bi-directional (non-causal) attention to turn it into an encoder. Flash Attention 2 is supported.
🔥 A new Pareto frontier (stronger *and* smaller) for multilingual encoder models
📊 Evaluated against mDeBERTa, mGTE, XLM-RoBERTa for Retrieval, Classification, and Regression (after finetuning for each task separately): EuroBERT punches way above its weight.
📝 Detailed paper covering the full setup, incl. training data: FineWeb for English and CulturaX for multilingual data, The Stack v2 and Proof-Pile-2 for code.
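
To make the "bi-directional (non-causal) attention" point above a bit more concrete, here is a toy sketch of the difference between a causal (decoder-style) mask and a bi-directional (encoder-style) mask. This is only an illustration in plain Python, not EuroBERT's actual implementation; the helper names `attention_mask` and `visible_counts` are made up for this example.

```python
def attention_mask(seq_len, causal):
    """Build a seq_len x seq_len mask (hypothetical helper, for illustration only).

    mask[i][j] is True when the token at position i may attend to position j.
    """
    return [
        [(j <= i) if causal else True for j in range(seq_len)]
        for i in range(seq_len)
    ]


def visible_counts(mask):
    """Count how many positions each token is allowed to attend to."""
    return [sum(row) for row in mask]


# Decoder-style: each token only sees itself and earlier tokens.
decoder_mask = attention_mask(4, causal=True)

# Encoder-style: every token sees the whole sequence, left and right.
encoder_mask = attention_mask(4, causal=False)

for name, mask in (("causal", decoder_mask), ("bi-directional", encoder_mask)):
    print(name)
    for row in mask:
        print("  " + " ".join("x" if allowed else "." for allowed in row))
```

With the causal mask, the first token can only attend to itself, while with the bi-directional mask it can use context from the entire sequence, which is what you generally want when the model produces embeddings for retrieval or classification rather than generating text.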

Check out the release blogpost here: https://huggingface.co/blog/EuroBERT/release
* EuroBERT/EuroBERT-210m
* EuroBERT/EuroBERT-610m
* EuroBERT/EuroBERT-2.1B

The next step is for researchers to build upon the 3 EuroBERT base models and publish strong retrieval, zero-shot classification, etc. models for all to use. I'm very much looking forward to it!