Pclanglais committed · commit 83a13d9 · parent: 0248578

Update README.md

README.md CHANGED
@@ -60,6 +60,24 @@ A typical example, with excerpts drawn from a Wikipedia article on Wikipedia
As a specialized language model, PleIAs-1.2b-RAG will be unable to work properly with prompts that detract from that design.
### RAG Evaluation

We evaluate the Pico and Nano models on a RAG task. As existing benchmarks are largely limited to English, we developed a custom multilingual RAG benchmark: we synthetically generated queries and small sets of documents, prompted each model with a query and its document set, and then ran a head-to-head ELO-based tournament with GPT-4o as judge. We [release the prompts and generations for all models we compared](https://huggingface.co/datasets/PleIAs/Pleias-1.0-eval/tree/main/RAGarena). Our nano (1.2B) model outperforms Llama 3.2 1.1B and EuroLLM 1.7B. Our pico (350M) model outperforms other models in its weight class, such as SmolLM 360M and Qwen2.5 500M, as well as much larger models, such as Llama 3.2 1.1B and EuroLLM 1.7B.
| **Rank** | **Model**                | **ELO**    |
|----------|--------------------------|------------|
| 1        | Qwen2.5-Instruct-7B      | 1294.6     |
| 2        | Llama-3.2-Instruct-8B    | 1269.8     |
| 3        | **Pleias-nano-1.2B-RAG** | **1137.5** |
| 4        | Llama-3.2-Instruct-3B    | 1118.1     |
| 5        | Qwen2.5-Instruct-3B      | 1078.1     |
| 6        | **Pleias-pico-350M-RAG** | **1051.2** |
| 7        | Llama-3.2-1B-Instruct    | 872.3      |
| 8        | EuroLLM-1.7B-Instruct    | 860.0      |
| 9        | SmolLM-360M-Instruct     | 728.6      |
| 10       | Qwen2.5-0.5B-Instruct    | 722.2      |
| 11       | SmolLM-1.7B-Instruct     | 706.3      |
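The tournament follows the standard Elo update: after each judged pair, the winner gains and the loser loses rating in proportion to how unexpected the result was. Below is a minimal sketch of that procedure, assuming a chess-style K-factor of 32 and an initial rating of 1000; the `judge` callable (standing in for GPT-4o's pairwise verdicts) and the match schedule are illustrative assumptions, not the released evaluation code.

```python
# Illustrative sketch of an ELO-based head-to-head tournament, not the
# released evaluation code. INITIAL_RATING, K_FACTOR, and the judge
# interface are assumptions.
import itertools
import random

INITIAL_RATING = 1000.0
K_FACTOR = 32.0  # assumed chess-style update step


def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update(ratings: dict, a: str, b: str, score_a: float) -> None:
    """Apply one result; score_a is 1.0 (A wins), 0.0 (B wins), or 0.5 (tie)."""
    exp_a = expected_score(ratings[a], ratings[b])
    ratings[a] += K_FACTOR * (score_a - exp_a)
    ratings[b] -= K_FACTOR * (score_a - exp_a)  # zero-sum update


def run_tournament(models: list, queries: list, judge) -> dict:
    """judge(query, model_a, model_b) -> score for model_a, e.g. a GPT-4o
    verdict comparing the two models' generations for that query."""
    ratings = {m: INITIAL_RATING for m in models}
    matches = [(q, a, b) for q in queries
               for a, b in itertools.combinations(models, 2)]
    random.shuffle(matches)  # Elo is order-dependent, so randomize match order
    for query, a, b in matches:
        update(ratings, a, b, judge(query, a, b))
    return dict(sorted(ratings.items(), key=lambda kv: -kv[1]))
```

Because Elo updates are order-dependent, the sketch randomizes the match order; averaging ratings over several shuffled runs is a common way to stabilize the final ranking.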
## Acceptable use
Pleias-nano-1.2b-RAG includes a much wider range of support for verifiability and grounding than most generalist models.