Pclanglais committed · commit 83a13d9 · parent: 0248578

Update README.md

README.md CHANGED
@@ -60,6 +60,24 @@ A typical example, with excerpts drawn from a Wikipedia article on Wikipedia
As a specialized language model, PleIAs-1.2b-RAG will be unable to work properly with prompts that detract from that design.
### RAG Evaluation

We evaluate the Pico and Nano models on a RAG task. As existing benchmarks are largely limited to English, we developed a custom multilingual RAG benchmark: we synthetically generated queries and small sets of documents, prompted each model with a query and its document set, and then ran a head-to-head ELO-based tournament with GPT-4o as judge. We [release the prompts and generations for all models we compared](https://huggingface.co/datasets/PleIAs/Pleias-1.0-eval/tree/main/RAGarena). Our nano (1.2B) model outperforms Llama 3.2 1.1B and EuroLLM 1.7B. Our pico (350M) model outperforms other models in its weight class, such as SmolLM 360M and Qwen2.5 500M, as well as much larger models, such as Llama 3.2 1.1B and EuroLLM 1.7B.
| **Rank** | **Model**                | **ELO**    |
|----------|--------------------------|------------|
| 1        | Qwen2.5-Instruct-7B      | 1294.6     |
| 2        | Llama-3.2-Instruct-8B    | 1269.8     |
| 3        | **Pleias-nano-1.2B-RAG** | **1137.5** |
| 4        | Llama-3.2-Instruct-3B    | 1118.1     |
| 5        | Qwen2.5-Instruct-3B      | 1078.1     |
| 6        | **Pleias-pico-350M-RAG** | **1051.2** |
| 7        | Llama-3.2-1B-Instruct    | 872.3      |
| 8        | EuroLLM-1.7B-Instruct    | 860.0      |
| 9        | SmolLM-360M-Instruct     | 728.6      |
| 10       | Qwen2.5-0.5B-Instruct    | 722.2      |
| 11       | SmolLM-1.7B-Instruct     | 706.3      |
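The tournament follows the standard Elo update: after each judged pair, the winner gains and the loser loses rating in proportion to how unexpected the result was. Below is a minimal sketch of that procedure, assuming a chess-style K-factor of 32 and an initial rating of 1000; the `judge` callable (standing in for GPT-4o's pairwise verdicts) and the match schedule are illustrative assumptions, not the released evaluation code.

```python
# Illustrative sketch of an ELO-based head-to-head tournament, not the
# released evaluation code. INITIAL_RATING, K_FACTOR, and the judge
# interface are assumptions.
import itertools
import random

INITIAL_RATING = 1000.0
K_FACTOR = 32.0  # assumed chess-style update step


def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update(ratings: dict, a: str, b: str, score_a: float) -> None:
    """Apply one result; score_a is 1.0 (A wins), 0.0 (B wins), or 0.5 (tie)."""
    exp_a = expected_score(ratings[a], ratings[b])
    ratings[a] += K_FACTOR * (score_a - exp_a)
    ratings[b] -= K_FACTOR * (score_a - exp_a)  # zero-sum update


def run_tournament(models: list, queries: list, judge) -> dict:
    """judge(query, model_a, model_b) -> score for model_a, e.g. a GPT-4o
    verdict comparing the two models' generations for that query."""
    ratings = {m: INITIAL_RATING for m in models}
    matches = [(q, a, b) for q in queries
               for a, b in itertools.combinations(models, 2)]
    random.shuffle(matches)  # Elo is order-dependent, so randomize match order
    for query, a, b in matches:
        update(ratings, a, b, judge(query, a, b))
    return dict(sorted(ratings.items(), key=lambda kv: -kv[1]))
```

Because Elo updates are order-dependent, the sketch randomizes the match order; averaging ratings over several shuffled runs is a common way to stabilize the final ranking.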
## Acceptable use
Pleias-nano-1.2b-RAG includes a much wider range of support for verifiability and grounding than most generalist models.