Pclanglais committed on
Commit
83a13d9
1 Parent(s): 0248578

Update README.md

Files changed (1)
  1. README.md +18 -0
README.md CHANGED
@@ -60,6 +60,24 @@ A typical example, with excerpts drawn from a Wikipedia article on Wikipedia
 
  As a specialized language model, PleIAs-1.2b-RAG will be unable to work properly with prompts that detract from that design.
 
+ ### RAG Evaluation
+
+ We evaluate the Pico and Nano models on a RAG task. As existing benchmarks are largely limited to English, we developed a custom multilingual RAG benchmark: we synthetically generated queries and small sets of documents, then prompted each model with a query and its documents. We ranked the models in a head-to-head Elo-based tournament with GPT-4o as judge. We [release the prompts and generations for all models we compared](https://huggingface.co/datasets/PleIAs/Pleias-1.0-eval/tree/main/RAGarena). Our nano (1.2B) model outperforms Llama 3.2 1B and EuroLLM 1.7B. Our pico (350M) model outperforms other models in its weight class, such as SmolLM 360M and Qwen2.5 500M, as well as much larger models, such as Llama 3.2 1B and EuroLLM 1.7B.
+
+ | **Rank** | **Model** | **ELO** |
+ |----------|--------------------------|------------|
+ | 1 | Qwen2.5-Instruct-7B | 1294.6 |
+ | 2 | Llama-3.2-Instruct-8B | 1269.8 |
+ | 3 | **Pleias-nano-1.2B-RAG** | **1137.5** |
+ | 4 | Llama-3.2-Instruct-3B | 1118.1 |
+ | 5 | Qwen2.5-Instruct-3B | 1078.1 |
+ | 6 | **Pleias-pico-350M-RAG** | **1051.2** |
+ | 7 | Llama-3.2-1B-Instruct | 872.3 |
+ | 8 | EuroLLM-1.7B-Instruct | 860.0 |
+ | 9 | SmolLM-360M-Instruct | 728.6 |
+ | 10 | Qwen2.5-0.5B-Instruct | 722.2 |
+ | 11 | SmolLM-1.7B-Instruct | 706.3 |
+
  ## Acceptable use
  Pleias-nano-1.2b-RAG includes a much wider range of support for verifiability and grounding than most generalist models.
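
The added RAG Evaluation section ranks models through a head-to-head Elo-based tournament with a judge model. As a rough illustration of how pairwise verdicts can turn into ratings like those in the table, here is a minimal sketch of the standard Elo update. The README does not state the K-factor, the starting rating, or the order in which matches were played, so `k=32.0`, `initial=1000.0`, the function names, and the sample verdicts below are all assumptions made for illustration, not the authors' actual setup.

```python
from collections import defaultdict


def expected_score(rating_a: float, rating_b: float) -> float:
    """Standard Elo expected score of player A against player B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def run_elo(matches, k: float = 32.0, initial: float = 1000.0) -> dict:
    """Apply sequential Elo updates.

    `matches` is an iterable of (model_a, model_b, score_a), where score_a is
    1.0 if the judge preferred model_a, 0.0 if it preferred model_b, and 0.5
    for a tie. `k` and `initial` are assumed values, not taken from the README.
    """
    ratings = defaultdict(lambda: initial)
    for model_a, model_b, score_a in matches:
        exp_a = expected_score(ratings[model_a], ratings[model_b])
        delta = k * (score_a - exp_a)
        # Zero-sum update: what A gains, B loses.
        ratings[model_a] += delta
        ratings[model_b] -= delta
    return dict(ratings)


# Hypothetical judge verdicts, for illustration only.
matches = [
    ("model_x", "model_y", 1.0),
    ("model_y", "model_z", 0.5),
    ("model_x", "model_z", 1.0),
]
ratings = run_elo(matches)
for name, rating in sorted(ratings.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {rating:.1f}")
```

Because each update is zero-sum, the mean rating stays at the assumed starting value. Sequential Elo also depends on match order, so an actual evaluation may shuffle or repeat matches to stabilize the ranking.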