Cyrile committed on
Commit
51577f4
1 Parent(s): 86f72df

Update README.md

Files changed (1)
  1. README.md +2 -2
README.md CHANGED
@@ -34,7 +34,7 @@ Sans honneur que précaire, sans liberté que provisoire, [...], et de façon qu
34
  | tokens per word | 2.3 | 2.3 | 2 | 1.9 | 1.9 | 1.9 | 1.8 | 1.4 |
35
 
36
 
37
- For comparison, with a specialized French tokenizer like [CamemBERT](https://huggingface.co/camembert/camembert-base) or [DistilCamemBERT](cmarkea/distilcamembert-base), we have 1.5 tokens per word. In addition to its positive impact on inference time and resource consumption, there has already been a demonstrated direct relationship between the number of tokens per word required for modeling and the predictive performance of the model [1].
38
 
39
  Dataset
40
  -------
@@ -72,7 +72,7 @@ Since the model is trained only on English and French corpora, the performance o
72
  |:--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------:|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------:|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------:|
73
  | **human:** C'est quoi la première guerre mondiale ?<br/> **bot:** La première guerre mondiale est une guerre entre les États-Unis et la Grande-Bretagne.<br/> **human:** Tu es certain de ça ?<br/> **bot:** Oui, je suis sûr.<br/> **human:** Elle a eu lieu quand ?<br/> **bot:** La première guerre mondiale a eu lieu en 1914.<br/> **human:** Et elle a fini quand ?<br/> **bot:** La première guerre mondiale a eu lieu en 1918.<br/> **human:** Elle a opposé qui a qui ?<br/> **bot:** La première guerre mondiale a eu lieu entre les États-Unis et la Grande-Bretagne. | **human:** What is the first world war?<br/> **bot:** The first world war was a conflict between the United States and the Soviet Union. The conflict began in World War II and lasted until the end of World War III.<br/> **human:** Are you sure that this war was between these two belligerents and that it started during the Second World War?<br/> **bot:** Yes, the conflict was between the United States and the Soviet Union.<br/> **human:** What are the start and end dates of the first war?<br/> **bot:** The first war began in World War II and lasted until the end of World War III. | **human:** ¿Qué es la Primera Guerra Mundial?<br/> **bot:** La Primera Guerra Mundial fue una guerra entre los Estados Unidos y la Alemania, que se extendió por un total de 18 meses.<br/> **human:** ¿Solo estos dos países y por tan poco tiempo?<br/> **bot:** La Primera Guerra Mundial fue una guerra entre los Estados Unidos y la Alemania, que se extendió por un total de 18 meses. |
74
 
75
- Evaluating the performance of generative models can be challenging, and quantitative measures may not always provide the most relevant insights for this type of approach. In this regard, we conducted a qualitative performance study. It involves assessing the relevance of model responses to a pool of French questions using blind evaluation. The metric used for evaluation is the average of two criteria: response accuracy and completeness, similar to what the SSA metric aims to capture. Please note that this measure lacks rigorous scientific validity due to the limited number of questions in the test dataset and the fact that it involves only a few individuals with similar socio-demographic characteristics. The prompts take the same format for all models:
76
  ```
77
  [Instruction]
78
  Question : [Question]
 
34
  | tokens per word | 2.3 | 2.3 | 2 | 1.9 | 1.9 | 1.9 | 1.8 | 1.4 |
35
 
36
 
37
+ For comparison, with a specialized French tokenizer like [CamemBERT](https://huggingface.co/camembert/camembert-base) or [DistilCamemBERT](cmarkea/distilcamembert-base), we have 1.5 tokens per word. In addition to the positive impact of a lower ratio on inference time and resource consumption, it has already been [shown that there is a direct relationship](https://arxiv.org/abs/2012.15613) between the number of tokens per word required for modeling and the predictive performance of the model.
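As a rough illustration of how a tokens-per-word ratio like the one above could be computed, here is a minimal sketch. It is not the procedure used for the table: the README does not say how words are counted, so whitespace splitting is an assumption, and `tokens_per_word` and `toy_tokenize` are hypothetical names introduced only for this example.

```python
from typing import Callable, Iterable, List


def tokens_per_word(tokenize: Callable[[str], List[str]], texts: Iterable[str]) -> float:
    """Average number of tokens produced per whitespace-separated word.

    Assumption: a "word" is any whitespace-delimited chunk of text.
    """
    n_tokens = 0
    n_words = 0
    for text in texts:
        n_words += len(text.split())
        n_tokens += len(tokenize(text))
    if n_words == 0:
        raise ValueError("no words found in the input texts")
    return n_tokens / n_words


def toy_tokenize(text: str, max_len: int = 4) -> List[str]:
    """Hypothetical stand-in tokenizer: cuts each word into chunks of at most max_len characters."""
    pieces: List[str] = []
    for word in text.split():
        pieces.extend(word[i:i + max_len] for i in range(0, len(word), max_len))
    return pieces


sample = ["Sans honneur que précaire, sans liberté que provisoire"]
ratio = tokens_per_word(toy_tokenize, sample)
print(f"tokens per word: {ratio:.2f}")
```

Any callable that maps a string to a list of tokens can be passed in place of `toy_tokenize`, for instance the `tokenize` method of a tokenizer loaded with the `transformers` library. The ratio depends on the evaluation text, so comparisons between tokenizers are only meaningful when they use the same corpus.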
38
 
39
  Dataset
40
  -------
 
72
  |:--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------:|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------:|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------:|
73
  | **human:** C'est quoi la première guerre mondiale ?<br/> **bot:** La première guerre mondiale est une guerre entre les États-Unis et la Grande-Bretagne.<br/> **human:** Tu es certain de ça ?<br/> **bot:** Oui, je suis sûr.<br/> **human:** Elle a eu lieu quand ?<br/> **bot:** La première guerre mondiale a eu lieu en 1914.<br/> **human:** Et elle a fini quand ?<br/> **bot:** La première guerre mondiale a eu lieu en 1918.<br/> **human:** Elle a opposé qui a qui ?<br/> **bot:** La première guerre mondiale a eu lieu entre les États-Unis et la Grande-Bretagne. | **human:** What is the first world war?<br/> **bot:** The first world war was a conflict between the United States and the Soviet Union. The conflict began in World War II and lasted until the end of World War III.<br/> **human:** Are you sure that this war was between these two belligerents and that it started during the Second World War?<br/> **bot:** Yes, the conflict was between the United States and the Soviet Union.<br/> **human:** What are the start and end dates of the first war?<br/> **bot:** The first war began in World War II and lasted until the end of World War III. | **human:** ¿Qué es la Primera Guerra Mundial?<br/> **bot:** La Primera Guerra Mundial fue una guerra entre los Estados Unidos y la Alemania, que se extendió por un total de 18 meses.<br/> **human:** ¿Solo estos dos países y por tan poco tiempo?<br/> **bot:** La Primera Guerra Mundial fue una guerra entre los Estados Unidos y la Alemania, que se extendió por un total de 18 meses. |
74
 
75
+ Evaluating the performance of generative models can be challenging, and quantitative measures may not always provide the most relevant insights for this type of approach. In this regard, we conducted a qualitative performance study. It involves assessing the relevance of model responses to a pool of French questions using blind evaluation. The metric used for evaluation is the average of two criteria: response accuracy and completeness, similar to what the [SSA metric](https://arxiv.org/abs/2001.09977) aims to capture. Please note that this measure lacks rigorous scientific validity due to the limited number of questions in the test dataset and the fact that it involves only a few individuals with similar socio-demographic characteristics. The prompts take the same format for all models:
76
  ```
77
  [Instruction]
78
  Question : [Question]