Cyrile committed on
Commit
68e6993
1 Parent(s): 81398f7

Update README.md

Files changed (1)
  1. README.md +11 -15
README.md CHANGED
@@ -23,7 +23,7 @@ This modelization is close to [BaptisteDoyen/camembert-base-xnli](https://huggin
 Dataset
 -------
 
- The dataset XNLI from [FLUE](https://huggingface.co/datasets/flue) is composed of 392,702 premises with their hypothesis for the train and 5,010 couples for the test. The goal is to predict textual entailment (does sentence A imply/contradict/neither sentence B) and is a classification task (given two sentences, predict one of three labels). The sentence A is called *premise* and sentence B is called *hypothesis*, then the goal of modelization is determined :
+ The XNLI dataset from [FLUE](https://huggingface.co/datasets/flue) is composed of 392,702 premise-hypothesis pairs for the train set and 5,010 pairs for the test set. The goal is to predict textual entailment (does sentence A imply/contradict/neither sentence B?): a classification task where, given two sentences, the model predicts one of three labels. Sentence A is called the *premise* and sentence B the *hypothesis*; the modeling objective is then:
 $$P(premise\in\{contradiction, entailment, neutral\}\vert hypothesis)$$
 
 Evaluation results
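The objective above can be probed directly with the fine-tuned checkpoint. A minimal sketch with the transformers library; the model id `cmarkea/distilcamembert-base-nli` is an assumption, substitute this repository's actual id:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Model id is an assumption; substitute the id of this repository.
model_id = "cmarkea/distilcamembert-base-nli"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

premise = "Le camembert est délicieux."  # sentence A (illustrative)
hypothesis = "Le camembert est bon."     # sentence B (illustrative)

# Score the (premise, hypothesis) pair and normalize over the three labels.
inputs = tokenizer(premise, hypothesis, return_tensors="pt")
with torch.no_grad():
    probs = model(**inputs).logits.softmax(dim=-1).squeeze(0)
for i, p in enumerate(probs.tolist()):
    print(f"{model.config.id2label[i]}: {p:.3f}")
```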
@@ -39,7 +39,7 @@ Evaluation results
 Benchmark
 ---------
 
- We compare the [DistilCamemBERT](https://huggingface.co/cmarkea/distilcamembert-base) model with 2 others modelization works on French language. The first [BaptisteDoyen/camembert-base-xnli](https://huggingface.co/BaptisteDoyen/camembert-base-xnli) is based on well named [CamemBERT](https://huggingface.co/camembert-base), the frech RnoBETa model and the third [MoritzLaurer/mDeBERTa-v3-base-mnli-xnli](https://huggingface.co/MoritzLaurer/mDeBERTa-v3-base-mnli-xnli) based on [mDeBERTav3](https://huggingface.co/microsoft/mdeberta-v3-base) a multilingue model. To compare the performances the metric [MCC (Matthews Correlation Coefficient)](https://en.wikipedia.org/wiki/Phi_coefficient) was used and for the mean inference time measure, an **AMD Ryzen 5 4500U @ 2.3GHz with 6 cores** was used:
+ We compare the [DistilCamemBERT](https://huggingface.co/cmarkea/distilcamembert-base) model with two other models for French. The first, [BaptisteDoyen/camembert-base-xnli](https://huggingface.co/BaptisteDoyen/camembert-base-xnli), is based on [CamemBERT](https://huggingface.co/camembert-base), the French RoBERTa model; the second, [MoritzLaurer/mDeBERTa-v3-base-mnli-xnli](https://huggingface.co/MoritzLaurer/mDeBERTa-v3-base-mnli-xnli), is based on [mDeBERTav3](https://huggingface.co/microsoft/mdeberta-v3-base), a multilingual model. Performance is compared with the [MCC (Matthews Correlation Coefficient)](https://en.wikipedia.org/wiki/Phi_coefficient) metric, and mean inference time was measured on an **AMD Ryzen 5 4500U @ 2.3GHz with 6 cores**:
 
 | **NLI** | **time (ms)** | **MCC (x100)** |
 | :--------------: | :-----------: | :------------: |
@@ -47,7 +47,7 @@ We compare the [DistilCamemBERT](https://huggingface.co/cmarkea/distilcamembert-
 | [BaptisteDoyen/camembert-base-xnli](https://huggingface.co/BaptisteDoyen/camembert-base-xnli) | 105.0 | 72.67 |
 | [MoritzLaurer/mDeBERTa-v3-base-mnli-xnli](https://huggingface.co/MoritzLaurer/mDeBERTa-v3-base-mnli-xnli) | 299.18 | 75.15 |
 
-
+ The main advantage of this modeling is that it yields a zero-shot classifier, allowing text classification without training. This task can be summarized by:
 $$P(hypothesis=c\vert premise)=\frac{e^{P(premise=entailment\vert hypothesis\; c)}}{\sum_{i\in\mathcal{C}}e^{P(premise=entailment\vert hypothesis\; i)}}$$
 
 | **Allociné** | **time (ms)** | **MCC (x100)** |
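The normalization in the formula above is a plain softmax over the per-label entailment scores. A minimal sketch in plain Python; the scores below are hypothetical, not model output:

```python
import math

def zero_shot_probs(entailment_scores: dict[str, float]) -> dict[str, float]:
    """Softmax over per-label entailment scores, as in the formula above."""
    exp = {label: math.exp(s) for label, s in entailment_scores.items()}
    total = sum(exp.values())
    return {label: e / total for label, e in exp.items()}

# Hypothetical entailment scores for each candidate label:
scores = {"cinéma": 2.1, "technologie": 1.3, "littérature": 1.0, "politique": 0.2}
print(zero_shot_probs(scores))
```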
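For the MCC column of the benchmark, a sketch with scikit-learn; the label arrays here are made-up placeholders, whereas the reported numbers come from the 5,010 XNLI test pairs:

```python
from sklearn.metrics import matthews_corrcoef

# Placeholder labels; the real benchmark scores the full XNLI test set.
y_true = ["entailment", "neutral", "contradiction", "entailment"]
y_pred = ["entailment", "neutral", "entailment", "entailment"]
print(f"MCC (x100): {100 * matthews_corrcoef(y_true, y_pred):.2f}")
```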
@@ -77,16 +77,12 @@ result = classifier (
     hypothesis_template="Ce texte parle de {}."
 )
 result
- {"labels": [
- "cinéma",
- "technologie",
- "littérature",
- "politique"
- ],
- "scores": [
- 0.5172086954116821,
- 0.2278652936220169,
- 0.17426978051662445,
- 0.08065623790025711
- ]}
+ {"labels": ["cinéma",
+ "technologie",
+ "littérature",
+ "politique"],
+ "scores": [0.5172086954116821,
+ 0.2278652936220169,
+ 0.17426978051662445,
+ 0.08065623790025711]}
 ```
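The hunk above shows only the tail of the README's usage example. For context, a minimal sketch of the full zero-shot call; the model id is an assumption and the input sentence is illustrative, while the candidate labels match the output shown above:

```python
from transformers import pipeline

# Model id is an assumption; substitute the id of this repository.
classifier = pipeline(
    task="zero-shot-classification",
    model="cmarkea/distilcamembert-base-nli",
)
# Illustrative input; labels match the result shown in the diff.
result = classifier(
    "Le dernier film que j'ai vu était magnifique.",
    candidate_labels=["cinéma", "technologie", "littérature", "politique"],
    hypothesis_template="Ce texte parle de {}.",
)
print(result)
```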