---
license: bigscience-bloom-rail-1.0
datasets:
- xnli
language:
- fr
- en
pipeline_tag: zero-shot-classification
---

# Presentation
We introduce the Bloomz-7b1-mt-NLI model, fine-tuned from the [Bloomz-7b1-mt-chat-dpo](https://huggingface.co/cmarkea/bloomz-7b1-mt-dpo-chat) foundation model.
This model is trained on a Natural Language Inference (NLI) task in a language-agnostic manner. The NLI task consists of determining the semantic relationship between a hypothesis and a premise, expressed as a pair of sentences.

The goal is to predict textual entailment (does sentence A imply, contradict, or say nothing about sentence B?), which is a classification task (given two sentences, predict one of three labels).
If sentence A is called the *premise* and sentence B the *hypothesis*, then the goal of the modeling is to estimate the following:
$$P(premise=c\in\{contradiction, entailment, neutral\}\vert hypothesis)$$

### Language-agnostic approach
Note that the hypothesis and premise languages are drawn at random between English and French, so each of the four language combinations occurs with a probability of 25%.

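The sampling scheme above can be sketched as follows (illustrative only; this is not the published training code):

```python
import random

# Premise and hypothesis languages are drawn independently and uniformly,
# so each of the four (premise, hypothesis) combinations -- fr/fr, fr/en,
# en/fr, en/en -- occurs with probability 25%.
LANGS = ("fr", "en")

def sample_pair_languages(rng):
    return rng.choice(LANGS), rng.choice(LANGS)

rng = random.Random(0)
counts = {}
for _ in range(10_000):
    pair = sample_pair_languages(rng)
    counts[pair] = counts.get(pair, 0) + 1

# Each of the four combinations should appear roughly 2,500 times.
```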
### Performance

| **class**         | **precision (%)** | **f1-score (%)** | **support** |
| :---------------: | :---------------: | :--------------: | :---------: |
| **global**        | 83.31             | 83.02            | 5,010       |
| **contradiction** | 81.27             | 86.63            | 1,670       |
| **entailment**    | 87.54             | 83.57            | 1,670       |
| **neutral**       | 81.13             | 78.86            | 1,670       |

### Benchmark

Here are the performances when both the hypothesis and the premise are in French:

| **model** | **accuracy (%)** | **MCC (x100)** |
| :-------: | :--------------: | :------------: |
| [cmarkea/distilcamembert-base-nli](https://huggingface.co/cmarkea/distilcamembert-base-nli) | 77.45 | 66.24 |
| [BaptisteDoyen/camembert-base-xnli](https://huggingface.co/BaptisteDoyen/camembert-base-xnli) | 81.72 | 72.67 |
| [MoritzLaurer/mDeBERTa-v3-base-mnli-xnli](https://huggingface.co/MoritzLaurer/mDeBERTa-v3-base-mnli-xnli) | 83.43 | 75.15 |
| [cmarkea/bloomz-560m-nli](https://huggingface.co/cmarkea/bloomz-560m-nli) | 68.70 | 53.57 |
| [cmarkea/bloomz-3b-nli](https://huggingface.co/cmarkea/bloomz-3b-nli) | 81.08 | 71.66 |
| [cmarkea/bloomz-7b1-mt-nli](https://huggingface.co/cmarkea/bloomz-7b1-mt-nli) | 83.13 | 74.89 |

And now with the hypothesis in French and the premise in English (cross-language context):

| **model** | **accuracy (%)** | **MCC (x100)** |
| :-------: | :--------------: | :------------: |
| [cmarkea/distilcamembert-base-nli](https://huggingface.co/cmarkea/distilcamembert-base-nli) | 16.89 | -26.82 |
| [BaptisteDoyen/camembert-base-xnli](https://huggingface.co/BaptisteDoyen/camembert-base-xnli) | 74.59 | 61.97 |
| [MoritzLaurer/mDeBERTa-v3-base-mnli-xnli](https://huggingface.co/MoritzLaurer/mDeBERTa-v3-base-mnli-xnli) | 85.15 | 77.74 |
| [cmarkea/bloomz-560m-nli](https://huggingface.co/cmarkea/bloomz-560m-nli) | 68.84 | 53.55 |
| [cmarkea/bloomz-3b-nli](https://huggingface.co/cmarkea/bloomz-3b-nli) | 82.12 | 73.22 |
| [cmarkea/bloomz-7b1-mt-nli](https://huggingface.co/cmarkea/bloomz-7b1-mt-nli) | 85.43 | 78.25 |

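The MCC column reports the Matthews correlation coefficient extended to several classes (Gorodkin's $R_K$), scaled by 100. A minimal sketch of how it can be computed from predictions (the label sequences below are made up, not benchmark data):

```python
from collections import Counter

def multiclass_mcc(y_true, y_pred):
    """Matthews correlation coefficient for K classes (Gorodkin's R_K).
    Returns a value in [-1, 1]; multiply by 100 to match the tables."""
    s = len(y_true)
    c = sum(t == p for t, p in zip(y_true, y_pred))  # correctly classified
    t_counts = Counter(y_true)   # how often each class truly occurs
    p_counts = Counter(y_pred)   # how often each class is predicted
    classes = set(t_counts) | set(p_counts)
    sum_tp = sum(t_counts[k] * p_counts[k] for k in classes)
    sum_t2 = sum(v * v for v in t_counts.values())
    sum_p2 = sum(v * v for v in p_counts.values())
    denom = ((s * s - sum_p2) * (s * s - sum_t2)) ** 0.5
    return (c * s - sum_tp) / denom if denom else 0.0

labels = ["entailment", "neutral", "contradiction", "entailment"]
perfect = multiclass_mcc(labels, labels)  # 1.0 by construction
mixed = multiclass_mcc(labels, ["entailment", "neutral", "neutral", "entailment"])  # ≈ 0.67
```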
# Zero-shot Classification
The primary interest of training such models lies in their zero-shot classification performance: the model can classify any text with any set of labels without specific training. What sets the Bloomz-NLI LLMs apart in this domain is their ability to model and extract information from significantly more complex and lengthy text structures compared to models like BERT, RoBERTa, or CamemBERT.

The zero-shot classification task can be summarized by:
$$P(hypothesis=i\in\mathcal{C}\vert premise)=\frac{e^{P(premise=entailment\vert hypothesis=i)}}{\sum_{j\in\mathcal{C}}e^{P(premise=entailment\vert hypothesis=j)}}$$
With *i* a hypothesis built from a template (for example, "This text is about {}.") and a set of candidate labels $\mathcal{C}$ ("cinema", "politics", etc.), the set of hypotheses becomes {"This text is about cinema.", "This text is about politics.", ...}. Each of these hypotheses is measured against the premise, which is the sentence we aim to classify.

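The normalization above is simply a softmax over the entailment score obtained for each candidate hypothesis; a minimal sketch (the scores below are hypothetical, not actual model outputs):

```python
import math

def zero_shot_scores(entailment_scores):
    """Softmax over the per-label entailment scores, as in the formula above
    (scores shifted by their max for numerical stability)."""
    m = max(entailment_scores.values())
    exp = {label: math.exp(s - m) for label, s in entailment_scores.items()}
    total = sum(exp.values())
    return {label: v / total for label, v in exp.items()}

# Hypothetical entailment scores for each hypothesis "This text is about {}."
scores = zero_shot_scores({"cinema": 2.1, "politics": -0.3, "literature": 0.4})
```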
### Performance

The model is evaluated on a sentiment analysis task using the French film review site [Allociné](https://huggingface.co/datasets/allocine). The dataset is labeled with 2 classes, positive comments and negative comments. We use the hypothesis template "Ce commentaire est {}." and the candidate classes "positif" and "negatif".

| **model** | **accuracy (%)** | **MCC (x100)** |
| :-------: | :--------------: | :------------: |
| [cmarkea/distilcamembert-base-nli](https://huggingface.co/cmarkea/distilcamembert-base-nli) | 80.59 | 63.71 |
| [BaptisteDoyen/camembert-base-xnli](https://huggingface.co/BaptisteDoyen/camembert-base-xnli) | 86.37 | 73.74 |
| [MoritzLaurer/mDeBERTa-v3-base-mnli-xnli](https://huggingface.co/MoritzLaurer/mDeBERTa-v3-base-mnli-xnli) | 84.97 | 70.05 |
| [cmarkea/bloomz-560m-nli](https://huggingface.co/cmarkea/bloomz-560m-nli) | 71.13 | 46.30 |
| [cmarkea/bloomz-3b-nli](https://huggingface.co/cmarkea/bloomz-3b-nli) | 89.06 | 78.10 |
| [cmarkea/bloomz-7b1-mt-nli](https://huggingface.co/cmarkea/bloomz-7b1-mt-nli) | 95.12 | 90.27 |

# How to use Bloomz-7b1-mt-NLI

```python
from transformers import pipeline

classifier = pipeline(
    task='zero-shot-classification',
    model="cmarkea/bloomz-7b1-mt-nli"
)
result = classifier(
    sequences="Le style très cinéphile de Quentin Tarantino "
              "se reconnaît entre autres par sa narration postmoderne "
              "et non linéaire, ses dialogues travaillés souvent "
              "émaillés de références à la culture populaire, et ses "
              "scènes hautement esthétiques mais d'une violence "
              "extrême, inspirées de films d'exploitation, d'arts "
              "martiaux ou de western spaghetti.",
    candidate_labels="cinéma, technologie, littérature, politique",
    hypothesis_template="Ce texte parle de {}."
)

result
{"labels": ["cinéma",
            "littérature",
            "technologie",
            "politique"],
 "scores": [0.8745610117912292,
            0.10403601825237274,
            0.014962797053158283,
            0.0064402492716908455]}

# Resilience in a cross-language French/English context
result = classifier(
    sequences="Quentin Tarantino's very cinephile style is "
              "recognized, among other things, by his postmodern and "
              "non-linear narration, his elaborate dialogues often "
              "peppered with references to popular culture, and his "
              "highly aesthetic but extremely violent scenes, inspired by "
              "exploitation films, martial arts or spaghetti western.",
    candidate_labels="cinéma, technologie, littérature, politique",
    hypothesis_template="Ce texte parle de {}."
)

result
{"labels": ["cinéma",
            "littérature",
            "technologie",
            "politique"],
 "scores": [0.9314399361610413,
            0.04960821941494942,
            0.013468802906572819,
            0.005483036395162344]}
```