---
license: bigscience-bloom-rail-1.0
datasets:
- xnli
language:
- fr
- en
pipeline_tag: zero-shot-classification
---
# Presentation

We introduce the Bloomz-7b1-mt-NLI model, fine-tuned from the [Bloomz-7b1-mt-dpo-chat](https://huggingface.co/cmarkea/bloomz-7b1-mt-dpo-chat) foundation model.

This model is trained on a Natural Language Inference (NLI) task in a language-agnostic manner. The NLI task consists of determining the semantic relationship between a hypothesis and a set of premises, typically expressed as pairs of sentences.

The goal is to predict textual entailment (does sentence A imply sentence B, contradict it, or neither?). It is therefore a classification task: given two sentences, predict one of three labels.

If sentence A is called the *premise* and sentence B the *hypothesis*, the model estimates:

$$P(premise=c\in\{contradiction, entailment, neutral\}\vert hypothesis)$$

### Language-agnostic approach

During training, the language of the hypothesis and the language of the premise are each chosen at random between English and French, so each of the four language combinations (French/French, French/English, English/French, English/English) occurs with a probability of 25%.
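The sampling scheme above can be sketched as follows. This is a minimal illustration, not the authors' training code: it assumes each training example is available as aligned English and French translations of the same XNLI pair, and the helper `sample_language_pair` is hypothetical.

```python
import random
from collections import Counter


def sample_language_pair(example_en, example_fr, rng=random):
    """Hypothetical helper: build one NLI pair with randomly chosen languages.

    example_en / example_fr are assumed to be aligned translations of the same
    XNLI pair, given as dicts with "premise", "hypothesis" and "label" keys.
    """
    # Premise and hypothesis languages are drawn independently, each with
    # probability 1/2, so each of the 4 combinations has probability 1/4.
    premise_source = example_en if rng.random() < 0.5 else example_fr
    hypothesis_source = example_en if rng.random() < 0.5 else example_fr
    return {
        "premise": premise_source["premise"],
        "hypothesis": hypothesis_source["hypothesis"],
        # The label does not depend on the language, so both versions share it
        "label": example_en["label"],
    }


# Toy aligned pair (label 0 = entailment in the Hugging Face XNLI label order)
en = {"premise": "The cat sleeps.", "hypothesis": "An animal is resting.", "label": 0}
fr = {"premise": "Le chat dort.", "hypothesis": "Un animal se repose.", "label": 0}

rng = random.Random(0)
counts = Counter()
for _ in range(10_000):
    pair = sample_language_pair(en, fr, rng)
    # Key: (premise is English, hypothesis is English)
    counts[(pair["premise"] == en["premise"], pair["hypothesis"] == en["hypothesis"])] += 1
print(counts)  # each of the 4 combinations should appear roughly 2,500 times
```

Drawing the two languages independently is what yields the uniform 25% split; a single draw shared by both sentences would only produce the two monolingual combinations.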
### Performance

| **class** | **precision (%)** | **f1-score (%)** | **support** |
| :----------------: | :---------------: | :--------------: | :---------: |
| **global** | 83.31 | 83.02 | 5,010 |
| **contradiction** | 81.27 | 86.63 | 1,670 |
| **entailment** | 87.54 | 83.57 | 1,670 |
| **neutral** | 81.13 | 78.86 | 1,670 |
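The **global** row matches the unweighted (macro) average of the three class rows. As a worked example, the sketch below checks this from the table values and recovers each class's recall from its precision and F1 score. Because the three classes have equal support (1,670 each), the mean recall equals the accuracy, which comes out close to the 83.13% accuracy reported for the French/French benchmark below. This is only arithmetic on the published numbers, not the authors' evaluation code.

```python
# Per-class (precision %, F1 %) copied from the table above
per_class = {
    "contradiction": (81.27, 86.63),
    "entailment": (87.54, 83.57),
    "neutral": (81.13, 78.86),
}


def recall_from_precision_f1(p, f1):
    # F1 = 2PR / (P + R)  =>  R = P * F1 / (2P - F1)
    return p * f1 / (2 * p - f1)


macro_precision = sum(p for p, _ in per_class.values()) / len(per_class)
macro_f1 = sum(f for _, f in per_class.values()) / len(per_class)
recalls = {c: recall_from_precision_f1(p, f) for c, (p, f) in per_class.items()}

# With equal support per class, accuracy = mean of the per-class recalls
accuracy = sum(recalls.values()) / len(recalls)

print(f"macro precision: {macro_precision:.2f}")  # 83.31
print(f"macro F1:        {macro_f1:.2f}")         # 83.02
print({c: round(r, 2) for c, r in recalls.items()})
print(f"accuracy:        {accuracy:.3f}")
```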
### Benchmark

Results with both the hypothesis and the premise in French:

| **model** | **accuracy (%)** | **MCC (x100)** |
| :--------------: | :--------------: | :------------: |
| [cmarkea/distilcamembert-base-nli](https://huggingface.co/cmarkea/distilcamembert-base-nli) | 77.45 | 66.24 |
| [BaptisteDoyen/camembert-base-xnli](https://huggingface.co/BaptisteDoyen/camembert-base-xnli) | 81.72 | 72.67 |
| [MoritzLaurer/mDeBERTa-v3-base-mnli-xnli](https://huggingface.co/MoritzLaurer/mDeBERTa-v3-base-mnli-xnli) | 83.43 | 75.15 |
| [cmarkea/bloomz-560m-nli](https://huggingface.co/cmarkea/bloomz-560m-nli) | 68.70 | 53.57 |
| [cmarkea/bloomz-3b-nli](https://huggingface.co/cmarkea/bloomz-3b-nli) | 81.08 | 71.66 |
| [cmarkea/bloomz-7b1-mt-nli](https://huggingface.co/cmarkea/bloomz-7b1-mt-nli) | 83.13 | 74.89 |

Results with the hypothesis in French and the premise in English (cross-language context):

| **model** | **accuracy (%)** | **MCC (x100)** |
| :--------------: | :--------------: | :------------: |
| [cmarkea/distilcamembert-base-nli](https://huggingface.co/cmarkea/distilcamembert-base-nli) | 16.89 | -26.82 |
| [BaptisteDoyen/camembert-base-xnli](https://huggingface.co/BaptisteDoyen/camembert-base-xnli) | 74.59 | 61.97 |
| [MoritzLaurer/mDeBERTa-v3-base-mnli-xnli](https://huggingface.co/MoritzLaurer/mDeBERTa-v3-base-mnli-xnli) | 85.15 | 77.74 |
| [cmarkea/bloomz-560m-nli](https://huggingface.co/cmarkea/bloomz-560m-nli) | 68.84 | 53.55 |
| [cmarkea/bloomz-3b-nli](https://huggingface.co/cmarkea/bloomz-3b-nli) | 82.12 | 73.22 |
| [cmarkea/bloomz-7b1-mt-nli](https://huggingface.co/cmarkea/bloomz-7b1-mt-nli) | 85.43 | 78.25 |
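The benchmark tables report accuracy and the Matthews correlation coefficient (MCC, multiplied by 100). For reference, here is a minimal pure-Python sketch of both metrics for multiclass labels. The MCC function follows the multiclass generalization described in the scikit-learn documentation for `matthews_corrcoef`, so it should agree with that function; the evaluation script behind the tables is not part of this card, and the toy labels below are made up.

```python
import math
from collections import Counter


def accuracy(y_true, y_pred):
    # Fraction of predictions that match the gold label
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def multiclass_mcc(y_true, y_pred):
    # Multiclass MCC computed from class counts:
    # (c*s - sum_k p_k*t_k) / sqrt((s^2 - sum_k p_k^2) * (s^2 - sum_k t_k^2))
    s = len(y_true)                                    # number of samples
    c = sum(t == p for t, p in zip(y_true, y_pred))    # correct predictions
    t_counts = Counter(y_true)                         # true occurrences per class
    p_counts = Counter(y_pred)                         # predicted occurrences per class
    labels = set(t_counts) | set(p_counts)
    cov_tp = c * s - sum(p_counts[k] * t_counts[k] for k in labels)
    cov_pp = s * s - sum(p_counts[k] ** 2 for k in labels)
    cov_tt = s * s - sum(t_counts[k] ** 2 for k in labels)
    if cov_pp == 0 or cov_tt == 0:
        return 0.0  # undefined case, reported as 0
    return cov_tp / math.sqrt(cov_pp * cov_tt)


y_true = ["entailment", "neutral", "contradiction", "entailment"]
y_pred = ["entailment", "contradiction", "contradiction", "entailment"]
print(accuracy(y_true, y_pred))                           # 0.75
print(round(multiclass_mcc(y_true, y_pred) * 100, 2))     # 67.08, i.e. "MCC (x100)"
```

Unlike accuracy, MCC can be negative when predictions are anti-correlated with the gold labels, which explains the negative cross-language score of distilcamembert-base-nli above.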
# Zero-shot Classification

The primary interest of training such models lies in their zero-shot classification performance: the model can classify any text with any set of labels without task-specific training. What sets the Bloomz-NLI LLMs apart in this domain is their ability to model and extract information from significantly more complex and lengthy text structures than models like BERT, RoBERTa, or CamemBERT.

The zero-shot classification task can be summarized by:

$$P(hypothesis=i\in\mathcal{C}|premise)=\frac{e^{P(premise=entailment\vert hypothesis=i)}}{\sum_{j\in\mathcal{C}}e^{P(premise=entailment\vert hypothesis=j)}}$$

Here each hypothesis *i* is built from a template (for example, "This text is about {}.") and one of the $\#\mathcal{C}$ candidate labels in $\mathcal{C}$ ("cinema", "politics", etc.), so the set of hypotheses is {"This text is about cinema.", "This text is about politics.", ...}. Each of these hypotheses is then scored against the premise, which is the sentence we aim to classify.
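The construction of the hypotheses and the normalization over candidate labels can be sketched in plain Python. This is an illustration with hypothetical scores, not the model's internals: feeding entailment probabilities reproduces the formula above literally, whereas the Transformers `zero-shot-classification` pipeline (with its default `multi_label=False`) applies the same softmax to the model's entailment logits.

```python
import math


def build_hypotheses(template, labels):
    # One hypothesis per candidate label, e.g. "This text is about cinema."
    return [template.format(label) for label in labels]


def zero_shot_scores(entailment_scores):
    # Softmax over the entailment scores of all hypotheses (one per label),
    # so the scores of the candidate labels sum to 1
    exps = [math.exp(s) for s in entailment_scores]
    total = sum(exps)
    return [e / total for e in exps]


labels = ["cinema", "politics", "technology"]
hypotheses = build_hypotheses("This text is about {}.", labels)
print(hypotheses)
# ['This text is about cinema.', 'This text is about politics.', 'This text is about technology.']

# Hypothetical entailment scores for each (premise, hypothesis) pair
scores = zero_shot_scores([3.0, 0.5, -1.0])
for label, score in sorted(zip(labels, scores), key=lambda x: -x[1]):
    print(f"{label}: {score:.3f}")
```

Since probabilities lie in [0, 1], exponentiating them bounds the ratio between two labels' scores by e; applying the softmax to unbounded logits is what allows sharper distributions like the ones shown in the usage example.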
### Performance

The model is evaluated on sentiment analysis using the [Allociné](https://huggingface.co/datasets/allocine) dataset of reviews from the French film review site Allociné. The dataset has two classes: positive reviews and negative reviews. We use the hypothesis template "Ce commentaire est {}." ("This comment is {}.") with the candidate labels "positif" (positive) and "negatif" (negative).

| **model** | **accuracy (%)** | **MCC (x100)** |
| :--------------: | :--------------: | :------------: |
| [cmarkea/distilcamembert-base-nli](https://huggingface.co/cmarkea/distilcamembert-base-nli) | 80.59 | 63.71 |
| [BaptisteDoyen/camembert-base-xnli](https://huggingface.co/BaptisteDoyen/camembert-base-xnli) | 86.37 | 73.74 |
| [MoritzLaurer/mDeBERTa-v3-base-mnli-xnli](https://huggingface.co/MoritzLaurer/mDeBERTa-v3-base-mnli-xnli) | 84.97 | 70.05 |
| [cmarkea/bloomz-560m-nli](https://huggingface.co/cmarkea/bloomz-560m-nli) | 71.13 | 46.3 |
| [cmarkea/bloomz-3b-nli](https://huggingface.co/cmarkea/bloomz-3b-nli) | 89.06 | 78.10 |
| [cmarkea/bloomz-7b1-mt-nli](https://huggingface.co/cmarkea/bloomz-7b1-mt-nli) | 95.12 | 90.27 |
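The Allociné protocol described above can be sketched as a small evaluation loop. This is a hedged illustration rather than the authors' benchmark script: `zero_shot_accuracy` and `LABEL_TO_CANDIDATE` are hypothetical names, the label mapping (0 = negative, 1 = positive) is an assumption about the dataset, and `classifier` can be any callable with the call signature of the Transformers zero-shot pipeline.

```python
# Assumption: in the "allocine" dataset, label 0 is a negative review and
# label 1 a positive one.
LABEL_TO_CANDIDATE = {0: "negatif", 1: "positif"}


def zero_shot_accuracy(classifier, reviews, label_ids):
    """Hypothetical helper: accuracy of a zero-shot classifier on binary sentiment."""
    correct = 0
    for review, label_id in zip(reviews, label_ids):
        result = classifier(
            review,
            candidate_labels=["positif", "negatif"],
            hypothesis_template="Ce commentaire est {}.",
        )
        # Labels come back sorted by decreasing score: the first one is the prediction
        if result["labels"][0] == LABEL_TO_CANDIDATE[label_id]:
            correct += 1
    return correct / len(label_ids)
```

With the `classifier` pipeline created in the usage example, one could load the test split with `datasets.load_dataset("allocine", split="test")` and pass its review texts and labels (assumed to be the `review` and `label` columns). Running a 7B-parameter model over the whole split requires substantial GPU time.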
# How to use Bloomz-7b1-mt-NLI

```python
from transformers import pipeline

# Zero-shot classification pipeline backed by Bloomz-7b1-mt-NLI
classifier = pipeline(
    task="zero-shot-classification",
    model="cmarkea/bloomz-7b1-mt-nli",
)

# French text, French candidate labels and French hypothesis template
result = classifier(
    sequences="Le style très cinéphile de Quentin Tarantino "
    "se reconnaît entre autres par sa narration postmoderne "
    "et non linéaire, ses dialogues travaillés souvent "
    "émaillés de références à la culture populaire, et ses "
    "scènes hautement esthétiques mais d'une violence "
    "extrême, inspirées de films d'exploitation, d'arts "
    "martiaux ou de western spaghetti.",
    # A comma-separated string is split into individual labels by the pipeline
    candidate_labels="cinéma, technologie, littérature, politique",
    hypothesis_template="Ce texte parle de {}.",
)

result
# Output (the "sequence" key is omitted):
# {"labels": ["cinéma",
#             "littérature",
#             "technologie",
#             "politique"],
#  "scores": [0.8745610117912292,
#             0.10403601825237274,
#             0.014962797053158283,
#             0.0064402492716908455]}

# Resilience in a cross-language context: English text, French labels and template
result = classifier(
    sequences="Quentin Tarantino's very cinephile style is "
    "recognized, among other things, by his postmodern and "
    "non-linear narration, his elaborate dialogues often "
    "peppered with references to popular culture, and his "
    "highly aesthetic but extremely violent scenes, inspired by "
    "exploitation films, martial arts or spaghetti western.",
    candidate_labels="cinéma, technologie, littérature, politique",
    hypothesis_template="Ce texte parle de {}.",
)

result
# Output (the "sequence" key is omitted):
# {"labels": ["cinéma",
#             "littérature",
#             "technologie",
#             "politique"],
#  "scores": [0.9314399361610413,
#             0.04960821941494942,
#             0.013468802906572819,
#             0.005483036395162344]}
```