pere committed on
Commit
81f8cc1
·
1 Parent(s): 60c98b2

Update README.md

Files changed (1)
  1. README.md +28 -8
README.md CHANGED
@@ -15,8 +15,9 @@ datasets:
15
  - xnli
16
  pipeline_tag: zero-shot-classification
17
  widget:
18
- - text: "Såframt uforutsette ting ikke inntreffer, kommer statsministeren til å presentere en plan for gjenåpningen av samfunnet før påske. Det bekrefter hun overfor Dagbladet."
19
- candidate_labels: "politikk, helse, sport, religion, geografi"
 
20
  multi_class: true
21
  ---
22
 
@@ -25,15 +26,34 @@ widget:
25
  # NB-Bert base model finetuned on Norwegian machine translated MNLI
26
 
27
  ## Description
28
-
29
- This finetuned model was created to show that it is possible to use a MNLI finetuned model for zero-shot classification. It is an alternative to finetuning a model on a specific annotated dataset.
30
-
31
- ## Intended use & limitations
32
-
33
- ## Training data
34
 
35
  ## More information
36
 
37
  For more information on the model, see
38
 
39
  https://github.com/NBAiLab/notram
 
 
 
15
  - xnli
16
  pipeline_tag: zero-shot-classification
17
  widget:
18
+ - text: "Folkehelseinstituttets mest optimistiske anslag er at alle over 18 år er ferdigvaksinert innen midten av september."
19
+ candidate_labels: "politikk, helse, sport, religion"
20
+ hypothesis_template: "Denne teksten handler om {}."
21
  multi_class: true
22
  ---
23
 
 
26
  # NB-Bert base model finetuned on Norwegian machine translated MNLI
27
 
28
  ## Description
29
+ The most effective way to create a good classifier is to finetune a model on annotated data for the specific task. However, in many cases this is simply impossible.
30
+ [Yin et al.](https://arxiv.org/abs/1909.00161) have proposed a very clever way of using pre-trained MNLI models as zero-shot sequence classifiers. The method works by reformulating the question as an MNLI hypothesis. If we want to figure out whether a text is about "sport", we simply state that "This text is about sport" ("Denne teksten handler om sport").
31
+
32
+ When a model is finetuned on the MNLI task, which has roughly 400k examples, it is in many cases able to solve these classification tasks. There is no MNLI set of this size in Norwegian, so we have trained the model on a machine-translated version of the original MNLI set.
33
+
34
+ ## Hugging Face zero-shot-classification pipeline
35
+ The easiest way to try this out is to use the Hugging Face pipeline. Note that you will improve the results by overriding the default English hypothesis template with a Norwegian one.
36
+ ```python
37
+ from transformers import pipeline
38
+ classifier = pipeline("zero-shot-classification", model="NBAiLab/nb-bert-base-mnli")
39
+ ```
40
+ You can then use this pipeline to classify sequences into any of the class names you specify.
41
+ ```python
42
+ sequence_to_classify = "Folkehelseinstituttets mest optimistiske anslag er at alle over 18 år er ferdigvaksinert innen midten av september."
43
+ candidate_labels = ["politikk", "helse", "sport", "religion"]
44
+ hypothesis_template = "Denne teksten handler om {}."
45
+ classifier(sequence_to_classify, candidate_labels, hypothesis_template=hypothesis_template, multi_class=True)
46
+
47
+
48
+ # Returns a dict with the keys 'sequence', 'labels' and 'scores'.
49
+ # 'labels' holds the candidate labels sorted by descending score,
50
+ # and 'scores' holds the matching values in the same order.
51
+ ```
52
 
53
  ## More information
54
 
55
  For more information on the model, see
56
 
57
  https://github.com/NBAiLab/notram
58
+
59
+ There you will also find a Colab notebook that explains in more detail how to use the zero-shot-classification pipeline.
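
As a rough illustration of the reformulation described in the Description section, the sketch below builds the premise/hypothesis pairs that an MNLI model would be asked to judge. It is not the pipeline's actual implementation: the helper names `parse_labels` and `build_nli_pairs` are hypothetical and exist only for this example, and no model is loaded.

```python
def parse_labels(candidate_labels):
    """Accept a list of labels or one comma-separated string (hypothetical helper)."""
    if isinstance(candidate_labels, str):
        candidate_labels = candidate_labels.split(",")
    # Strip surrounding whitespace and drop empty entries.
    return [label.strip() for label in candidate_labels if label.strip()]


def build_nli_pairs(sequence, candidate_labels,
                    hypothesis_template="Denne teksten handler om {}."):
    """Pair the text (the premise) with one hypothesis per candidate label."""
    return [
        (sequence, hypothesis_template.format(label))
        for label in parse_labels(candidate_labels)
    ]


pairs = build_nli_pairs(
    "Folkehelseinstituttets mest optimistiske anslag er at alle over 18 år "
    "er ferdigvaksinert innen midten av september.",
    "politikk, helse, sport, religion",
)
for premise, hypothesis in pairs:
    print(hypothesis)
```

An NLI model would then score each pair, and a label counts as a good match when its hypothesis is judged to be entailed by the text.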