---
license: mit
datasets:
- xnli
language:
- de
metrics:
- accuracy
pipeline_tag: zero-shot-classification
---

# XLM-ROBERTA-BASE-XNLI

## Model description
This model takes XLM-RoBERTa-base, which has been further pre-trained on a large corpus of multilingual Twitter data.
It was developed following a strategy similar to the one introduced in the [Tweet Eval](https://github.com/cardiffnlp/tweeteval) framework.
The model is then fine-tuned on all languages of the XNLI train set.

## Intended Usage

This model was developed for zero-shot text classification, with a focus on hate speech detection. It is fine-tuned on the whole XNLI train set, which contains 15 languages:
**ar, bg, de, en, el, es, fr, hi, ru, sw, th, tr, ur, vi, zh**
Since the base model was pre-trained on 100 different languages, it has also shown some effectiveness on languages beyond these 15. Please refer to the list of languages in the [XLM Roberta paper](https://arxiv.org/abs/1911.02116).

### Usage with Zero-Shot Classification pipeline
```python
from transformers import pipeline

classifier = pipeline("zero-shot-classification",
                      model="morit/XLM-T-full-xnli")
```
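
To classify a text, pass it to the classifier together with a set of candidate labels. The example sentence, candidate labels, and hypothesis template below are illustrative assumptions and not part of the original card:

```python
# Hypothetical German example input; replace with your own text and labels.
sequence = "Ich kann diese Gruppe von Menschen nicht ausstehen."
candidate_labels = ["hate speech", "not hate speech"]

# An assumed German hypothesis template; the pipeline's default English
# template ("This example is {}.") may also work for this multilingual NLI model.
result = classifier(sequence, candidate_labels,
                    hypothesis_template="Dieser Text ist {}.")

# The first entry of result["labels"] is the highest-scoring label.
print(result["labels"][0], result["scores"][0])
```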

## Training
This model was pre-trained on a set of 100 languages and then further trained on 198M multilingual tweets, as described in the original [paper](https://arxiv.org/abs/2104.12250). It was then fine-tuned on the full train set of the XNLI dataset, which is a machine-translated version of the MNLI dataset. Training ran for 5 epochs, with evaluation on the XNLI validation set at the end of every epoch; the checkpoint with the highest evaluation accuracy was selected (a sketch of this setup is given below).

![Training Charts from wandb](screen_wandb.png)

- learning rate: 2e-5
- batch size: 32
- max sequence length: 128

using a single GPU (NVIDIA GeForce RTX 3090)
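
Below is a minimal, hedged sketch of this fine-tuning setup using the `transformers` Trainer. The base checkpoint name (`cardiffnlp/twitter-xlm-roberta-base`) and the use of a single XNLI language config are assumptions made for brevity; the card states that all 15 languages were used, and this is not the authors' training script.

```python
import numpy as np
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

# Assumed base checkpoint (XLM-R further pre-trained on tweets, i.e. XLM-T);
# the card does not state the exact identifier.
base = "cardiffnlp/twitter-xlm-roberta-base"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForSequenceClassification.from_pretrained(base, num_labels=3)

# One language config shown here for brevity; the actual training used all
# 15 XNLI languages.
xnli = load_dataset("xnli", "de")

def tokenize(batch):
    # Encode premise/hypothesis pairs, truncated to the 128-token limit above.
    return tokenizer(batch["premise"], batch["hypothesis"],
                     truncation=True, max_length=128)

encoded = xnli.map(tokenize, batched=True)

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    return {"accuracy": (np.argmax(logits, axis=-1) == labels).mean()}

args = TrainingArguments(
    output_dir="xlm-t-xnli",
    learning_rate=2e-5,
    per_device_train_batch_size=32,
    num_train_epochs=5,
    evaluation_strategy="epoch",      # evaluate after every epoch
    save_strategy="epoch",
    load_best_model_at_end=True,      # keep the checkpoint with the best eval accuracy
    metric_for_best_model="accuracy",
)

trainer = Trainer(model=model, args=args,
                  train_dataset=encoded["train"],
                  eval_dataset=encoded["validation"],
                  tokenizer=tokenizer,
                  compute_metrics=compute_metrics)
trainer.train()
```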

## Evaluation
The model was evaluated on all the test sets of the XNLI dataset, resulting in the following accuracies:

| ar | bg | de | en | el | es | fr | hi | ru | sw | th | tr | ur | vi | zh |
|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|
| 0.749 | 0.787 | 0.774 | 0.774 | 0.831 | 0.796 | 0.785 | 0.734 | 0.761 | 0.701 | 0.757 | 0.758 | 0.704 | 0.778 | 0.774 |
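
A rough sketch of how such per-language accuracies could be computed is shown below; the loop is an illustrative assumption (not the authors' evaluation code) and assumes the model's label ids match the XNLI label ids.

```python
import torch
from datasets import load_dataset
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "morit/XLM-T-full-xnli"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id).eval()

languages = ["ar", "bg", "de", "en", "el", "es", "fr", "hi",
             "ru", "sw", "th", "tr", "ur", "vi", "zh"]

for lang in languages:
    test = load_dataset("xnli", lang, split="test")
    correct = 0
    for example in test:
        inputs = tokenizer(example["premise"], example["hypothesis"],
                           truncation=True, max_length=128, return_tensors="pt")
        with torch.no_grad():
            pred = model(**inputs).logits.argmax(dim=-1).item()
        correct += int(pred == example["label"])
    print(f"{lang}: {correct / len(test):.3f}")
```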