---
inference: false
language: pt
datasets:
- ruanchaves/hatebr
---

# BERTimbau base for Offensive Language Detection

This is the [neuralmind/bert-base-portuguese-cased](https://huggingface.co/neuralmind/bert-base-portuguese-cased) model fine-tuned for
Offensive Language Detection on the [HateBR](https://huggingface.co/ruanchaves/hatebr) dataset.
This model is suitable for Portuguese.

- Git Repo: [Evaluation of Portuguese Language Models](https://github.com/ruanchaves/eplm).
- Demo: [Hugging Face Space: Offensive Language Detection](https://ruanchaves-portuguese-offensive-language-de-d4d0507.hf.space)

### **Labels**:

* 0: The text is not offensive.
* 1: The text is offensive.

## Full classification example

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer, AutoConfig
import numpy as np
import torch
from scipy.special import softmax

model_name = "ruanchaves/bert-base-portuguese-cased-hatebr"
s1 = "Quem não deve não teme!!"

model = AutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
config = AutoConfig.from_pretrained(model_name)

# Tokenize the sentence and run it through the model without tracking gradients.
model_input = tokenizer(s1, padding=True, return_tensors="pt")
with torch.no_grad():
    output = model(**model_input)
    scores = output[0][0].detach().numpy()
    scores = softmax(scores)

# Print the labels ranked by score, highest first.
ranking = np.argsort(scores)[::-1]
for i in range(scores.shape[0]):
    l = config.id2label[ranking[i]]
    s = scores[ranking[i]]
    print(f"{i+1}) Label: {l} Score: {np.round(float(s), 4)}")
```
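
The ranking step at the end of the example above can be checked in isolation, without downloading the model. The logit values and label names below are made-up illustrations (mirroring the Labels section), not real model output:

```python
import numpy as np
from scipy.special import softmax

# Hypothetical raw logits for one sentence -- NOT real model output.
logits = np.array([2.0, -1.0])
# Mirrors the Labels section: 0 = not offensive, 1 = offensive.
id2label = {0: "not offensive", 1: "offensive"}

scores = softmax(logits)            # convert logits to probabilities (sums to 1)
ranking = np.argsort(scores)[::-1]  # label indices sorted by score, highest first

for i, idx in enumerate(ranking):
    print(f"{i+1}) Label: {id2label[int(idx)]} Score: {np.round(float(scores[idx]), 4)}")
```

With these logits, label 0 ("not offensive") is ranked first, exactly as the full example would rank real model scores.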

## Citation

Our research is ongoing, and we are currently working on a paper describing our experiments, which will be published soon.
In the meantime, if you would like to cite our work or models before the paper is published, please cite our [GitHub repository](https://github.com/ruanchaves/eplm):

```
@software{Chaves_Rodrigues_eplm_2023,
  author = {Chaves Rodrigues, Ruan and Tanti, Marc and Agerri, Rodrigo},
  doi = {10.5281/zenodo.7781848},
  month = {3},
  title = {{Evaluation of Portuguese Language Models}},
  url = {https://github.com/ruanchaves/eplm},
  version = {1.0.0},
  year = {2023}
}
```