---
language: nl
license: mit
datasets:
- dbrd
model-index:
- name: robbert-v2-dutch-sentiment
  results:
  - task:
      type: text-classification
      name: Text Classification
    dataset:
      name: dbrd
      type: sentiment-analysis
      split: test
    metrics:
    - name: Accuracy
      type: accuracy
      value: 0.93325
widget:
- text: "Ik erken dat dit een boek is, daarmee is alles gezegd."
- text: "Prachtig verhaal, heel mooi verteld en een verrassend einde... Een topper!"
thumbnail: "https://github.com/iPieter/RobBERT/raw/master/res/robbert_logo.png"
tags:
- Dutch
- Flemish
- RoBERTa
- RobBERT
---

<p align="center">
<img src="https://github.com/iPieter/RobBERT/raw/master/res/robbert_logo_with_name.png" alt="RobBERT: A Dutch RoBERTa-based Language Model" width="75%">
</p>

# RobBERT finetuned for sentiment analysis on DBRD

This is a finetuned model based on [RobBERT (v2)](https://huggingface.co/pdelobelle/robbert-v2-dutch-base). We used [DBRD](https://huggingface.co/datasets/dbrd), which consists of book reviews from [hebban.nl](https://hebban.nl); hence our example sentences are about books. We did some limited experiments to test if this also works for other domains, but these were not evaluated systematically.
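A minimal usage sketch with the `transformers` pipeline API (assuming `transformers` and a PyTorch backend are installed; the model id matches this repository):

```python
def classify_dutch_review(text: str) -> dict:
    """Classify the sentiment of a Dutch review with the finetuned model."""
    # Imported lazily so the sketch only needs `transformers` when called.
    from transformers import pipeline

    classifier = pipeline(
        "sentiment-analysis",
        model="DTAI-KULeuven/robbert-v2-dutch-sentiment",
    )
    # The pipeline returns a list with one dict per input text.
    return classifier(text)[0]


# Example call (downloads the model weights on first use):
# classify_dutch_review("Prachtig verhaal, heel mooi verteld en een verrassend einde... Een topper!")
```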

# Training data and setup
We used the [Dutch Book Reviews Dataset (DBRD)](https://huggingface.co/datasets/dbrd) from van der Burgh et al. (2019).
Originally, these reviews carried a five-star rating, which we converted to positive (⭐️⭐️⭐️⭐️ and ⭐️⭐️⭐️⭐️⭐️), neutral (⭐️⭐️⭐️) and negative (⭐️ and ⭐️⭐️).
We used 19.5k reviews for the training set, 528 reviews for the validation set and 2224 reviews to calculate the final accuracy.
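The star-to-label conversion described above can be sketched as a small helper (the function name is ours, for illustration):

```python
def stars_to_label(stars: int) -> str:
    """Map a DBRD five-star rating to the three sentiment classes used for finetuning."""
    if stars >= 4:       # 4 and 5 stars
        return "positive"
    if stars == 3:       # 3 stars
        return "neutral"
    return "negative"    # 1 and 2 stars
```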

The validation set was used to evaluate a random hyperparameter search over the learning rate, weight decay and gradient accumulation steps.
The full training details are available in [`training_args.bin`](https://huggingface.co/DTAI-KULeuven/robbert-v2-dutch-sentiment/blob/main/training_args.bin) as a binary PyTorch file.
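Such a random search can be sketched as sampling one configuration per trial; the ranges below are hypothetical, chosen only to illustrate the idea (the values actually used are stored in `training_args.bin`):

```python
import random


def sample_config(rng: random.Random) -> dict:
    """Draw one random hyperparameter configuration (illustrative search space)."""
    return {
        # Log-uniform over 1e-6 .. 1e-4, a common range for finetuning.
        "learning_rate": 10 ** rng.uniform(-6, -4),
        "weight_decay": rng.uniform(0.0, 0.1),
        "gradient_accumulation_steps": rng.choice([1, 2, 4, 8]),
    }


# Each sampled config would be trained and scored on the validation set.
configs = [sample_config(random.Random(seed)) for seed in range(5)]
```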

# Limitations and biases
- The domain of the reviews is limited to book reviews.
- Most authors of the book reviews were women, which could have caused [a difference in performance for reviews written by men and women](https://www.aclweb.org/anthology/2020.findings-emnlp.292).

## Credits and citation

This project is created by [Pieter Delobelle](https://people.cs.kuleuven.be/~pieter.delobelle), [Thomas Winters](https://thomaswinters.be) and [Bettina Berendt](https://people.cs.kuleuven.be/~bettina.berendt/).
If you would like to cite our paper or models, you can use the following BibTeX:

```bibtex
@inproceedings{delobelle2020robbert,
    title = "{R}ob{BERT}: a {D}utch {R}o{BERT}a-based {L}anguage {M}odel",
    author = "Delobelle, Pieter and
      Winters, Thomas and
      Berendt, Bettina",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2020.findings-emnlp.292",
    doi = "10.18653/v1/2020.findings-emnlp.292",
    pages = "3255--3265"
}
```