---
language: nl
license: mit
datasets:
- dbrd
model-index:
- name: robbert-v2-dutch-sentiment
  results:
  - task:
      type: text-classification
      name: Text Classification
    dataset:
      name: dbrd
      type: sentiment-analysis
      split: test
    metrics:
    - name: Accuracy
      type: accuracy
      value: 0.93325
widget:
- text: "Ik erken dat dit een boek is, daarmee is alles gezegd."
- text: "Prachtig verhaal, heel mooi verteld en een verrassend einde... Een topper!"
thumbnail: "https://github.com/iPieter/RobBERT/raw/master/res/robbert_logo.png"
tags:
- Dutch
- Flemish
- RoBERTa
- RobBERT
---

<p align="center">
    <img src="https://github.com/iPieter/RobBERT/raw/master/res/robbert_logo_with_name.png" alt="RobBERT: A Dutch RoBERTa-based Language Model" width="75%">
</p>

# RobBERT finetuned for sentiment analysis on DBRD

This is a finetuned model based on [RobBERT (v2)](https://huggingface.co/pdelobelle/robbert-v2-dutch-base). We used [DBRD](https://huggingface.co/datasets/dbrd), which consists of book reviews from [hebban.nl](https://hebban.nl); hence our example sentences are about books. We did some limited experiments to test whether the model also works for other domains, but these were not conclusive.
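The model can be loaded like any other text-classification model on the Hub. Below is a minimal usage sketch with the `transformers` pipeline API; it assumes `transformers` and PyTorch are installed and uses this repository's model id.

```python
# Minimal sketch: classify Dutch book reviews with the transformers pipeline API.
from transformers import pipeline

classifier = pipeline(
    "sentiment-analysis",
    model="DTAI-KULeuven/robbert-v2-dutch-sentiment",
)

reviews = [
    "Ik erken dat dit een boek is, daarmee is alles gezegd.",
    "Prachtig verhaal, heel mooi verteld en een verrassend einde... Een topper!",
]

# Each prediction is a dict with a predicted label and a confidence score.
for review, prediction in zip(reviews, classifier(reviews)):
    print(prediction["label"], round(prediction["score"], 3), review)
```
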
# Training data and setup
We used the [Dutch Book Reviews Dataset (DBRD)](https://huggingface.co/datasets/dbrd) from van der Burgh et al. (2019).
Originally, these reviews carried a five-star rating, which has been converted to positive (⭐️⭐️⭐️⭐️ and ⭐️⭐️⭐️⭐️⭐️), neutral (⭐️⭐️⭐️) and negative (⭐️ and ⭐️⭐️).
We used 19.5k reviews for the training set, 528 reviews for the validation set and 2,224 reviews to calculate the final accuracy.
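The star-to-label conversion described above boils down to a simple mapping. The helper below is a hypothetical sketch of that mapping, not the preprocessing code that was actually used for DBRD.

```python
# Hypothetical sketch of the star-to-label mapping described above.
def stars_to_label(stars: int) -> str:
    """Map a 1-5 star book review rating to a sentiment label."""
    if stars >= 4:
        return "positive"
    if stars == 3:
        return "neutral"
    return "negative"

assert stars_to_label(5) == "positive"
assert stars_to_label(3) == "neutral"
assert stars_to_label(2) == "negative"
```
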

The validation set was used to evaluate a random hyperparameter search over the learning rate, weight decay and gradient accumulation steps.
The full training details are available in [`training_args.bin`](https://huggingface.co/DTAI-KULeuven/robbert-v2-dutch-sentiment/blob/main/training_args.bin) as a binary PyTorch file.
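That file can be inspected directly; a minimal sketch, assuming it is a pickled `TrainingArguments` object written by the Hugging Face `Trainer` with `torch.save` and downloaded locally:

```python
# Minimal sketch: load the pickled TrainingArguments from training_args.bin.
# Requires transformers to be installed so the object can be unpickled;
# on newer PyTorch versions you may need torch.load(..., weights_only=False).
import torch

training_args = torch.load("training_args.bin")
print(training_args.learning_rate)
print(training_args.weight_decay)
print(training_args.gradient_accumulation_steps)
```
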

# Limitations and biases
- The domain of the reviews is limited to book reviews.
- Most authors of the book reviews were women, which could have caused [a difference in performance for reviews written by men and women](https://www.aclweb.org/anthology/2020.findings-emnlp.292).

## Credits and citation

This project was created by [Pieter Delobelle](https://people.cs.kuleuven.be/~pieter.delobelle), [Thomas Winters](https://thomaswinters.be) and [Bettina Berendt](https://people.cs.kuleuven.be/~bettina.berendt/).
If you would like to cite our paper or models, you can use the following BibTeX:

```bibtex
@inproceedings{delobelle2020robbert,
    title = "{R}ob{BERT}: a {D}utch {R}o{BERT}a-based {L}anguage {M}odel",
    author = "Delobelle, Pieter  and
      Winters, Thomas  and
      Berendt, Bettina",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2020.findings-emnlp.292",
    doi = "10.18653/v1/2020.findings-emnlp.292",
    pages = "3255--3265"
}
```