Commit 7a87960 by ptaszynski (1 parent: e063cac)
Update README.md

README.md CHANGED
@@ -1,3 +1,58 @@
---
language: pl
license: cc-by-sa-4.0
datasets:
- Polish subset of Open Subtitles
- Polish subset of ParaCrawl
- Polish Parliamentary Corpus
- Polish Wikipedia - Feb 2020
- Expert-annotated Dataset for Automatic Cyberbullying Detection in Polish Language
---

# Polbert-CB - Polish BERT trained for Automatic Cyberbullying Detection
This is a Polish BERT language model, specifically [Polbert](https://huggingface.co/dkleczek/bert-base-polish-uncased-v1), fine-tuned on a re-annotated and improved version of the Dataset for Automatic Cyberbullying Detection in Polish Language.
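
As a rough illustration of how a fine-tuned checkpoint like this one could be used, the sketch below wraps it in a `transformers` text-classification pipeline. The repository id `ptaszynski/bert-base-polish-cyberbullying` is an assumption inferred from the citation URL and the author's HuggingFace username, and the helper names `build_classifier` and `classify` are hypothetical. The label names returned by the model are not documented here, so the code passes them through unchanged.

```python
# Assumption: the Hub repository id mirrors the name used in the citation URL.
MODEL_ID = "ptaszynski/bert-base-polish-cyberbullying"


def build_classifier(model_id: str = MODEL_ID):
    """Return a text-classification pipeline for the given checkpoint.

    Calling this downloads the model, so it needs network access
    and the `transformers` package (plus a backend such as PyTorch).
    """
    # Imported lazily so that defining the helpers needs no heavy dependencies.
    from transformers import pipeline

    return pipeline("text-classification", model=model_id)


def classify(texts, classifier):
    """Pair each input text with the label and score predicted for it."""
    texts = list(texts)
    predictions = classifier(texts)  # one {"label", "score"} dict per text
    return [(t, p["label"], p["score"]) for t, p in zip(texts, predictions)]


# Example usage (not executed here because it downloads the model):
#   clf = build_classifier()
#   for text, label, score in classify(["Przykładowy wpis do sprawdzenia."], clf):
#       print(f"{label}\t{score:.3f}\t{text}")
```

Keeping `classify` separate from model loading makes it possible to swap in a different checkpoint or a stub classifier without changing the calling code.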

## Fine-tuning dataset
The dataset used for fine-tuning this model is based on the original [Dataset for Automatic Cyberbullying Detection in Polish Language](https://huggingface.co/datasets/poleval2019_cyberbullying), which was subsequently cleaned and re-annotated by experts from [Samurai Labs](https://www.samurailabs.ai/). The improved dataset will be released separately at a later date.

## Acknowledgements
* We would like to express our gratitude to the annotators of this dataset, both the original annotators and the more recent expert annotators, for the invaluable time they spent preparing it.

## Author
Michal Ptaszynski - contact me on:
- Twitter: [@mich_ptaszynski](https://twitter.com/mich_ptaszynski)
- GitHub: [ptaszynski](https://github.com/ptaszynski)
- LinkedIn: [michalptaszynski](https://jp.linkedin.com/in/michalptaszynski)
- HuggingFace: [ptaszynski](https://huggingface.co/ptaszynski)

## Licences
The fine-tuned model with all attached files is licensed under [CC BY-SA 4.0](http://creativecommons.org/licenses/by-sa/4.0/), the Creative Commons Attribution-ShareAlike 4.0 International License.

<a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-sa/4.0/88x31.png" /></a>

## Citations
Please cite this model using the following citation:

```
@article{ptaszynski2022cyberbullyibng-bert-pl,
  title={Polish BERT trained for Automatic Cyberbullying Detection},
  author={Ptaszynski, Michal and Pieciukiewicz, Agata and Dybala, Pawel and Skrzek, Pawel and Soliwoda, Kamil and Fortuna, Marcin and Leliwa, Gniewosz and Wroczynski, Michal},
  year={2022},
  publisher={HuggingFace},
  url={https://github.com/ptaszynski/bert-base-polish-cyberbullying}
}
```

## References
* https://github.com/google-research/bert
* https://github.com/ptaszynski/cyberbullying-Polish
* https://huggingface.co/datasets/poleval2019_cyberbullying