ptaszynski committed on
Commit
7a87960
1 Parent(s): e063cac

Update README.md

  1. README.md +55 -0
README.md CHANGED
@@ -1,3 +1,58 @@
  ---
+ language: pl
+
  license: cc-by-sa-4.0
+
+ datasets:
+
+ - Polish subset of Open Subtitles
+ - Polish subset of ParaCrawl
+ - Polish Parliamentary Corpus
+ - Polish Wikipedia - Feb 2020
+ - Expert-annotated Dataset for Automatic Cyberbullying Detection in Polish Language
+
  ---
+
+ # Polbert-CB - Polish BERT trained for Automatic Cyberbullying Detection
+ This is a Polish version of the BERT language model, specifically [Polbert](https://huggingface.co/dkleczek/bert-base-polish-uncased-v1), fine-tuned on a re-annotated and improved version of the Dataset for Automatic Cyberbullying Detection in Polish Language.
+
+
+ ## Fine-tuning dataset
+ The dataset used for fine-tuning this model is based on the original [Dataset for Automatic Cyberbullying Detection in Polish Language](https://huggingface.co/datasets/poleval2019_cyberbullying), which was recently cleaned further and re-annotated by experts from [Samurai Labs](https://www.samurailabs.ai/). The improved dataset will be released separately later.
+
+
+ ## Acknowledgements
+ * We would like to express our gratitude to the annotators of this dataset, both the original annotators and the more recent expert annotators, for the invaluable time they spent preparing it.
+
+ ## Author
+ Michal Ptaszynski - contact me on:
+ - Twitter: [@mich_ptaszynski](https://twitter.com/mich_ptaszynski)
+ - GitHub: [ptaszynski](https://github.com/ptaszynski)
+ - LinkedIn: [michalptaszynski](https://jp.linkedin.com/in/michalptaszynski)
+ - HuggingFace: [ptaszynski](https://huggingface.co/ptaszynski)
+
+
+ ## Licences
+ The fine-tuned model with all attached files is licensed under [CC BY-SA 4.0](http://creativecommons.org/licenses/by-sa/4.0/), the Creative Commons Attribution-ShareAlike 4.0 International License.
+
+ <a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-sa/4.0/88x31.png" /></a>
+
+
+
+ ## Citations
+ Please cite this model using the following citation.
+
+ ```
+ @article{ptaszynski2022cyberbullyibng-bert-pl,
+ title={Polish BERT trained for Automatic Cyberbullying Detection},
+ author={Ptaszynski, Michal and Pieciukiewicz, Agata and Dybala, Pawel and Skrzek, Pawel and Soliwoda, Kamil and Fortuna, Marcin and Leliwa, Gniewosz and Wroczynski, Michal},
+ year={2022},
+ publisher={HuggingFace},
+ url={https://github.com/ptaszynski/bert-base-polish-cyberbullying}
+ }
+ ```
+
+ ## References
+ * https://github.com/google-research/bert
+ * https://github.com/ptaszynski/cyberbullying-Polish
+ * https://huggingface.co/datasets/poleval2019_cyberbullying
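
The README above describes a BERT model fine-tuned for cyberbullying detection but does not show how to run it. The following is a minimal sketch, not the authors' documented usage. It assumes the model is published on the Hugging Face Hub as `ptaszynski/bert-base-polish-cyberbullying` (an ID inferred from the citation URL, not stated in the README) and that it exposes a standard sequence-classification head. The helpers `softmax`, `top_label`, and `classify` are illustrative names, and label names are read from the model's own config rather than assumed.

```python
import math
from typing import List, Sequence, Tuple

# Assumed Hub ID, inferred from the citation URL in the README; verify before use.
MODEL_ID = "ptaszynski/bert-base-polish-cyberbullying"


def softmax(logits: Sequence[float]) -> List[float]:
    """Convert raw logits into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_label(logits: Sequence[float], labels: Sequence[str]) -> Tuple[str, float]:
    """Return the most probable label and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]


def classify(text: str, model_id: str = MODEL_ID) -> Tuple[str, float]:
    """Classify one Polish text with the fine-tuned model (downloads weights on first use)."""
    # Imported lazily so the pure-Python helpers above work without torch/transformers.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(model_id)
    model.eval()

    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits[0].tolist()

    # Label names come from the model config; the README does not document them.
    labels = [model.config.id2label[i] for i in range(len(logits))]
    return top_label(logits, labels)
```

Calling `classify("...")` with a Polish sentence would return a `(label, probability)` pair. What each label means (for example, which index marks harmful content) depends on the model's configuration and should be checked against the model card before relying on the output.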