ptaszynski
/

bert-base-polish-cyberbullying

Text Classification

ptaszynski committed on Sep 26, 2022

Commit

7a87960

·

1 Parent(s): e063cac

Update README.md

Files changed (1)

README.md +55 -0

@@ -1,3 +1,58 @@
 ---
+language: pl
 license: cc-by-sa-4.0
+datasets:
+- Polish subset of Open Subtitles
+- Polish subset of ParaCrawl
+- Polish Parliamentary Corpus
+- Polish Wikipedia - Feb 2020
+- Expert-annotated Dataset for Automatic Cyberbullying Detection in Polish Language
 ---
+# Polbert-CB - Polish BERT trained for Automatic Cyberbullying Detection
+This is a Polish version of the BERT language model, specifically [Polbert](https://huggingface.co/dkleczek/bert-base-polish-uncased-v1), fine-tuned on a re-annotated and improved Dataset for Automatic Cyberbullying Detection in Polish Language.
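The following is a minimal usage sketch, not taken from the README itself. It assumes the model is published under the repository id shown in the page header (`ptaszynski/bert-base-polish-cyberbullying`), that the repository ships a compatible tokenizer, and that the `transformers` library is installed. The helper name `classify` and the sample sentence are hypothetical, and the label names returned depend on the model's configuration, which this README does not document.

```python
from typing import Dict, List

# Repository id taken from the page header; everything else here is illustrative.
MODEL_ID = "ptaszynski/bert-base-polish-cyberbullying"


def classify(texts: List[str], model_id: str = MODEL_ID) -> List[Dict[str, object]]:
    """Return one {'label': ..., 'score': ...} dict per input text.

    Hypothetical helper: it wraps the generic text-classification pipeline.
    """
    # Imported lazily so this module can be loaded without transformers installed.
    from transformers import pipeline

    # Downloads the model and tokenizer on first use (network access required).
    classifier = pipeline("text-classification", model=model_id)
    return classifier(texts)


if __name__ == "__main__":
    samples = ["Miłego dnia!"]  # "Have a nice day!"
    for text, result in zip(samples, classify(samples)):
        print(text, result)
```

The pipeline is built inside the function for brevity; for repeated calls it would likely be better to construct it once and reuse it.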
+## Fine-tuning dataset
+The dataset used for fine-tuning this model is based on the original [Dataset for Automatic Cyberbullying Detection in Polish Language](https://huggingface.co/datasets/poleval2019_cyberbullying), which was recently further cleaned and re-annotated by experts from [Samurai Labs](https://www.samurailabs.ai/). The improved dataset will be released separately later.
+## Acknowledgements
+* We would like to express our gratitude to the annotators of this dataset, both the original annotators and the more recent expert annotators, for the invaluable time they spent preparing it.
+## Author
+Michal Ptaszynski - contact me on:
+- Twitter: [@mich_ptaszynski](https://twitter.com/mich_ptaszynski)
+- GitHub: [ptaszynski](https://github.com/ptaszynski)
+- LinkedIn: [michalptaszynski](https://jp.linkedin.com/in/michalptaszynski)
+- HuggingFace: [ptaszynski](https://huggingface.co/ptaszynski)
+## Licences
+The fine-tuned model, with all attached files, is licensed under [CC BY-SA 4.0](http://creativecommons.org/licenses/by-sa/4.0/), the Creative Commons Attribution-ShareAlike 4.0 International License.
+<a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-sa/4.0/88x31.png" /></a>
+## Citations
+Please cite this model using the following citation.
+```
+@article{ptaszynski2022cyberbullying-bert-pl,
+  title={Polish BERT trained for Automatic Cyberbullying Detection},
+  author={Ptaszynski, Michal and Pieciukiewicz, Agata and Dybala, Pawel and Skrzek, Pawel and Soliwoda, Kamil and Fortuna, Marcin and Leliwa, Gniewosz and Wroczynski, Michal},
+  year={2022},
+  publisher={HuggingFace},
+  url={https://github.com/ptaszynski/bert-base-polish-cyberbullying}
+}
+```
+## References
+* https://github.com/google-research/bert
+* https://github.com/ptaszynski/cyberbullying-Polish
+* https://huggingface.co/datasets/poleval2019_cyberbullying