---
language: pl
license: cc-by-sa-4.0
datasets:
- Polish subset of Open Subtitles
- Polish subset of ParaCrawl
- Polish Parliamentary Corpus
- Polish Wikipedia - Feb 2020
- Expert-annotated Dataset for Automatic Cyberbullying Detection in Polish Language
---

# Polbert-CB - Polish BERT trained for Automatic Cyberbullying Detection

This is a Polish version of the BERT language model, specifically [Polbert](https://huggingface.co/dkleczek/bert-base-polish-uncased-v1), fine-tuned on a re-annotated and improved Dataset for Automatic Cyberbullying Detection in Polish Language.

## Fine-tuning dataset

The dataset used for fine-tuning this model is based on the original [Dataset for Automatic Cyberbullying Detection in Polish Language](https://huggingface.co/datasets/poleval2019_cyberbullying), which was recently cleaned and re-annotated by experts from [Samurai Labs](https://www.samurailabs.ai/). The improved dataset will be released separately later.

## Acknowledgements

* We would like to express our gratitude to the annotators of this dataset, both the original annotators and the more recent expert annotators, for the invaluable time they spent preparing the dataset.

## Author

Michal Ptaszynski - contact me on:
- Twitter: [@mich_ptaszynski](https://twitter.com/mich_ptaszynski)
- GitHub: [ptaszynski](https://github.com/ptaszynski)
- LinkedIn: [michalptaszynski](https://jp.linkedin.com/in/michalptaszynski)
- HuggingFace: [ptaszynski](https://huggingface.co/ptaszynski)

## Licences

The fine-tuned model with all attached files is licensed under [CC BY-SA 4.0](http://creativecommons.org/licenses/by-sa/4.0/), the Creative Commons Attribution-ShareAlike 4.0 International License.

## Citations

Please cite this model using the following citation.
```
@article{ptaszynski2022cyberbullyibng-bert-pl,
  title={Polish BERT trained for Automatic Cyberbullying Detection},
  author={Ptaszynski, Michal and Pieciukiewicz, Agata and Dybala, Pawel and Skrzek, Pawel and Soliwoda, Kamil and Fortuna, Marcin and Leliwa, Gniewosz and Wroczynski, Michal},
  year={2022},
  publisher={HuggingFace},
  url={https://github.com/ptaszynski/bert-base-polish-cyberbullying}
}
```

## References

* https://github.com/google-research/bert
* https://github.com/ptaszynski/cyberbullying-Polish
* https://huggingface.co/datasets/poleval2019_cyberbullying
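
## Example usage (sketch)

Since the model is a fine-tuned BERT classifier, it can presumably be loaded with the `transformers` text-classification pipeline. The snippet below is a minimal sketch, not the author's documented usage. The Hub model id `ptaszynski/bert-base-polish-cyberbullying` is an assumption inferred from the author's handle and the repository name in the citation, and the helper names `load_classifier` and `classify` are hypothetical. The label names the model returns are not stated in this card, so check the model's configuration before interpreting them.

```python
from typing import Any, Callable, Dict, List

# Assumption: Hub model id inferred from the author's handle and the
# repository name in the citation; verify it before use.
MODEL_ID = "ptaszynski/bert-base-polish-cyberbullying"


def load_classifier(model_id: str = MODEL_ID) -> Callable[[List[str]], List[Dict[str, Any]]]:
    """Build a text-classification pipeline (hypothetical helper).

    Downloads the model on first call; requires `transformers` and a backend
    such as PyTorch to be installed.
    """
    from transformers import pipeline  # imported lazily so this module loads without it

    return pipeline("text-classification", model=model_id)


def classify(
    classifier: Callable[[List[str]], List[Dict[str, Any]]],
    texts: List[str],
) -> List[Dict[str, Any]]:
    """Run the classifier on a batch and pair each text with its prediction.

    The pipeline returns one {"label": ..., "score": ...} dict per input;
    label names depend on the model configuration and are not assumed here.
    """
    predictions = classifier(texts)
    return [
        {"text": text, "label": pred["label"], "score": float(pred["score"])}
        for text, pred in zip(texts, predictions)
    ]
```

Calling `classify(load_classifier(), ["Miłego dnia!"])` would then return a list with one dictionary holding the input text, the predicted label, and its score. Keeping model loading separate from `classify` lets the pairing logic be exercised with any callable that mimics the pipeline's output.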