	---
	license: cc-by-4.0
	datasets:
	- ptaszynski/PolishCyberbullyingDataset
	language:
	- pl
	tags:
	- cyberbullying
	- hate-speech
	---

	# Polbert-CB - Polish BERT trained for Automatic Cyberbullying Detection
	This is a Polish BERT language model, specifically [Polbert](https://huggingface.co/dkleczek/bert-base-polish-uncased-v1), fine-tuned on a re-annotated and improved Dataset for Automatic Cyberbullying Detection in Polish Language.
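
The following is a minimal sketch of how the model might be used for inference. It assumes the Hub ID `ptaszynski/bert-base-polish-cyberbullying`, and that the repository ships a sequence-classification head and tokenizer that the `transformers` `text-classification` pipeline can load. The helper names `load_classifier` and `classify` are illustrative only. This card does not document the label names, so check the model's `config.json` before interpreting them.

```python
from typing import Callable, Dict, List

# Assumed Hub ID of this model; adjust if the repository is named differently.
MODEL_ID = "ptaszynski/bert-base-polish-cyberbullying"


def load_classifier(model_id: str = MODEL_ID):
    """Build a text-classification pipeline (downloads the model on first use)."""
    # Imported lazily so that classify() can be used and tested without transformers.
    from transformers import pipeline

    return pipeline("text-classification", model=model_id)


def classify(texts: List[str], classifier: Callable) -> List[Dict]:
    """Pair each input text with the top label and score returned by the classifier."""
    predictions = classifier(texts)
    return [
        {"text": text, "label": pred["label"], "score": float(pred["score"])}
        for text, pred in zip(texts, predictions)
    ]


# Example (requires `pip install transformers torch` and network access):
#   clf = load_classifier()
#   for row in classify(["Miłego dnia!"], clf):
#       print(row)
```

Keeping `classify` independent of the pipeline object makes it easy to swap in a batched or GPU-backed classifier later without changing the calling code.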


	## Fine-tuning dataset
	The dataset used for fine-tuning this model is based on the original [Dataset for Automatic Cyberbullying Detection in Polish Language](https://huggingface.co/datasets/poleval2019_cyberbullying), which was subsequently cleaned and re-annotated by experts from [Samurai Labs](https://www.samurailabs.ai/). The improved dataset has been released separately as the [Expert-annotated dataset to study cyberbullying in Polish language](https://huggingface.co/datasets/ptaszynski/PolishCyberbullyingDataset).


	## Acknowledgements
	* We would like to express our gratitude to the annotators of this dataset, both the original annotators and the more recent expert annotators, for the invaluable time they spent preparing the dataset.

	## Author
	Michal Ptaszynski - contact me on:
	- Twitter: [@mich_ptaszynski](https://twitter.com/mich_ptaszynski)
	- GitHub: [ptaszynski](https://github.com/ptaszynski)
	- LinkedIn: [michalptaszynski](https://jp.linkedin.com/in/michalptaszynski)
	- HuggingFace: [ptaszynski](https://huggingface.co/ptaszynski)


	## Licences
	The fine-tuned model with all attached files is licensed under [CC BY-SA 4.0](http://creativecommons.org/licenses/by-sa/4.0/), the Creative Commons Attribution-ShareAlike 4.0 International License.

	<a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-sa/4.0/88x31.png" /></a>



	## Citations
	Please cite this model using the following citation.

	Model:
	```
	@article{ptaszynski2022cyberbullyibng-bert-pl,
	title={Polish BERT trained for Automatic Cyberbullying Detection},
	author={Ptaszynski, Michal and Pieciukiewicz, Agata and Dybala, Pawel and Skrzek, Pawel and Soliwoda, Kamil and Fortuna, Marcin and Leliwa, Gniewosz and Wroczynski, Michal},
	year={2022},
	publisher={HuggingFace},
	url={https://github.com/ptaszynski/bert-base-polish-cyberbullying}
	}
	```

	Original dataset:
	```
	@article{ptaszynski2019results,
	title={Results of the PolEval 2019 Shared Task 6: First Dataset and Open Shared Task for Automatic Cyberbullying Detection in Polish Twitter},
	author={Ptaszynski, Michal and Pieciukiewicz, Agata and Dyba{\l}a, Pawe{\l}},
	year={2019},
	publisher={Warszawa: Institute of Computer Science, Polish Academy of Sciences}
	}
	```

	Improved dataset:

	The improved dataset used for training this model was released as follows.
	[Expert-annotated dataset to study cyberbullying in Polish language](https://huggingface.co/datasets/ptaszynski/PolishCyberbullyingDataset)

	```
	@article{ptaszynski2023expert,
	title={Expert-Annotated Dataset to Study Cyberbullying in Polish Language},
	author={Ptaszynski, Michal and Pieciukiewicz, Agata and Dybala, Pawel and Skrzek, Pawel and Soliwoda, Kamil and Fortuna, Marcin and Leliwa, Gniewosz and Wroczynski, Michal},
	journal={Data},
	volume={9},
	number={1},
	pages={1},
	year={2023},
	publisher={MDPI}
	}
	```

	## References
	* https://github.com/google-research/bert
	* https://github.com/ptaszynski/cyberbullying-Polish
	* https://huggingface.co/datasets/poleval2019_cyberbullying