Multi-lingual sentiment prediction trained from COVID19-related tweets
Repository: https://github.com/clampert/multilingual-sentiment-analysis/
Model trained on a large-scale dataset (18,437,530 examples) of multi-lingual tweets collected between March 2020 and November 2021 using Twitter’s Streaming API with varying COVID19-related keywords. Labels were generated automatically based on the presence of positive and negative emoticons. For details on the dataset, see our IEEE BigData 2021 publication.
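As a rough illustration of emoticon-based labelling, here is a minimal sketch. The emoticon sets and the rule of discarding tweets that contain both kinds or neither are assumptions made for this example, not the procedure used to build the dataset; see the publication for the actual method.

```python
from typing import Optional

# Hypothetical emoticon sets, for illustration only.
POSITIVE_EMOTICONS = {":)", ":-)", ":D", ";)"}
NEGATIVE_EMOTICONS = {":(", ":-(", ":'("}


def weak_label(tweet: str) -> Optional[str]:
    """Return 'positive', 'negative', or None if the tweet is ambiguous or unlabelled."""
    has_pos = any(e in tweet for e in POSITIVE_EMOTICONS)
    has_neg = any(e in tweet for e in NEGATIVE_EMOTICONS)
    # Assumption: tweets with both kinds of emoticon, or with none, are skipped.
    if has_pos == has_neg:
        return None
    return "positive" if has_pos else "negative"
```

In this sketch a tweet such as "Finally vaccinated :)" would be labelled positive, while a tweet without any emoticon yields no label.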
Base model is sentence-transformers/stsb-xlm-r-multilingual.
It was fine-tuned for sequence classification with positive and negative labels for two epochs (48 hours on 8xP100 GPUs).
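A minimal usage sketch with the Hugging Face transformers library follows. The helper names load_classifier and classify are hypothetical, and the model ID is left as a parameter because it is not stated in this text; substitute the identifier shown on the model's hub page. The exact label strings returned depend on the model configuration.

```python
from typing import Callable, List, Tuple


def load_classifier(model_id: str):
    """Build a text-classification pipeline for the given Hub model ID."""
    # Imported lazily so classify() can be used without transformers installed.
    from transformers import pipeline
    return pipeline("text-classification", model=model_id)


def classify(texts: List[str], classifier: Callable) -> List[Tuple[str, str, float]]:
    """Return one (text, label, score) triple per input text."""
    # For a list of strings, the pipeline returns one
    # {"label": ..., "score": ...} dict per text.
    results = classifier(texts)
    return [(t, r["label"], float(r["score"])) for t, r in zip(texts, results)]


# Example (downloads the model on first use):
# classifier = load_classifier("<model-id-from-the-hub-page>")
# classify(["I am very happy.", "Ich bin sehr traurig."], classifier)
```

Because the base model is multilingual, inputs in different languages can be passed in the same batch.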
Citation
If you use our model in your work, please cite:
@inproceedings{lampert2021overcoming,
title={Overcoming Rare-Language Discrimination in Multi-Lingual Sentiment Analysis},
author={Jasmin Lampert and Christoph H. Lampert},
booktitle={IEEE International Conference on Big Data (BigData)},
year={2021},
note={Special Session: Machine Learning on Big Data},
}
Enjoy!