---
pipeline_tag: sentiment-analysis
language: multilingual
license: apache-2.0
tags:
- "sentiment-analysis"
- "multilingual"
---

# Multi-lingual sentiment prediction trained on COVID-19-related tweets

Repository: [https://github.com/clampert/multilingual-sentiment-analysis/](https://github.com/clampert/multilingual-sentiment-analysis/)

This model was trained on a large-scale dataset (18,437,530 examples)
of multi-lingual tweets collected between March 2020 and November 2021
via Twitter's Streaming API, using a varying set of COVID-19-related
keywords. Labels were generated automatically from the presence of
positive and negative emoticons. For details on the dataset, see our
IEEE BigData 2021 publication.
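
To illustrate the emoticon-based labeling idea (a form of distant supervision), here is a minimal sketch. The emoticon sets, the function names `emoticon_label` and `strip_emoticons`, and the rule that tweets with conflicting or missing emoticons are discarded are all assumptions made for illustration. They are not the exact lists or rules used to build the dataset; see the paper for those. Removing the emoticons from the text before training, so the classifier cannot simply learn the labeling cue, is a common practice in distant supervision and is likewise assumed here rather than confirmed by this card.

```python
from typing import Optional

# Hypothetical emoticon sets for illustration only; the actual lists used
# to build the training data are not reproduced in this model card.
POSITIVE_EMOTICONS = {":)", ":-)", ":D", "😀", "😊", "😍"}
NEGATIVE_EMOTICONS = {":(", ":-(", ":'(", "😢", "😠", "😞"}


def emoticon_label(text: str) -> Optional[str]:
    """Return 'positive' or 'negative' if the text contains emoticons of
    exactly one polarity; return None if it has none or both (assumed: discard)."""
    has_pos = any(e in text for e in POSITIVE_EMOTICONS)
    has_neg = any(e in text for e in NEGATIVE_EMOTICONS)
    if has_pos == has_neg:
        return None  # no signal, or conflicting signals
    return "positive" if has_pos else "negative"


def strip_emoticons(text: str) -> str:
    """Remove the labeling emoticons and normalize whitespace (assumed step)."""
    for e in POSITIVE_EMOTICONS | NEGATIVE_EMOTICONS:
        text = text.replace(e, "")
    return " ".join(text.split())


if __name__ == "__main__":
    tweet = "Stay safe everyone :)"
    print(emoticon_label(tweet), "|", strip_emoticons(tweet))
```

Running the script prints `positive | Stay safe everyone`.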

The base model is [sentence-transformers/stsb-xlm-r-multilingual](https://huggingface.co/sentence-transformers/stsb-xlm-r-multilingual).
It was fine-tuned for sequence classification with `positive`
and `negative` labels for two epochs (48 hours on 8×P100 GPUs).
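
For readers estimating compute budgets, the figures above imply a rough training throughput. The back-of-the-envelope calculation below assumes the 48 hours cover both epochs end to end and that all 8 GPUs were busy the whole time; it ignores evaluation, checkpointing, and data-loading overhead, so treat the result as an approximate average rather than a measured number.

```python
# Figures taken from this model card; the calculation itself is a rough estimate.
NUM_EXAMPLES = 18_437_530
EPOCHS = 2
WALL_CLOCK_HOURS = 48
NUM_GPUS = 8

total_examples = NUM_EXAMPLES * EPOCHS   # examples processed over both epochs
seconds = WALL_CLOCK_HOURS * 3600        # wall-clock training time in seconds
throughput = total_examples / seconds    # average examples/s across all GPUs
per_gpu = throughput / NUM_GPUS          # average examples/s per GPU
gpu_hours = WALL_CLOCK_HOURS * NUM_GPUS  # total GPU-hours

print(f"{throughput:.1f} examples/s overall, "
      f"{per_gpu:.1f} per GPU, {gpu_hours} GPU-hours")
```

This prints `213.4 examples/s overall, 26.7 per GPU, 384 GPU-hours`.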

## Citation

If you use our model in your work, please cite:

```
@inproceedings{lampert2021overcoming,
  title={Overcoming Rare-Language Discrimination in Multi-Lingual Sentiment Analysis},
  author={Jasmin Lampert and Christoph H. Lampert},
  booktitle={IEEE International Conference on Big Data (BigData)},
  year={2021},
  note={Special Session: Machine Learning on Big Data},
}
```

Enjoy!