HeBERT: Pre-trained BERT for Polarity Analysis and Emotion Recognition
HeBERT is a Hebrew pretrained language model. It is based on Google's BERT architecture and uses the BERT-Base configuration.
HeBERT was trained on three datasets:
- A Hebrew version of OSCAR: ~9.8 GB of data, including 1 billion words and over 20.8 million sentences.
- A Hebrew dump of Wikipedia: ~650 MB of data, including over 63 million words and 3.8 million sentences.
- Emotion User Generated Content (UGC) data that was collected for the purpose of this study (described below).
Named-entity recognition (NER)
The ability of the model to classify named entities in text, such as persons' names, organizations, and locations. It was tested on a labeled dataset from Ben Mordecai and M. Elhadad (2005) and evaluated with the F1-score.
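As a rough illustration of the metric, the sketch below computes an entity-level F1-score from gold and predicted spans. The text above does not say whether the reported score is token-level or entity-level, so exact span matching is an assumption here, and the function name `entity_f1`, the label names, and the example spans are all hypothetical.

```python
def entity_f1(gold, predicted):
    """Return (precision, recall, f1) over sets of (start, end, label) spans.

    A predicted span counts as correct only if its boundaries and its label
    both match a gold span exactly (an assumption, not the authors' protocol).
    """
    gold_set, pred_set = set(gold), set(predicted)
    true_positives = len(gold_set & pred_set)
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical annotations: (start_char, end_char, label)
gold = [(0, 4, "PER"), (10, 28, "ORG"), (29, 38, "LOC")]
predicted = [(0, 4, "PER"), (10, 28, "LOC"), (29, 38, "LOC")]

precision, recall, f1 = entity_f1(gold, predicted)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Here two of the three predicted spans match the gold spans exactly, while the middle span has the right boundaries but the wrong label and therefore counts as an error.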
How to use
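from transformers import pipeline

# Load the HeBERT NER model and its tokenizer from the Hugging Face Hub
NER = pipeline(
    "token-classification",
    model="avichr/heBERT_NER",
    tokenizer="avichr/heBERT_NER",
)
# Tag a Hebrew sentence: "David studies at the Hebrew University in Jerusalem"
NER('דויד לומד באוניברסיטה העברית שבירושלים')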
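The pipeline call above returns a list of dictionaries, one per tagged token, with keys such as `entity`, `score`, `word`, `start`, and `end`. The sketch below shows one way to keep only confident predictions. The helper name `confident_entities`, the threshold, and the sample predictions (including the label names and scores) are made up for illustration and are not actual model output.

```python
def confident_entities(predictions, min_score=0.9):
    """Keep (word, label, score) for predictions at or above min_score."""
    kept = []
    for pred in predictions:
        # Aggregated pipeline output uses "entity_group"; raw output uses "entity".
        label = pred.get("entity_group", pred.get("entity"))
        if pred["score"] >= min_score:
            kept.append((pred["word"], label, round(float(pred["score"]), 3)))
    return kept


# Hypothetical predictions in the shape of token-classification pipeline output.
sample = [
    {"entity": "PER", "score": 0.998, "index": 1, "word": "דויד", "start": 0, "end": 4},
    {"entity": "LOC", "score": 0.62, "index": 5, "word": "שבירושלים", "start": 29, "end": 38},
]

for word, label, score in confident_entities(sample):
    print(word, label, score)
```

With real output, the same helper can be called as `confident_entities(NER(text))`. The right threshold depends on the application, so 0.9 is only a placeholder.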
Other tasks
Emotion Recognition Model.
An online model can be found at Hugging Face Spaces or as a Colab notebook.
Sentiment Analysis.
Masked-LM model (can be fine-tuned for any downstream task).
Contact us
Avichay Chriqui
Inbal Yahav
The Coller Semitic Languages AI Lab
Thank you, תודה, شكرا
If you used this model, please cite us as:
Chriqui, A., & Yahav, I. (2021). HeBERT & HebEMO: a Hebrew BERT Model and a Tool for Polarity Analysis and Emotion Recognition. arXiv preprint arXiv:2102.01909.
@article{chriqui2021hebert,
title={HeBERT \& HebEMO: a Hebrew BERT Model and a Tool for Polarity Analysis and Emotion Recognition},
author={Chriqui, Avihay and Yahav, Inbal},
journal={arXiv preprint arXiv:2102.01909},
year={2021}
}