sloberta-sentinews-sentence
Slovenian 3-class sentiment classifier - SloBERTa fine-tuned on the sentence-level config of the SentiNews dataset.
The model is intended as:
(1) an out-of-the-box sentence-level sentiment classifier or
(2) a sentence-level sentiment classification baseline.
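As a rough illustration of use (1), the sketch below loads the model through the transformers text-classification pipeline. It assumes the model is published on the Hugging Face Hub under the ID cjvt/sloberta-sentinews-sentence and that transformers is installed with a PyTorch backend. The helper name classify_sentences and the example sentence are hypothetical and are not taken from SentiNews.

```python
MODEL_ID = "cjvt/sloberta-sentinews-sentence"  # assumed Hub ID


def classify_sentences(sentences, model_id=MODEL_ID):
    """Return one {'label': ..., 'score': ...} dict per input sentence."""
    # Imported lazily so that defining this helper does not require transformers.
    from transformers import pipeline

    classifier = pipeline("text-classification", model=model_id)
    return classifier(list(sentences))


# Usage (downloads the model weights on first call):
#   classify_sentences(["Gospodarska rast je presegla pričakovanja."])
```

The label strings in the output depend on the model's configuration, so check them before mapping predictions to negative, neutral, and positive.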
Fine-tuning details
The model was fine-tuned on a random 90%/5%/5% train/validation/test split of the sentence_level configuration of the cjvt/sentinews dataset, using the following hyperparameters:
max_length = 79 # 99th percentile of encoded training sequences, sequences are padded/truncated to this length
batch_size = 128
optimizer = "adamw_torch"
learning_rate = 2e-5
num_epochs = 10
validation_metric = "macro_f1"
Feel free to inspect training_args.bin for more details.
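The max_length value is described as the 99th percentile of encoded training sequence lengths, with every sequence padded or truncated to that length. The sketch below shows one way such a value could be derived and applied, using toy token counts instead of the real data. The nearest-rank percentile method, the helper names, and pad_id=0 are assumptions for illustration; the card does not say which percentile method was used, and the real padding ID comes from the tokenizer.

```python
import math


def percentile_length(lengths, pct=99):
    """Nearest-rank percentile of a list of token counts (pct is an integer)."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths)
    # Nearest-rank method: smallest value with at least pct% of the data at or below it.
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def pad_or_truncate(token_ids, max_length, pad_id=0):
    """Force a token-id sequence to exactly max_length items."""
    clipped = list(token_ids[:max_length])
    return clipped + [pad_id] * (max_length - len(clipped))


# Toy token counts standing in for the encoded training sentences.
toy_lengths = list(range(1, 101))  # 1, 2, ..., 100
max_length = percentile_length(toy_lengths, pct=99)

short_seq = pad_or_truncate([5, 6, 7], max_length)        # padded on the right
long_seq = pad_or_truncate(list(range(200)), max_length)  # cut to max_length
```

With a transformers tokenizer, the same effect is normally obtained by calling it with padding="max_length", truncation=True, and max_length=79.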
To compare your model directly against this one, evaluate it on the same split. The following code reproduces it:
import json
import datasets
# You can find split_indices.json in the 'Files and versions' tab
with open("split_indices.json", "r") as f_split:
    split = json.load(f_split)
data = datasets.load_dataset("cjvt/sentinews", "sentence_level", split="train")
train_data = data.select(split["train_indices"])
dev_data = data.select(split["dev_indices"])
test_data = data.select(split["test_indices"])
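Before training on the reproduced split, it can be worth confirming that the three index lists do not overlap and roughly match the 90%/5%/5% proportions. The helper below is a minimal sketch of such a check. The name check_split and the toy data are hypothetical; only the three key names come from the code above.

```python
def check_split(split, n_total):
    """Return the fraction of n_total examples in each part; raise if parts overlap."""
    keys = ("train_indices", "dev_indices", "test_indices")
    parts = {key: set(split[key]) for key in keys}

    # Each index should appear at most once within a part.
    for key in keys:
        if len(parts[key]) != len(split[key]):
            raise ValueError(f"{key} contains duplicate indices")

    # No example may appear in more than one part.
    for i, first in enumerate(keys):
        for second in keys[i + 1:]:
            if parts[first] & parts[second]:
                raise ValueError(f"{first} and {second} overlap")

    return {key: len(parts[key]) / n_total for key in keys}


# Toy stand-in for split_indices.json with 20 examples.
toy_split = {
    "train_indices": list(range(0, 18)),
    "dev_indices": [18],
    "test_indices": [19],
}
fractions = check_split(toy_split, n_total=20)
```

With the real files, the equivalent call would be check_split(split, n_total=len(data)).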
Evaluation results
Best validation set results:
{
"eval_accuracy": 0.7207815275310835,
"eval_f1_macro": 0.6934678744913757,
"eval_f1_negative": 0.7042136003337507,
"eval_f1_neutral": 0.759215853398679,
"eval_f1_positive": 0.6169741697416974,
"eval_loss": 0.6337869167327881,
"eval_precision_negative": 0.6685148514851486,
"eval_precision_neutral": 0.7752393385552655,
"eval_precision_positive": 0.6314199395770392,
"eval_recall_negative": 0.74394006170119,
"eval_recall_neutral": 0.7438413361169103,
"eval_recall_positive": 0.6031746031746031
}
Test set results:
{
"test_loss": 0.6395984888076782,
"test_accuracy": 0.7158081705150977,
"test_precision_negative": 0.6570397111913358,
"test_recall_negative": 0.7292965271593945,
"test_f1_negative": 0.6912850812407682,
"test_precision_neutral": 0.7748017998714377,
"test_recall_neutral": 0.7418957734919983,
"test_f1_neutral": 0.7579918247563149,
"test_precision_positive": 0.6155642023346304,
"test_recall_positive": 0.5969811320754717,
"test_f1_positive": 0.6061302681992337,
"test_f1_macro": 0.6851357247321056
}
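The reported figures are consistent with the standard definitions: each per-class F1 is the harmonic mean of that class's precision and recall, and macro F1 is the unweighted mean of the three per-class F1 scores. The sketch below recomputes the test-set macro F1 from the precision and recall values reported above. It assumes these standard definitions and is not the evaluation code used by the authors; the function names are hypothetical.

```python
def f1_from_precision_recall(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def macro_average(values):
    """Unweighted mean over classes."""
    values = list(values)
    return sum(values) / len(values)


# Test-set (precision, recall) pairs as reported above.
test_scores = {
    "negative": (0.6570397111913358, 0.7292965271593945),
    "neutral": (0.7748017998714377, 0.7418957734919983),
    "positive": (0.6155642023346304, 0.5969811320754717),
}
per_class_f1 = {
    label: f1_from_precision_recall(p, r) for label, (p, r) in test_scores.items()
}
macro_f1 = macro_average(per_class_f1.values())
print(per_class_f1, macro_f1)
```

Because macro F1 weights all classes equally, the weaker positive class pulls it below the accuracy figure.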