
RoBERTa for Multilabel Language Classification

Training

RoBERTa was fine-tuned on small subsets of the Open Subtitles, OSCAR and Tatoeba datasets (~9k samples per language).

A heuristic algorithm was implemented to create the multilingual training data: https://github.com/n1kstep/lang-classifier

data source      languages
open_subtitles   ka, he, en, de
oscar            be, kk, az, hu
tatoeba          ru, uk
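
The heuristic itself lives in the linked repository and is not described here. As an illustration only, the sketch below shows one plausible way to build multilabel samples: concatenate sentences from a few languages and mark each one in a multi-hot label vector. The helper name `make_multilabel_sample`, the `max_langs` parameter and the toy corpus are assumptions made for this example, not the author's implementation.

```python
import random

# Languages from the table above, in a fixed order for the label vector.
LANGUAGES = ["ka", "he", "en", "de", "be", "kk", "az", "hu", "ru", "uk"]


def make_multilabel_sample(sentences_by_lang, max_langs=3, rng=None):
    """Hypothetical helper: join sentences from 1..max_langs languages.

    Returns (text, labels), where labels is a multi-hot list aligned
    with LANGUAGES. This is an assumed heuristic, not the one in the repo.
    """
    if rng is None:
        rng = random.Random()
    # Pick how many languages to mix, then which ones.
    k = rng.randint(1, min(max_langs, len(sentences_by_lang)))
    chosen = rng.sample(sorted(sentences_by_lang), k)
    # Take one random sentence per chosen language and concatenate them.
    parts = [rng.choice(sentences_by_lang[lang]) for lang in chosen]
    text = " ".join(parts)
    labels = [1.0 if lang in chosen else 0.0 for lang in LANGUAGES]
    return text, labels


# Toy corpus (made up for the example).
corpus = {
    "en": ["How are you?"],
    "de": ["Guten Morgen."],
    "ru": ["Доброе утро."],
}
text, labels = make_multilabel_sample(corpus, rng=random.Random(0))
print(text, labels)
```

Float labels are used because multilabel heads are usually trained with a per-label binary cross-entropy loss, which expects float targets.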

Validation

The metrics below were obtained by validating on a separate part of the dataset (~1k samples per language).

Training Loss   Validation Loss   F1-Score   ROC AUC    Accuracy   Support
0.161500        0.110949          0.947844   0.953939   0.762063   26858
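
The card does not say how these metrics were computed. The sketch below assumes a common multilabel setup: a sigmoid per label, a 0.5 threshold, micro-averaged F1, and exact-match (subset) accuracy. Under that assumption, accuracy falls below F1 because a sample counts as correct only when every label matches. The function names and toy data are illustrative only.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def predict(logits, threshold=0.5):
    """Turn per-label logits into 0/1 predictions (threshold is assumed)."""
    return [[1 if sigmoid(z) >= threshold else 0 for z in row] for row in logits]


def micro_f1(y_true, y_pred):
    """F1 over all (sample, label) pairs pooled together."""
    tp = fp = fn = 0
    for t_row, p_row in zip(y_true, y_pred):
        for t, p in zip(t_row, p_row):
            tp += int(t == 1 and p == 1)
            fp += int(t == 0 and p == 1)
            fn += int(t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def subset_accuracy(y_true, y_pred):
    """Fraction of samples whose full label vector is predicted exactly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy data: 4 samples, 3 labels. Sample 3 misses a label, sample 4 adds one.
y_true = [[1, 0, 0], [0, 1, 0], [1, 1, 0], [0, 0, 1]]
logits = [[2.0, -3.0, -1.0], [-2.0, 1.5, -2.5], [3.0, -0.5, -2.0], [-1.0, 0.7, 2.2]]
y_pred = predict(logits)

print(micro_f1(y_true, y_pred))         # 0.8
print(subset_accuracy(y_true, y_pred))  # 0.5
```

In the toy data, each wrong sample has only one wrong label, so F1 stays at 0.8 while subset accuracy drops to 0.5.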
