Tags: Text Classification · Transformers · Safetensors · Sinhala · roberta · Inference Endpoints

Sinhala News Media Identification

This is a text classification task created from the NSINA dataset. The task dataset is released under the same license as NSINA.

Data

The train and test splits can be loaded into pandas DataFrames with the following code.

from datasets import load_dataset

# Load each split from the Hugging Face Hub and convert it to a pandas DataFrame
train = load_dataset('sinhala-nlp/NSINA-Media', split='train').to_pandas()
test = load_dataset('sinhala-nlp/NSINA-Media', split='test').to_pandas()
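Once loaded, the splits typically need integer class ids before they can be used to train or evaluate a classifier. The snippet below is a minimal sketch, assuming the label column is named `label`; that column name and the helper functions `build_label_maps` and `encode_labels` are hypothetical, so check `train.columns` for the actual schema of sinhala-nlp/NSINA-Media. The mappings are built from the training split only, so the test split is encoded with the same ids.

```python
import pandas as pd


def build_label_maps(train: pd.DataFrame, label_col: str = "label"):
    """Build label <-> id mappings from the training split.

    `label_col` is a hypothetical column name; adjust it to the real schema.
    """
    # Sort the unique labels so the mapping is deterministic across runs
    labels = sorted(train[label_col].unique())
    label2id = {label: i for i, label in enumerate(labels)}
    id2label = {i: label for label, i in label2id.items()}
    return label2id, id2label


def encode_labels(df: pd.DataFrame, label2id: dict, label_col: str = "label") -> pd.DataFrame:
    """Return a copy of `df` with an integer `label_id` column added."""
    out = df.copy()
    # Labels not seen in the training split become NaN instead of raising
    out["label_id"] = out[label_col].map(label2id)
    return out


if __name__ == "__main__":
    # Toy frames standing in for the real splits (made-up values, not NSINA data)
    train = pd.DataFrame({"text": ["a", "b", "c"], "label": ["Source B", "Source A", "Source B"]})
    test = pd.DataFrame({"text": ["d"], "label": ["Source A"]})

    label2id, id2label = build_label_maps(train)
    print(label2id)  # {'Source A': 0, 'Source B': 1}
    print(encode_labels(test, label2id)["label_id"].tolist())  # [0]
```

The two dictionaries have the same shape as the `label2id` and `id2label` fields of a Transformers model config, which is one reason to keep both directions of the mapping.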

Citation

If you use the dataset or the models, please cite the following paper.

@inproceedings{Nsina2024,
author={Hettiarachchi, Hansi and Premasiri, Damith and Uyangodage, Lasitha and Ranasinghe, Tharindu},
title={{NSINA: A News Corpus for Sinhala}},
booktitle={The 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)},
year={2024},
month={May},
}
Model size: 127M parameters (Safetensors, F32 tensors)

Model: sinhala-nlp/NSINA-Media-sinbert-large