---
license: cc-by-sa-4.0
datasets:
- sinhala-nlp/NSINA-Headlines
- sinhala-nlp/NSINA
language:
- si
---

# Sinhala Headline Generation

This is a text generation task created with the [NSINA dataset](https://github.com/Sinhala-NLP/NSINA). The dataset is released under the same license as NSINA. The objective of the task is to generate a news headline from the provided news content.

## Data

We used all instances from NSINA 1.0, since every news article in it has a headline. We divided the dataset into a training set and a test set using a 0.8 split.

The data can be loaded into pandas dataframes using the following code.

```python
from datasets import load_dataset

# Load each split from the Hugging Face Hub and convert it to a pandas DataFrame.
train = load_dataset('sinhala-nlp/NSINA-Headlines', split='train').to_pandas()
test = load_dataset('sinhala-nlp/NSINA-Headlines', split='test').to_pandas()
```

## Citation

If you use the dataset or the models, please cite the following paper.

~~~
@inproceedings{Nsina2024,
  author={Hettiarachchi, Hansi and Premasiri, Damith and Uyangodage, Lasitha and Ranasinghe, Tharindu},
  title={{NSINA: A News Corpus for Sinhala}},
  booktitle={The 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)},
  year={2024},
  month={May},
}
~~~
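
## Example: a 0.8 split

The Data section mentions a 0.8 train/test split. The snippet below is a minimal sketch of what such a split can look like, using only the Python standard library. It is not the procedure used to build NSINA-Headlines: the helper `split_train_test`, the seed, and the toy `(content, headline)` pairs are all assumptions made for illustration.

```python
import random


def split_train_test(items, train_fraction=0.8, seed=42):
    """Shuffle a copy of `items` and split it into (train, test) lists.

    Hypothetical helper for illustration; not part of the NSINA tooling.
    """
    shuffled = list(items)
    # A seeded generator keeps the shuffle reproducible across runs.
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Toy stand-ins for (news content, headline) pairs; not real NSINA data.
toy_articles = [(f"content {i}", f"headline {i}") for i in range(10)]

train_rows, test_rows = split_train_test(toy_articles)
print(len(train_rows), len(test_rows))
```

Users of the released dataset do not need to repeat this step, because the `train` and `test` splits are already provided.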