czuk committed on
Commit
12f5d17
1 Parent(s): 212d0dd

Update README.md

Files changed (1)
  1. README.md +29 -1
README.md CHANGED
@@ -64,4 +64,32 @@ print(ids)
 
  # Citation
 
- *Will appear soon*
+ ```latex
+ @inproceedings{piskorski-etal-2024-cross-lingual,
+ title = "Cross-lingual Named Entity Corpus for {S}lavic Languages",
+ author = "Piskorski, Jakub and
+ Marci{\'n}czuk, Micha{\l} and
+ Yangarber, Roman",
+ editor = "Calzolari, Nicoletta and
+ Kan, Min-Yen and
+ Hoste, Veronique and
+ Lenci, Alessandro and
+ Sakti, Sakriani and
+ Xue, Nianwen",
+ booktitle = "Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)",
+ month = may,
+ year = "2024",
+ address = "Torino, Italy",
+ publisher = "ELRA and ICCL",
+ url = "https://aclanthology.org/2024.lrec-main.369",
+ pages = "4143--4157",
+ abstract = "This paper presents a corpus manually annotated with named entities for six Slavic languages {---} Bulgarian, Czech, Polish, Slovenian, Russian,
+ and Ukrainian. This work is the result of a series of shared tasks, conducted in 2017{--}2023 as a part of the Workshops on Slavic Natural
+ Language Processing. The corpus consists of 5,017 documents on seven topics. The documents are annotated with five classes of named entities.
+ Each entity is described by a category, a lemma, and a unique cross-lingual identifier. We provide two train-tune dataset splits
+ {---} single topic out and cross topics. For each split, we set benchmarks using a transformer-based neural network architecture
+ with the pre-trained multilingual models {---} XLM-RoBERTa-large for named entity mention recognition and categorization,
+ and mT5-large for named entity lemmatization and linking.",
+ }
+ ```
+