---
language:
- es
library_name: pysentimiento
tags:
- twitter
- named-entity-recognition
- ner
datasets:
- lince
---
# Named Entity Recognition model for Spanish/English
## robertuito-ner
Repository: [https://github.com/pysentimiento/pysentimiento/](https://github.com/pysentimiento/pysentimiento/)
Model trained on the Spanish/English split of the [LinCE NER corpus](https://ritual.uh.edu/lince/), a code-switching benchmark. The base model is [RoBERTuito](https://github.com/pysentimiento/robertuito), a RoBERTa model trained on Spanish tweets.
## Usage
To use this model, we suggest calling it directly from the `pysentimiento` library, because it does not work properly with the pipeline due to tokenization issues.
```python
from pysentimiento import create_analyzer
ner_analyzer = create_analyzer("ner", lang="es")
ner_analyzer.predict(
"rindanse ante el mejor, leonel andres messi cuccitini. serresiete no existis, segui en al-nassr"
)
# [{'type': 'PER',
# 'text': 'leonel andres messi cuccitini',
# 'start': 24,
# 'end': 53},
# {'type': 'PER', 'text': 'serresiete', 'start': 55, 'end': 65},
# {'type': 'LOC', 'text': 'al-nassr', 'start': 108, 'end': 116}]
```
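The predictions shown above are a list of dicts with `type`, `text`, `start`, and `end` keys. As a minimal sketch of post-processing, assuming that output shape, the entities can be grouped by label. `group_entities_by_type` is a hypothetical helper written for illustration and is not part of `pysentimiento`; the sample list is copied from the output above rather than produced by a fresh model run.

```python
from collections import defaultdict


def group_entities_by_type(entities):
    """Group entity surface forms by label, keeping their original order.

    Hypothetical helper: assumes each entity is a dict with at least
    the 'type' and 'text' keys, as in the example output above.
    """
    grouped = defaultdict(list)
    for entity in entities:
        grouped[entity["type"]].append(entity["text"])
    return dict(grouped)


# Sample copied from the documented output (not a fresh model run).
sample = [
    {"type": "PER", "text": "leonel andres messi cuccitini", "start": 24, "end": 53},
    {"type": "PER", "text": "serresiete", "start": 55, "end": 65},
    {"type": "LOC", "text": "al-nassr", "start": 108, "end": 116},
]

print(group_entities_by_type(sample))
```

In practice, the list returned by `ner_analyzer.predict(...)` would take the place of `sample`, provided it has the same structure.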
## Results
Results are taken from the LinCE leaderboard.
| Model | Sentiment | NER | POS |
|:-----------------------|:----------------|:-------------------|:--------|
| RoBERTuito | **60.6** | 68.5 | 97.2 |
| XLM Large | -- | **69.5** | **97.2** |
| XLM Base | -- | 64.9 | 97.0 |
| C2S mBERT | 59.1 | 64.6 | 96.9 |
| mBERT | 56.4 | 64.0 | 97.1 |
| BERT | 58.4 | 61.1 | 96.9 |
| BETO | 56.5 | -- | -- |
## Citation
If you use this model in your research, please cite the pysentimiento, RoBERTuito, and LinCE papers:
```
@misc{perez2021pysentimiento,
title={pysentimiento: A Python Toolkit for Sentiment Analysis and SocialNLP tasks},
author={Juan Manuel Pérez and Juan Carlos Giudici and Franco Luque},
year={2021},
eprint={2106.09462},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
@inproceedings{perez2022robertuito,
title={RoBERTuito: a pre-trained language model for social media text in Spanish},
author={P{\'e}rez, Juan Manuel and Furman, Dami{\'a}n Ariel and Alemany, Laura Alonso and Luque, Franco M},
booktitle={Proceedings of the Thirteenth Language Resources and Evaluation Conference},
pages={7235--7243},
year={2022}
}
@inproceedings{aguilar2020lince,
title={LinCE: A Centralized Benchmark for Linguistic Code-switching Evaluation},
author={Aguilar, Gustavo and Kar, Sudipta and Solorio, Thamar},
booktitle={Proceedings of the 12th Language Resources and Evaluation Conference},
pages={1803--1813},
year={2020}
}
```