---
language:
- es
library_name: pysentimiento
tags:
- twitter
- named-entity-recognition
- ner
datasets:
- lince
---

# Named Entity Recognition model for Spanish/English
## robertuito-ner

Repository: [https://github.com/pysentimiento/pysentimiento/](https://github.com/pysentimiento/pysentimiento/)


Model trained on the Spanish/English split of the [LinCE NER corpus](https://ritual.uh.edu/lince/), a code-switched benchmark. The base model is [RoBERTuito](https://github.com/pysentimiento/robertuito), a RoBERTa model trained on Spanish tweets.


## Usage

We suggest using this model directly through the `pysentimiento` library, because it does not work properly with the pipeline due to tokenization issues.

```python
from pysentimiento import create_analyzer

ner_analyzer = create_analyzer("ner", lang="es")

ner_analyzer.predict(
  "rindanse ante el mejor, leonel andres messi cuccitini. serresiete no existis, segui en al-nassr"
)
 

# [{'type': 'PER',
#   'text': 'leonel andres messi cuccitini',
#   'start': 24,
#   'end': 53},
#  {'type': 'PER', 'text': 'serresiete', 'start': 55, 'end': 65},
#  {'type': 'LOC', 'text': 'al-nassr', 'start': 108, 'end': 116}]
```
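
The commented output above is a list of dictionaries with `type`, `text`, `start`, and `end` keys. Assuming that shape, a minimal sketch of grouping the entities by type could look like the following. The `group_by_type` helper is hypothetical and not part of `pysentimiento`, and the list is copied from the output above rather than produced by running the model.

```python
from collections import defaultdict

# Entities copied verbatim from the example output above
# (assumption: predict returns this list-of-dicts shape).
entities = [
    {"type": "PER", "text": "leonel andres messi cuccitini", "start": 24, "end": 53},
    {"type": "PER", "text": "serresiete", "start": 55, "end": 65},
    {"type": "LOC", "text": "al-nassr", "start": 108, "end": 116},
]


def group_by_type(entities):
    """Hypothetical helper: map each entity type to the list of its texts."""
    grouped = defaultdict(list)
    for entity in entities:
        grouped[entity["type"]].append(entity["text"])
    return dict(grouped)


grouped = group_by_type(entities)
print(grouped)
```

Grouping on `type` and `text` only keeps the sketch independent of how the character offsets are computed.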

## Results

Results are taken from the LinCE leaderboard.

| Model                  | Sentiment       | NER                | POS     |
|:-----------------------|:----------------|:-------------------|:--------|
| RoBERTuito             | **60.6**        | 68.5               | 97.2    |
| XLM Large              | --              | **69.5**           | **97.2**   |
| XLM Base               | --              | 64.9               | 97.0    |
| C2S mBERT              | 59.1            | 64.6               | 96.9    |
| mBERT                  | 56.4            | 64.0               | 97.1    |
| BERT                   | 58.4            | 61.1               | 96.9    |
| BETO                   | 56.5            | --                 | --      |



## Citation

If you use this model in your research, please cite the pysentimiento, RoBERTuito, and LinCE papers:

```
@misc{perez2021pysentimiento,
      title={pysentimiento: A Python Toolkit for Sentiment Analysis and SocialNLP tasks},
      author={Juan Manuel Pérez and Juan Carlos Giudici and Franco Luque},
      year={2021},
      eprint={2106.09462},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
@inproceedings{perez2022robertuito,
  title={RoBERTuito: a pre-trained language model for social media text in Spanish},
  author={P{\'e}rez, Juan Manuel and Furman, Dami{\'a}n Ariel and Alemany, Laura Alonso and Luque, Franco M},
  booktitle={Proceedings of the Thirteenth Language Resources and Evaluation Conference},
  pages={7235--7243},
  year={2022}
}

@inproceedings{aguilar2020lince,
  title={LinCE: A Centralized Benchmark for Linguistic Code-switching Evaluation},
  author={Aguilar, Gustavo and Kar, Sudipta and Solorio, Thamar},
  booktitle={Proceedings of the 12th Language Resources and Evaluation Conference},
  pages={1803--1813},
  year={2020}
}
```