Commit a76cf3f, committed by emanuelaboros
1 Parent(s): debc4ef

Update README.md

README.md CHANGED
@@ -123,33 +123,22 @@ tags:
# mGENRE


-The mGENRE (multilingual Generative ENtity REtrieval) system as presented in [Multilingual Autoregressive Entity Linking](https://arxiv.org/abs/2103.12528)

-

-This model was trained on 105 languages from Wikipedia.

## BibTeX entry and citation info

-**Please consider citing our works if you use code from this repository.**
-
-```bibtex
-@article{decao2020multilingual,
-    author = {De Cao, Nicola and Wu, Ledell and Popat, Kashyap and Artetxe, Mikel
-              and Goyal, Naman and Plekhanov, Mikhail and Zettlemoyer, Luke
-              and Cancedda, Nicola and Riedel, Sebastian and Petroni, Fabio},
-    title = "{Multilingual Autoregressive Entity Linking}",
-    journal = {Transactions of the Association for Computational Linguistics},
-    volume = {10},
-    pages = {274-290},
-    year = {2022},
-    month = {03},
-    issn = {2307-387X},
-    doi = {10.1162/tacl_a_00460},
-    url = {https://doi.org/10.1162/tacl\_a\_00460},
-    eprint = {https://direct.mit.edu/tacl/article-pdf/doi/10.1162/tacl\_a\_00460/2004070/tacl\_a\_00460.pdf},
-}
-```

## Usage

@@ -161,23 +150,24 @@ from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
tokenizer = AutoTokenizer.from_pretrained("impresso-project/nel-historic-multilingual")
model = AutoModelForSeq2SeqLM.from_pretrained("impresso-project/nel-historic-multilingual").eval()

-sentences = ["[START] United Press [END] - On the home front, the British populace remains steadfast in the face of ongoing air raids.
-
-
-
-
-
-)
-
-
```
which outputs the following top-5 predictions (using constrained beam search)
```
-['
-'
-'
-'Alberto Einstein >> it',
-'Einstein >> it']
```

---

@@ -123,33 +123,22 @@ tags:
# mGENRE


+The historical multilingual named entity linking (NEL) model is based on the mGENRE (multilingual Generative ENtity REtrieval) system as presented in [Multilingual Autoregressive Entity Linking](https://arxiv.org/abs/2103.12528). mGENRE uses a sequence-to-sequence approach to entity retrieval (e.g., linking), based on a fine-tuned [mBART](https://arxiv.org/abs/2001.08210) architecture.
+GENRE performs retrieval by generating the unique entity name conditioned on the input text, using constrained beam search to generate only valid identifiers.
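
The constrained decoding described in the added paragraph can be pictured with a prefix trie over the token sequences of valid entity names. The following is a minimal sketch under stated assumptions, not the model's actual implementation: `Trie`, `make_prefix_allowed_tokens_fn`, and the integer token IDs are hypothetical names and values chosen for illustration. The callback follows the `(batch_id, input_ids)` shape of the `prefix_allowed_tokens_fn` argument that `generate` accepts in transformers; the usage snippet in this diff does not pass that argument.

```python
from typing import Dict, Iterable, List


class Trie:
    """Prefix tree over token-ID sequences (one sequence per valid entity name)."""

    def __init__(self, sequences: Iterable[List[int]] = ()) -> None:
        self.root: Dict[int, dict] = {}
        for seq in sequences:
            self.add(seq)

    def add(self, sequence: List[int]) -> None:
        node = self.root
        for token in sequence:
            node = node.setdefault(token, {})

    def allowed_next(self, prefix: List[int]) -> List[int]:
        """Return the token IDs that may follow `prefix`, or [] if the prefix is unknown."""
        node = self.root
        for token in prefix:
            if token not in node:
                return []
            node = node[token]
        return sorted(node)


def make_prefix_allowed_tokens_fn(trie: Trie):
    """Build a callback taking (batch_id, input_ids) and returning allowed next tokens."""

    def allowed(batch_id: int, input_ids) -> List[int]:
        # In transformers `input_ids` is a 1-D tensor; plain lists also work here.
        prefix = input_ids.tolist() if hasattr(input_ids, "tolist") else list(input_ids)
        return trie.allowed_next(prefix)

    return allowed


# Toy data: the integers are made-up token IDs, not real mBART vocabulary IDs.
trie = Trie([[2, 10, 11, 3], [2, 10, 12, 3], [2, 20, 3]])
allowed = make_prefix_allowed_tokens_fn(trie)
print(allowed(0, [2]))      # [10, 20]
print(allowed(0, [2, 10]))  # [11, 12]
```

In a real setup the sequences would come from tokenizing the candidate entity names, and the callback would need a policy for what to return once a name is complete (the sketch returns an empty list in that case).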

+This model was fine-tuned on the following datasets.
+
+| Dataset alias | README | Document type | Languages | Suitable for | Project | License |
+|---------------|--------|---------------|-----------|--------------|---------|---------|
+| ajmc | [link](documentation/README-ajmc.md) | classical commentaries | de, fr, en | NERC-Coarse, NERC-Fine, EL | [AjMC](https://mromanello.github.io/ajax-multi-commentary/) | [![License: CC BY 4.0](https://img.shields.io/badge/License-CC_BY_4.0-lightgrey.svg)](https://creativecommons.org/licenses/by/4.0/) |
+| hipe2020 | [link](documentation/README-hipe2020.md) | historical newspapers | de, fr, en | NERC-Coarse, NERC-Fine, EL | [CLEF-HIPE-2020](https://impresso.github.io/CLEF-HIPE-2020) | [![License: CC BY-NC-SA 4.0](https://img.shields.io/badge/License-CC_BY--NC--SA_4.0-lightgrey.svg)](https://creativecommons.org/licenses/by-nc-sa/4.0/) |
+| topres19th | [link](documentation/README-topres19th.md) | historical newspapers | en | NERC-Coarse, EL | [Living with Machines](https://livingwithmachines.ac.uk/) | [![License: CC BY-NC-SA 4.0](https://img.shields.io/badge/License-CC_BY--NC--SA_4.0-lightgrey.svg)](https://creativecommons.org/licenses/by-nc-sa/4.0/) |
+| newseye | [link](documentation/README-newseye.md) | historical newspapers | de, fi, fr, sv | NERC-Coarse, NERC-Fine, EL | [NewsEye](https://www.newseye.eu/) | [![License: CC BY 4.0](https://img.shields.io/badge/License-CC_BY_4.0-lightgrey.svg)](https://creativecommons.org/licenses/by/4.0/) |
+| sonar | [link](documentation/README-sonar.md) | historical newspapers | de | NERC-Coarse, EL | [SoNAR](https://sonar.fh-potsdam.de/) | [![License: CC BY 4.0](https://img.shields.io/badge/License-CC_BY_4.0-lightgrey.svg)](https://creativecommons.org/licenses/by/4.0/) |


## BibTeX entry and citation info


## Usage

@@ -161,23 +150,24 @@ from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
tokenizer = AutoTokenizer.from_pretrained("impresso-project/nel-historic-multilingual")
model = AutoModelForSeq2SeqLM.from_pretrained("impresso-project/nel-historic-multilingual").eval()

+sentences = ["[START] United Press [END] - On the home front, the British populace remains steadfast in the face of ongoing air raids.",
+             "In [START] London [END], trotz der Zerstörung, ist der Geist der Menschen ungebrochen, mit Freiwilligen und zivilen Verteidigungseinheiten, die unermüdlich arbeiten, um die Kriegsanstrengungen zu unterstützen.",
+             "Les rapports des correspondants de la [START] AFP [END] mettent en lumière la poussée nationale pour augmenter la production dans les usines, essentielle pour fournir au front les matériaux nécessaires à la victoire."]
+
+for sentence in sentences:
+    outputs = model.generate(
+        **tokenizer([sentence], return_tensors="pt"),
+        num_beams=5,
+        num_return_sequences=5
+    )
+
+    print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
```
which outputs the following top-5 predictions (using constrained beam search)
```
+['United Press International >> en ', 'The United Press International >> en ', 'United Press International >> de ', 'United Press >> en ', 'Associated Press >> en ']
+['London >> de ', 'London >> de ', 'London >> de ', 'Stadt London >> de ', 'Londonderry >> de ']
+['Agence France-Presse >> fr ', 'Agence France-Presse >> fr ', 'Agence France-Presse de la Presse écrite >> fr ', 'Agence France-Presse de la porte de Vincennes >> fr ', 'Agence France-Presse de la porte océanique >> fr ']
```
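
Each decoded prediction above has the form `Title >> lang`, and the same candidate can appear in several beams. A small post-processing step could split the strings and drop repeats. This is a sketch only: `parse_prediction` and `unique_candidates` are hypothetical helper names, not part of the model or of transformers, and the `>>` separator is assumed from the outputs shown in this diff.

```python
from typing import List, Optional, Tuple


def parse_prediction(prediction: str) -> Optional[Tuple[str, str]]:
    """Split 'Title >> lang' into (title, lang); return None if '>>' is missing."""
    title, sep, lang = prediction.rpartition(">>")
    if not sep:
        return None
    return title.strip(), lang.strip()


def unique_candidates(predictions: List[str]) -> List[Tuple[str, str]]:
    """Parse a list of beams and drop repeated (title, lang) pairs, keeping beam order."""
    seen = set()
    result = []
    for prediction in predictions:
        parsed = parse_prediction(prediction)
        if parsed is not None and parsed not in seen:
            seen.add(parsed)
            result.append(parsed)
    return result


# Beams copied from the second output line above.
beams = ['London >> de ', 'London >> de ', 'London >> de ', 'Stadt London >> de ', 'Londonderry >> de ']
print(unique_candidates(beams))
# [('London', 'de'), ('Stadt London', 'de'), ('Londonderry', 'de')]
```

Splitting on the last `>>` keeps titles intact even if they were to contain the separator themselves.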

---