emanuelaboros committed
Commit 19638c4 (1 parent: 964213b)
Update README.md
README.md CHANGED
@@ -142,35 +142,7 @@ This model was finetuned on the [HIPE-2022 dataset](https://github.com/hipe-eval
 
 ## Usage
 
-Here is an example of generation for Wikipedia page disambiguation:
-
-```python
-from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
-
-tokenizer = AutoTokenizer.from_pretrained("impresso-project/nel-hipe-multilingual")
-model = AutoModelForSeq2SeqLM.from_pretrained("impresso-project/nel-hipe-multilingual").eval()
-
-sentences = ["[START] United Press [END] - On the home front, the British populace remains steadfast in the face of ongoing air raids.",
-             "In [START] London [END], trotz der Zerstörung, ist der Geist der Menschen ungebrochen, mit Freiwilligen und zivilen Verteidigungseinheiten, die unermüdlich arbeiten, um die Kriegsanstrengungen zu unterstützen.",
-             "Les rapports des correspondants de la [START] AFP [END] mettent en lumière la poussée nationale pour augmenter la production dans les usines, essentielle pour fournir au front les matériaux nécessaires à la victoire."]
-
-for sentence in sentences:
-    outputs = model.generate(
-        **tokenizer([sentence], return_tensors="pt"),
-        num_beams=5,
-        num_return_sequences=5
-    )
-
-    print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
-```
-which outputs the following top-5 predictions (using constrained beam search)
-```
-['United Press International >> en ', 'The United Press International >> en ', 'United Press International >> de ', 'United Press >> en ', 'Associated Press >> en ']
-['London >> de ', 'London >> de ', 'London >> de ', 'Stadt London >> de ', 'Londonderry >> de ']
-['Agence France-Presse >> fr ', 'Agence France-Presse >> fr ', 'Agence France-Presse de la Presse écrite >> fr ', 'Agence France-Presse de la porte de Vincennes >> fr ', 'Agence France-Presse de la porte océanique >> fr ']
-```
-
-Example with simulated OCR noise:
+Here is an example of generation for Wikipedia page disambiguation with simulated OCR noise:
 ```python
 from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
 
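The removed example above decodes predictions of the form `Page title >> language code` (e.g. `United Press International >> en`). A minimal post-processing sketch, assuming only that format and using a hypothetical helper name `parse_prediction` (not part of the model's API), could split each decoded string into a (title, language) pair:

```python
# Hypothetical post-processing sketch: splits "<title> >> <lang>" strings,
# as shown in the example output above, into (title, language) tuples.
# `parse_prediction` is an illustrative name, not something the README defines.

def parse_prediction(prediction: str) -> tuple[str, str]:
    """Split a prediction like 'United Press International >> en ' into (title, lang)."""
    title, _, lang = prediction.partition(">>")
    return title.strip(), lang.strip()


decoded = ['United Press International >> en ', 'London >> de ', 'Agence France-Presse >> fr ']
for candidate in decoded:
    print(parse_prediction(candidate))
# ('United Press International', 'en')
# ('London', 'de')
# ('Agence France-Presse', 'fr')
```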