---
language:
- de
- en
- multilingual
widget:
- text: "In December 1903 in France the Royal Swedish Academy of Sciences awarded Pierre Curie, Marie Curie, and Henri Becquerel the Nobel Prize in Physics."
- text: "Für Richard Phillips Feynman war es immer wichtig in New York, die unanschaulichen Gesetzmäßigkeiten der Quantenphysik Laien und Studenten nahezubringen und verständlich zu machen."
- text: "My name is Julian and I live in montreal"
- text: "My name is clara and I live in berkeley, california."
- text: "My name is wolfgang and I live in berlin"
tags:
- roberta
license: mit
datasets:
- wikiann
---
# Roberta for Multilingual Named Entity Recognition
## Model description
### Limitations and bias
This model is limited by its training dataset of entity-annotated news articles from a specific span of time, so it may not generalize well to all use cases, particularly in other domains.
## Training data
## Usage
```python
import torch
from transformers import AutoTokenizer, RobertaForTokenClassification

# Load the fine-tuned checkpoint (assumes the tokenizer was saved to the same directory).
checkpoint = "./results/checkpoint-final/"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model_tuned = RobertaForTokenClassification.from_pretrained(checkpoint)
model_tuned.eval()

text = "Für Richard Phillips Feynman war es immer wichtig in New York, die unanschaulichen Gesetzmäßigkeiten der Quantenphysik Laien und Studenten nahezubringen und verständlich zu machen."
inputs = tokenizer(text, add_special_tokens=False, return_tensors="pt")

with torch.no_grad():
    logits = model_tuned(**inputs).logits

# Pick the highest-scoring class for every token.
predicted_token_class_ids = logits.argmax(-1)

# Note that tokens are classified rather than input words, which means that
# there might be more predicted token classes than words:
# multiple token classes might account for the same word.
predicted_tokens_classes = [model_tuned.config.id2label[t.item()] for t in predicted_token_class_ids[0]]
print(predicted_tokens_classes)
```
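Because the predictions are per token rather than per word, they usually need to be mapped back to words before use. The sketch below shows one common convention, under stated assumptions: keep the label of each word's first sub-token, then merge IOB2 tags (`B-X`, `I-X`, `O`) into entity spans. It assumes the checkpoint's `id2label` uses IOB2 labels such as `B-PER` or `I-LOC`, as WikiANN does (PER, ORG, LOC). The helper names `word_level_labels` and `group_entities` are hypothetical, and the token labels in the example are hand-written illustrations, not actual model output. With a fast tokenizer, the `word_ids` list can be obtained from `inputs.word_ids()`; the word list must match the tokenizer's own word segmentation.

```python
from typing import List, Optional, Tuple


def word_level_labels(word_ids: List[Optional[int]], token_labels: List[str]) -> List[str]:
    """Keep the label of the first sub-token of each word.

    Tokens whose word id is None (special tokens) are skipped.
    """
    labels: List[str] = []
    previous: Optional[int] = None
    for word_id, label in zip(word_ids, token_labels):
        if word_id is not None and word_id != previous:
            labels.append(label)
        previous = word_id
    return labels


def group_entities(words: List[str], labels: List[str]) -> List[Tuple[str, str]]:
    """Merge IOB2 word labels (B-X, I-X, O) into (entity_text, entity_type) spans."""
    entities: List[Tuple[str, str]] = []
    current_words: List[str] = []
    current_type: Optional[str] = None
    for word, label in zip(words, labels):
        if label.startswith("B-") or (label.startswith("I-") and label[2:] != current_type):
            # A new entity starts; a stray I- tag with a different type is treated like B-.
            if current_type is not None:
                entities.append((" ".join(current_words), current_type))
            current_words, current_type = [word], label[2:]
        elif label.startswith("I-"):
            # Continuation of the current entity.
            current_words.append(word)
        else:
            # "O" closes any open entity.
            if current_type is not None:
                entities.append((" ".join(current_words), current_type))
            current_words, current_type = [], None
    if current_type is not None:
        entities.append((" ".join(current_words), current_type))
    return entities


# Hypothetical example: "Feynman" is split into two sub-tokens (word id 1 twice),
# and None marks special tokens at the start and end.
words = ["Richard", "Feynman", "lived", "in", "New", "York"]
word_ids = [None, 0, 1, 1, 2, 3, 4, 5, None]
token_labels = ["O", "B-PER", "I-PER", "I-PER", "O", "O", "B-LOC", "I-LOC", "O"]

example_labels = word_level_labels(word_ids, token_labels)
example_entities = group_entities(words, example_labels)
print(example_labels)    # ['B-PER', 'I-PER', 'O', 'O', 'B-LOC', 'I-LOC']
print(example_entities)  # [('Richard Feynman', 'PER'), ('New York', 'LOC')]
```

Taking the first sub-token's label is only one convention; averaging the logits over all sub-tokens of a word before taking the argmax is a common alternative.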