---
language:
- de
- en
- multilingual
widget:
- text: "In December 1903 in France the Royal Swedish Academy of Sciences awarded Pierre Curie, Marie Curie, and Henri Becquerel the Nobel Prize in Physics."
- text: "Für Richard Phillips Feynman war es immer wichtig in New York, die unanschaulichen Gesetzmäßigkeiten der Quantenphysik Laien und Studenten nahezubringen und verständlich zu machen."
- text: "My name is Julian and I live in montreal"
- text: "My name is clara and I live in berkeley, california."
- text: "My name is wolfgang and I live in berlin"
tags:
- roberta
license: mit
datasets:
- wikiann
---

# Roberta for Multilingual Named Entity Recognition

## Model description

A RoBERTa-based token classification model fine-tuned for named entity recognition on the multilingual WikiANN dataset; the widget examples above cover English and German.

#### Limitations and bias

This model is limited by its training dataset of entity-annotated Wikipedia sentences from a specific span of time, so it may not generalize well to other domains or use cases.

## Training data

The model was fine-tuned on [WikiANN](https://huggingface.co/datasets/wikiann), a multilingual named entity recognition dataset built from Wikipedia articles and annotated with `LOC`, `PER`, and `ORG` tags.

## Usage

```python
import torch
from transformers import AutoTokenizer, RobertaForTokenClassification

# Load the fine-tuned checkpoint and the tokenizer saved alongside it.
model_tuned = RobertaForTokenClassification.from_pretrained("./results/checkpoint-final/")
tokenizer = AutoTokenizer.from_pretrained("./results/checkpoint-final/")

text = (
    "Für Richard Phillips Feynman war es immer wichtig in New York, "
    "die unanschaulichen Gesetzmäßigkeiten der Quantenphysik Laien und "
    "Studenten nahezubringen und verständlich zu machen."
)

inputs = tokenizer(text, add_special_tokens=False, return_tensors="pt")

with torch.no_grad():
    logits = model_tuned(**inputs).logits

predicted_token_class_ids = logits.argmax(-1)

# Note that tokens are classified rather than input words, which means
# there might be more predicted token classes than words: several
# sub-tokens can belong to the same word.
predicted_tokens_classes = [model_tuned.config.id2label[t.item()] for t in predicted_token_class_ids[0]]
predicted_tokens_classes
```
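The output above contains one label per sub-token, not per word. Continuing from the snippet, here is a minimal sketch of collapsing the predictions to word level by keeping the label of each word's first sub-token; it assumes the checkpoint ships a fast tokenizer, since `word_ids()` is only available there:

```python
# Map each token to the index of the word it came from
# (None marks special tokens, of which there are none here
# because add_special_tokens=False was used above).
word_ids = inputs.word_ids(batch_index=0)

# Keep the prediction of the first sub-token of every word,
# a common heuristic for word-level NER labels.
word_labels = []
previous_word_id = None
for token_index, word_id in enumerate(word_ids):
    if word_id is not None and word_id != previous_word_id:
        word_labels.append(predicted_tokens_classes[token_index])
    previous_word_id = word_id

word_labels
```

Alternatively, the `transformers` token-classification pipeline (`pipeline("token-classification", model=model_tuned, tokenizer=tokenizer, aggregation_strategy="simple")`) performs this grouping and returns ready-made entity spans.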