language:
  - de
  - en
  - multilingual
widget:
  - text: >-
      In December 1903 in France the Royal Swedish Academy of Sciences awarded
      Pierre Curie, Marie Curie, and Henri Becquerel the Nobel Prize in Physics.
  - text: >-
      Für Richard Phillips Feynman war es immer wichtig in New York, die
      unanschaulichen Gesetzmäßigkeiten der Quantenphysik Laien und Studenten
      nahezubringen und verständlich zu machen.
  - text: My name is Julian and I live in montreal
  - text: My name is clara and I live in berkeley, california.
  - text: My name is wolfgang and I live in berlin
tags:
  - roberta
license: mit
datasets:
  - wikiann

Roberta for Multilingual Named Entity Recognition

Model description

Limitations and bias

This model is limited by its training data. The card's metadata lists WikiANN, a dataset of entity-annotated Wikipedia articles, so predictions may not generalize well to other domains or text types.

Training data

Usage


import torch
from transformers import AutoTokenizer, RobertaForTokenClassification

# Assumption: the tokenizer files are published in the same repository as the model.
model_id = "julian-schelb/roberta-ner-multilingual"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = RobertaForTokenClassification.from_pretrained(model_id)

text = "Für Richard Phillips Feynman war es immer wichtig in New York, die unanschaulichen Gesetzmäßigkeiten der Quantenphysik Laien und Studenten nahezubringen und verständlich zu machen."

inputs = tokenizer(
    text,
    add_special_tokens=False,
    return_tensors="pt",
)

# Inference only, so gradient tracking is disabled.
with torch.no_grad():
    logits = model(**inputs).logits

# Highest-scoring class id for every token.
predicted_token_class_ids = logits.argmax(-1)

# Note that tokens are classified rather than input words, which means that
# there might be more predicted token classes than words.
# Multiple token classes might account for the same word.
predicted_tokens_classes = [model.config.id2label[t.item()] for t in predicted_token_class_ids[0]]
print(predicted_tokens_classes)
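
Because the model labels sub-word tokens rather than words, a common follow-up step is to reduce the token labels to one label per word. The sketch below shows one possible way to do that, by keeping the label of each word's first sub-token. The helper name `labels_per_word` is hypothetical, and the example labels (`B-PER`, `I-LOC`, ...) are an assumed IOB2 scheme that the card does not state. The `word_ids` list is assumed to have the shape returned by `BatchEncoding.word_ids()` on a fast tokenizer: one word index per token, or `None` for special tokens.

```python
from typing import List, Optional


def labels_per_word(
    word_ids: List[Optional[int]], token_labels: List[str]
) -> List[str]:
    """Return one label per word, taken from the word's first sub-token."""
    if len(word_ids) != len(token_labels):
        raise ValueError("word_ids and token_labels must have the same length")

    word_labels: List[str] = []
    previous_word_id: Optional[int] = None
    for word_id, label in zip(word_ids, token_labels):
        if word_id is None:
            # Special tokens do not belong to any word.
            continue
        if word_id == previous_word_id:
            # Continuation piece of a word that already has a label.
            continue
        word_labels.append(label)
        previous_word_id = word_id
    return word_labels


# Hypothetical example: four words split into seven sub-tokens.
example_word_ids = [0, 1, 1, 2, 3, 3, 3]
example_token_labels = ["O", "B-PER", "I-PER", "O", "B-LOC", "I-LOC", "I-LOC"]
print(labels_per_word(example_word_ids, example_token_labels))
# ['O', 'B-PER', 'O', 'B-LOC']
```

With the snippet above, and assuming the loaded tokenizer is a fast tokenizer, the call would look like `labels_per_word(inputs.word_ids(), predicted_tokens_classes)`. Taking the first sub-token is only one convention; majority voting over a word's sub-tokens is another option.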