---
language:
- en
- de
- fr
- zh
- ne
- multilingual
widget:
- text: "In December 1903 in France the Royal Swedish Academy of Sciences awarded Pierre Curie, Marie Curie, and Henri Becquerel the Nobel Prize in Physics."
- text: "Für Richard Phillips Feynman war es immer wichtig in New York, die unanschaulichen Gesetzmäßigkeiten der Quantenphysik Laien und Studenten nahezubringen und verständlich zu machen."
- text: "My name is Julian and I live in Constance"
- text: "Terence David John Pratchett est né le 28 avril 1948 à Beaconsfield dans le Buckinghamshire, en Angleterre."
- text: "北京市,通称北京(汉语拼音:Běijīng;邮政式拼音:Peking),简称“京”,是中华人民共和国的首都及直辖市,是该国的政治、文化、科技、教育、军事和国际交往中心,是一座全球城市,是世界人口第三多的城市和人口最多的首都,具有重要的国际影响力,同時也是目前世界唯一的“双奥之城”,即唯一既主办过夏季"
- text: "काठमाडौँ नेपालको सङ्घीय राजधानी र नेपालको सबैभन्दा बढी जनसङ्ख्या भएको सहर हो।"
tags:
- roberta
license: mit
datasets:
- wikiann
---

# RoBERTa for Multilingual Named Entity Recognition

## Model description

#### Limitations and bias

This model is limited by its training dataset of entity-annotated news articles from a specific span of time. It may not generalize well to all use cases in other domains.

## Training data

## Metrics

## Usage

```python
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

# Load the tokenizer and the fine-tuned token classification model
tokenizer = AutoTokenizer.from_pretrained("julian-schelb/roberta-ner-multilingual", add_prefix_space=True)
model = AutoModelForTokenClassification.from_pretrained("julian-schelb/roberta-ner-multilingual")

text = "In December 1903 in France the Royal Swedish Academy of Sciences awarded Pierre Curie, Marie Curie, and Henri Becquerel the Nobel Prize in Physics."

inputs = tokenizer(
    text,
    add_special_tokens=False,
    return_tensors="pt"
)

# Run inference without tracking gradients
with torch.no_grad():
    logits = model(**inputs).logits

# Pick the highest-scoring class for every token
predicted_token_class_ids = logits.argmax(-1)

# Note that tokens are classified rather than input words, which means that
# there might be more predicted token classes than words.
# Multiple token classes might account for the same word.
predicted_tokens_classes = [model.config.id2label[t.item()] for t in predicted_token_class_ids[0]]
predicted_tokens_classes
```
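
Because the usage example above yields one label per token rather than per word, it is often convenient to merge consecutive tokens into entity spans. The following is a minimal sketch of that step, not part of the model itself. It assumes that the labels follow a BIO scheme such as `B-PER` / `I-PER` / `O` (as is common for WikiANN-style data) and that the tokenizer marks word starts with the SentencePiece character `▁`. The helper name `group_entities` and the sample tokens and labels are hypothetical.

```python
from typing import List, Tuple


def group_entities(tokens: List[str], labels: List[str]) -> List[Tuple[str, str]]:
    """Merge per-token BIO labels into (entity_text, entity_type) spans.

    Assumes BIO-style labels and SentencePiece-style tokens in which
    U+2581 marks the start of a new word (both are assumptions).
    """
    entities: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_type = None

    def flush() -> None:
        # Close the entity collected so far, if there is one.
        nonlocal current_tokens, current_type
        if current_tokens:
            text = "".join(current_tokens).replace("\u2581", " ").strip()
            entities.append((text, current_type))
        current_tokens, current_type = [], None

    for token, label in zip(tokens, labels):
        if label == "O":
            flush()
            continue
        prefix, _, entity_type = label.partition("-")
        # A "B-" tag or a change of entity type starts a new span.
        if prefix == "B" or entity_type != current_type:
            flush()
            current_type = entity_type
        current_tokens.append(token)

    flush()
    return entities


# Hypothetical tokens and labels, shaped like the output of the usage example.
tokens = ["▁Marie", "▁Cur", "ie", "▁lived", "▁in", "▁France"]
labels = ["B-PER", "I-PER", "I-PER", "O", "O", "B-LOC"]
print(group_entities(tokens, labels))
```

With real model output, the token strings could be obtained via `tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])` and passed in together with `predicted_tokens_classes`. Alternatively, the `transformers` `pipeline("token-classification", ...)` helper with `aggregation_strategy="simple"` performs a comparable grouping.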