Model Card: norbert3-large-ner (Fine-Tuned with WikiANN & norne)
Overview
- Model Name: Kushtrim/norbert3-large-ner
- Model Type: Named Entity Recognition (NER)
- Language: Multilingual with focus on Norwegian (Norsk)
- Fine-Tuned with: WikiANN & norne datasets
Description
Kushtrim/norbert3-large-ner is a BERT (Bidirectional Encoder Representations from Transformers) model, obtained by fine-tuning the pre-trained ltg/norbert3-large[^1] for Named Entity Recognition (NER) in Norwegian (Norsk). Fine-tuning used the WikiANN and norne datasets, which provide annotated named entities; WikiANN covers many languages, including Norwegian.
Named Entity Recognition is the task of identifying and classifying named entities in text, such as persons, organizations, locations, dates, and more. This model performs that task on Norwegian text.
Intended Use
The Kushtrim/norbert3-large-ner model is designed for Named Entity Recognition (NER) in Norwegian text. It identifies and classifies named entities, distinguishing the first token of each entity (beginning) from the tokens that continue it (inside). The categories include:
- Persons (PER): Names of individuals.
- Organizations (ORG): Names of organizations.
- Locations (LOC): Names of locations.
- Miscellaneous (MISC): Named entities that fit none of the other categories.
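The beginning/inside distinction mentioned above corresponds to the common BIO tagging scheme. The following is a minimal sketch of how such tags can be grouped back into entity spans, assuming tags of the form `B-PER`, `I-PER` and `O`. The function name `bio_to_spans` and the sample sentence are hypothetical and are not taken from the model repository.

```python
from typing import List, Tuple


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (entity_text, entity_type) pairs."""
    spans: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_type = None

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A B- tag always opens a new entity; close any open one first.
            if current_tokens:
                spans.append((" ".join(current_tokens), current_type))
            current_tokens = [token]
            current_type = tag[2:]
        elif tag.startswith("I-") and current_type == tag[2:]:
            # An I- tag of the same type continues the open entity.
            current_tokens.append(token)
        else:
            # "O" (or an I- tag that does not match) closes the open entity.
            if current_tokens:
                spans.append((" ".join(current_tokens), current_type))
            current_tokens = []
            current_type = None

    # Flush an entity that runs to the end of the sentence.
    if current_tokens:
        spans.append((" ".join(current_tokens), current_type))
    return spans


# Hypothetical example sentence with hand-written tags.
tokens = ["Erna", "Solberg", "besøkte", "Universitetet", "i", "Oslo", "."]
tags = ["B-PER", "I-PER", "O", "B-ORG", "I-ORG", "I-ORG", "O"]
spans = bio_to_spans(tokens, tags)
print(spans)
```

In practice the `aggregation_strategy` argument of the `transformers` pipeline performs this grouping, so a helper like this is mainly useful for understanding or post-processing raw token-level output.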
Labels
Label | Description |
---|---|
Person (PER) | Real or fictional characters and animals |
Organization (ORG) | Any collection of people, such as firms, institutions, organizations, music groups, sports teams, unions, political parties etc. |
Location (LOC) | Geographical places, buildings and facilities |
Geo-political entity (GPE) | Geographical regions defined by political and/or social groups. A GPE entity subsumes and does not distinguish between a nation, its region, its government, or its people. |
Product (PROD) | Artificially produced entities are regarded as products. This may include more abstract entities, such as speeches, radio shows, programming languages, contracts, laws and ideas. |
Event (EVT) | Festivals, cultural events, sports events, weather phenomena, wars, etc. Events are bounded in time and space. |
Derived (DRV) | Words (and possibly phrases) that are derived from a name but are not names themselves. They typically contain a full name and are capitalized, but are not proper nouns. Fictitious examples are "Brann-treneren" ("the Brann coach") or "Oslo-mannen" ("the man from Oslo"). |
Miscellaneous (MISC) | Names that do not belong in the other categories. Examples are animal species and names of medical conditions. Entities that are manufactured or produced are of type Product, whereas things that occur naturally or spontaneously are of type Miscellaneous. |
Source of label information: norne
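As an illustration of how the entity types in the table could map to token-level tags, here is a small sketch that expands them into a BIO tag inventory with `id2label` and `label2id` dictionaries. This is an assumption for illustration only: the tag names, their order, and the helper `build_bio_labels` are hypothetical, and the model's actual mapping may differ.

```python
# Entity types copied from the label table above.
ENTITY_TYPES = ["PER", "ORG", "LOC", "GPE", "PROD", "EVT", "DRV", "MISC"]


def build_bio_labels(entity_types):
    """Return ["O", "B-X", "I-X", ...] for the given entity types."""
    labels = ["O"]
    for entity_type in entity_types:
        labels.append(f"B-{entity_type}")
        labels.append(f"I-{entity_type}")
    return labels


labels = build_bio_labels(ENTITY_TYPES)

# Hypothetical id <-> label mappings; the order here is an assumption.
id2label = dict(enumerate(labels))
label2id = {label: idx for idx, label in id2label.items()}

print(len(labels))
```

The mapping the model actually uses can be inspected after loading it, via `model.config.id2label`.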
Usage
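from transformers import pipeline, AutoModelForTokenClassification, AutoTokenizer
import pandas as pd

# trust_remote_code=True lets transformers run the custom model code shipped with the repository.
tokenizer = AutoTokenizer.from_pretrained("Kushtrim/norbert3-large-ner", trust_remote_code=True)
model = AutoModelForTokenClassification.from_pretrained("Kushtrim/norbert3-large-ner", trust_remote_code=True)

# aggregation_strategy='first' merges sub-word tokens into words; each word takes the label of its first token.
ner = pipeline("ner", model=model, tokenizer=tokenizer, aggregation_strategy='first')

text = "Sett inn tekst her"  # Norwegian for "Insert text here"
results = ner(text)

# One row per detected entity.
pd.DataFrame.from_records(results)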
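The aggregated pipeline output is a list of dictionaries with keys such as `entity_group`, `score`, `word`, `start` and `end`. The sketch below shows one possible way to filter that list by confidence and group the entity strings by type. The helper `group_entities`, the 0.5 threshold, and the sample records are all hypothetical; the records are hand-written stand-ins, not real model output.

```python
from collections import defaultdict


def group_entities(results, min_score=0.5):
    """Keep entities scoring at least min_score and group their text by type."""
    grouped = defaultdict(list)
    for item in results:
        if item["score"] >= min_score:
            grouped[item["entity_group"]].append(item["word"])
    return dict(grouped)


# Hand-written records shaped like aggregated pipeline output (not real predictions).
sample_results = [
    {"entity_group": "PER", "score": 0.99, "word": "Erna Solberg", "start": 0, "end": 12},
    {"entity_group": "GPE", "score": 0.42, "word": "Norge", "start": 30, "end": 35},
    {"entity_group": "LOC", "score": 0.95, "word": "Bergen", "start": 40, "end": 46},
]

grouped = group_entities(sample_results, min_score=0.5)
print(grouped)
```

With real output, `results` from the usage snippet can be passed in place of `sample_results`; a suitable threshold depends on the application.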
[^1]: Samuel, D., Kutuzov, A., Touileb, S., Velldal, E., Øvrelid, L., Rønningstad, E., Sigdel, E., & Palatkina, A. (2023). NorBench -- A Benchmark for Norwegian Language Models. In Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa), 618-633. University of Tartu Library.