Update README.md
README.md
CHANGED
@@ -5,7 +5,7 @@
|
|
5 |
---
|
6 |
|
7 |
# XLM-RoBERTa Token Classification for Named Entity Recognition (NER)
|
8 |
-
This model is a fine-tuned version of XLM-RoBERTa (xlm-roberta-base) for Named Entity Recognition (NER) tasks. It has been trained on the PAN-X subset of the XTREME dataset for
|
9 |
|
10 |
PER: Person names
|
11 |
ORG: Organization names
|
@@ -118,7 +118,37 @@ The model's performance is evaluated using the F1 score for NER. The predictions
|
|
118 |
[More Information Needed]
|
119 |
|
120 |
## Evaluation
|
121 |
-
|
122 |
<!-- This section describes the evaluation protocols and provides the results. -->
|
123 |
|
124 |
### Testing Data, Factors & Metrics
|
|
|
5 |
---
|
6 |
|
7 |
# XLM-RoBERTa Token Classification for Named Entity Recognition (NER)
|
8 |
+
This model is a fine-tuned version of XLM-RoBERTa (xlm-roberta-base) for Named Entity Recognition (NER) tasks. It has been trained on the German portion of the PAN-X subset of the XTREME dataset. The model identifies the following entity types:
|
9 |
|
10 |
PER: Person names
|
11 |
ORG: Organization names
|
|
|
118 |
[More Information Needed]
|
119 |
|
120 |
## Evaluation
|
121 |
+
```python
import torch
import pandas as pd
from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline

# Load the fine-tuned XLM-RoBERTa model and tokenizer from Hugging Face
model_checkpoint = "MassMin/xlm-roberta-base-finetuned-panx-de"  # Replace with your Hugging Face model name
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load the tokenizer and model
tokenizer = AutoTokenizer.from_pretrained(model_checkpoint)
model = AutoModelForTokenClassification.from_pretrained(model_checkpoint).to(device)

# Create the NER pipeline (device=0 selects the first GPU, -1 selects the CPU)
ner_pipeline = pipeline(
    "ner",
    model=model,
    tokenizer=tokenizer,
    framework="pt",
    device=0 if torch.cuda.is_available() else -1,
)


# Define the helper function to use the NER pipeline
def tag_text_with_pipeline(text, ner_pipeline):
    # Use the NER pipeline to get per-token predictions
    results = ner_pipeline(text)

    # Convert results to a DataFrame for easy viewing; passing `columns`
    # keeps only these fields and also works when no entity is found
    df = pd.DataFrame(results, columns=["word", "entity", "score"])
    df.columns = ["Tokens", "Tags", "Score"]  # Rename columns for clarity
    return df


# Example usage
text = "Jeff Dean works at Google in California."
result = tag_text_with_pipeline(text, ner_pipeline)
print(result)
```
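The pipeline call above returns one prediction per sub-word token, so a multi-token name such as "Jeff Dean" appears as several rows. The sketch below shows one way to merge those rows into entity spans in plain Python. It assumes the model emits IOB2-style tags (for example `B-PER`, `I-PER`) and that each prediction carries the `entity`, `score`, `index`, `start`, and `end` keys produced by the non-aggregated token-classification pipeline. The helper name `group_entities` is hypothetical and not part of the model card. The pipeline's own `aggregation_strategy` argument offers a built-in alternative.

```python
from typing import Dict, List


def group_entities(token_predictions: List[Dict]) -> List[Dict]:
    """Merge adjacent B-/I- token predictions into entity spans.

    Assumption: tags follow the IOB2 scheme and tokens arrive in text order.
    """
    entities: List[Dict] = []
    current = None
    last_index = None
    for tok in token_predictions:
        tag = tok["entity"]
        if tag == "O":
            current, last_index = None, None
            continue
        prefix, sep, label = tag.partition("-")
        if not sep:  # tag without a B-/I- prefix
            prefix, label = "", tag
        adjacent = last_index is not None and tok["index"] == last_index + 1
        if prefix == "B" or current is None or current["label"] != label or not adjacent:
            # Start a new entity span
            current = {
                "label": label,
                "start": tok["start"],
                "end": tok["end"],
                "scores": [tok["score"]],
            }
            entities.append(current)
        else:
            # Extend the running span with this token
            current["end"] = tok["end"]
            current["scores"].append(tok["score"])
        last_index = tok["index"]
    # Report the mean token score as the span score
    return [
        {
            "label": e["label"],
            "start": e["start"],
            "end": e["end"],
            "score": sum(e["scores"]) / len(e["scores"]),
        }
        for e in entities
    ]
```

With the example sentence, `text[span["start"]:span["end"]]` recovers the surface string of each merged span, so the character offsets can be used to display entities without undoing the tokenizer's sub-word markers.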
|
152 |
<!-- This section describes the evaluation protocols and provides the results. -->
|
153 |
|
154 |
### Testing Data, Factors & Metrics
|
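The card states that performance is evaluated using the F1 score for NER. As a rough illustration of what an entity-level F1 involves, the sketch below extracts spans from IOB2 tag sequences and computes a micro-averaged F1 over exact span matches. This is an assumption about the protocol, not the author's evaluation code: the function names `extract_spans` and `entity_f1` are hypothetical, and in practice a library such as `seqeval` is commonly used for this computation.

```python
from typing import List, Set, Tuple

Span = Tuple[str, int, int]  # (label, start token, end token exclusive)


def extract_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from one IOB2 tag sequence."""
    spans: Set[Span] = set()
    label, start = None, 0
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a trailing span
        inside = tag.startswith("I-") and label == tag[2:]
        if label is not None and not inside:
            spans.add((label, start, i))
            label = None
        if tag.startswith("B-") or (tag.startswith("I-") and label is None):
            # A stray I- tag is treated leniently as the start of a span
            label, start = tag[2:], i
    return spans


def entity_f1(gold: List[List[str]], pred: List[List[str]]) -> float:
    """Micro-averaged F1 over exact (label, start, end) span matches."""
    tp = n_pred = n_gold = 0
    for gold_tags, pred_tags in zip(gold, pred):
        gold_spans = extract_spans(gold_tags)
        pred_spans = extract_spans(pred_tags)
        tp += len(gold_spans & pred_spans)
        n_pred += len(pred_spans)
        n_gold += len(gold_spans)
    if tp == 0:
        return 0.0
    precision = tp / n_pred
    recall = tp / n_gold
    return 2 * precision * recall / (precision + recall)
```

Because a span only counts as correct when its label and both boundaries match, a prediction that tags "Jeff" but misses "Dean" earns no credit for that entity, which makes this metric stricter than per-token accuracy.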