---
license: mit
language:
- uz
metrics:
- accuracy
pipeline_tag: token-classification
tags:
- ner
- uzbek_ner
- ner_for_uzbek_language
---
## Named Entity Recognition (NER) Model for Uzbek Language

### About the Model
This model performs Named Entity Recognition (NER) on Uzbek text. It identifies named entities such as persons, locations, organizations, and dates (the full list is given under Categories). It is based on the XLM-RoBERTa large architecture.

### Note
The model was trained on a news dataset, so it is most accurate on news text. Accuracy on text from other domains may be lower.

### Categories
The model can identify the following NER categories:
- **LOC (Location names)**
- **ORG (Organization names)**
- **PERSON (Person names)**
- **DATE (Date expressions)**
- **MONEY (Monetary amounts)**
- **PERCENT (Percentage values)**
- **QUANTITY (Quantities)**
- **TIME (Time expressions)**
- **PRODUCT (Product names)**
- **EVENT (Event names)**
- **WORK_OF_ART (Work of art titles)**
- **LANGUAGE (Language names)**
- **CARDINAL (Cardinal numbers)**
- **ORDINAL (Ordinal numbers)**
- **NORP (Nationalities or religious/political groups)**
- **FACILITY (Facility names)**
- **LAW (Laws or regulations)**
- **GPE (Countries, cities, states)**

### Examples
The following example runs the model on a single Uzbek sentence:
```python
import torch
from transformers import pipeline, AutoTokenizer, AutoModelForTokenClassification

model_name_or_path = "risqaliyevds/xlm-roberta-large-ner"
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)
model = AutoModelForTokenClassification.from_pretrained(model_name_or_path)

# Use the first GPU when one is available, otherwise fall back to the CPU (-1).
device = 0 if torch.cuda.is_available() else -1
nlp = pipeline("ner", model=model, tokenizer=tokenizer, device=device)

text = "Shavkat Mirziyoyev Rossiyada rasmiy safarda bo'ldi."
ner = nlp(text)

# Each item is a dict describing one labeled token.
for entity in ner:
    print(entity)
```
Example text: "Shavkat Mirziyoyev Rossiyada rasmiy safarda bo'ldi."

Results:
```python
[{'entity': 'B-PERSON', 'score': 0.88995147, 'index': 1, 'word': '▁Shavkat', 'start': 0, 'end': 7},
 {'entity': 'I-PERSON', 'score': 0.980681, 'index': 2, 'word': '▁Mirziyoyev', 'start': 8, 'end': 18},
 {'entity': 'B-GPE', 'score': 0.8208886, 'index': 3, 'word': '▁Rossiya', 'start': 19, 'end': 26}]
```
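The output above is token-level: each token carries a `B-` (begin) or `I-` (inside) tag. The following is a minimal sketch of merging such tokens into whole entity spans using the `start` and `end` character offsets. The helper name `group_entities` is hypothetical and not part of the model or of `transformers`, and the sketch assumes one prediction per word, so text that is split into several subword pieces may need extra handling.

```python
def group_entities(text, tokens):
    """Merge token-level B-/I- predictions into entity spans (illustrative helper)."""
    spans = []
    for tok in tokens:
        # "B-PERSON" -> prefix "B", label "PERSON"
        prefix, _, label = tok["entity"].partition("-")
        # An I- tag extends the previous span only when the label matches.
        if prefix == "I" and spans and spans[-1]["label"] == label:
            spans[-1]["end"] = tok["end"]
        else:
            spans.append({"label": label, "start": tok["start"], "end": tok["end"]})
    # Recover the surface text of each span from the character offsets.
    for span in spans:
        span["text"] = text[span["start"]:span["end"]]
    return spans


text = "Shavkat Mirziyoyev Rossiyada rasmiy safarda bo'ldi."
tokens = [
    {"entity": "B-PERSON", "score": 0.88995147, "index": 1, "word": "▁Shavkat", "start": 0, "end": 7},
    {"entity": "I-PERSON", "score": 0.980681, "index": 2, "word": "▁Mirziyoyev", "start": 8, "end": 18},
    {"entity": "B-GPE", "score": 0.8208886, "index": 3, "word": "▁Rossiya", "start": 19, "end": 26},
]

spans = group_entities(text, tokens)
for span in spans:
    print(span["label"], span["text"])
```

The `transformers` pipeline can also do this grouping itself when it is created with `aggregation_strategy="simple"`. The manual version is shown only to make the tagging scheme explicit.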

### Loading and Using the Model
To download the model from the Hugging Face Hub and build an NER pipeline, use the following code:
```python
import torch
from transformers import pipeline, AutoTokenizer, AutoModelForTokenClassification

model_name_or_path = "risqaliyevds/xlm-roberta-large-ner"

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)
model = AutoModelForTokenClassification.from_pretrained(model_name_or_path)

# Use the first GPU when one is available, otherwise fall back to the CPU (-1).
device = 0 if torch.cuda.is_available() else -1
nlp = pipeline("ner", model=model, tokenizer=tokenizer, device=device)
```
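Each prediction returned by the pipeline is a dict with an `entity` tag and a `score`. As a rough sketch, low-confidence predictions can be dropped, or only certain categories kept, with a small helper like the one below. The function name `filter_entities` and the `0.85` threshold are illustrative assumptions, not values recommended by the model author. The sample predictions are copied from the Examples section.

```python
def filter_entities(entities, min_score=0.85, labels=None):
    """Keep predictions at or above min_score, optionally limited to given categories."""
    kept = []
    for ent in entities:
        # Strip the B-/I- prefix: "B-PERSON" -> "PERSON".
        label = ent["entity"].split("-", 1)[-1]
        if ent["score"] < min_score:
            continue
        if labels is not None and label not in labels:
            continue
        kept.append(ent)
    return kept


# Sample predictions in the format produced by the pipeline.
predictions = [
    {"entity": "B-PERSON", "score": 0.88995147, "word": "▁Shavkat", "start": 0, "end": 7},
    {"entity": "I-PERSON", "score": 0.980681, "word": "▁Mirziyoyev", "start": 8, "end": 18},
    {"entity": "B-GPE", "score": 0.8208886, "word": "▁Rossiya", "start": 19, "end": 26},
]

confident = filter_entities(predictions, min_score=0.85)
places = filter_entities(predictions, min_score=0.8, labels={"GPE", "LOC"})
print([e["word"] for e in confident])
print([e["word"] for e in places])
```

A suitable threshold depends on the application and is best chosen by checking predictions on a sample of your own text.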

### Contact

If you have any questions or need more information, contact the author on LinkedIn: [Riskaliev Murad](https://www.linkedin.com/in/risqaliyevds/)

### License
This model is open source, released under the MIT license, and free for anyone to use.

### Conclusion
This model identifies a wide range of named entities in Uzbek text and is most accurate on news text. Its broad set of categories makes it useful for academic research, document analysis, and other text-processing tasks.