---
language: az
license: apache-2.0
library_name: adapter-transformers
---

# Text Classification

This model is a fine-tuned version of XLM-RoBERTa (XLM-R) for text classification in Azerbaijani. XLM-RoBERTa is a powerful multilingual model that supports 100+ languages, and this fine-tuned version builds on its language-agnostic representations to improve performance on Azerbaijani text classification, accurately categorizing and analyzing Azerbaijani text inputs.

# How to Use

The model can be loaded and used for prediction with the Hugging Face Transformers library. Below is an example code snippet in Python:

```python
from transformers import MBartForSequenceClassification, MBartTokenizer, pipeline

# Path to the locally saved fine-tuned model; replace with your own path.
model_path = r"/home/user/Desktop/Synthetic data/models/model_bart_saved"

model = MBartForSequenceClassification.from_pretrained(model_path)
tokenizer = MBartTokenizer.from_pretrained(model_path)

# Wrap the model and tokenizer in a text-classification pipeline
nlp = pipeline("sentiment-analysis", model=model, tokenizer=tokenizer)

# Example Azerbaijani sentence, roughly: "In the country we live in,
# doing good is one of the main indicators of character."
print(nlp("Yaşadığımız ölkədə xeyirxahlıq etmək əsas keyfiyyət göstəricilərindən biridir"))
```

Running the example above prints a result similar to:

```
[{'label': 'positive', 'score': 0.9997604489326477}]
```
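
If you prefer not to use the `pipeline` helper, the following is a minimal sketch of the same prediction done directly with the model and tokenizer. It assumes the same local `model_path` as above and that the saved config includes an `id2label` mapping; adjust as needed:

```python
import torch
from transformers import MBartForSequenceClassification, MBartTokenizer

model_path = r"/home/user/Desktop/Synthetic data/models/model_bart_saved"
model = MBartForSequenceClassification.from_pretrained(model_path)
tokenizer = MBartTokenizer.from_pretrained(model_path)

texts = [
    "Yaşadığımız ölkədə xeyirxahlıq etmək əsas keyfiyyət göstəricilərindən biridir",
]

# Tokenize a batch of sentences and run a single forward pass
inputs = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits

# Map the highest-scoring class index back to its label via the model config
predictions = logits.argmax(dim=-1)
for text, pred in zip(texts, predictions):
    print(text, "->", model.config.id2label[pred.item()])
```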

# Limitations and Bias

The model was fine-tuned for only one epoch, so it may not fully capture the intricacies of the Azerbaijani language or the complexities of the classification task, and its performance may be limited on some inputs. Users should also consider potential biases in the training data that may influence the model's accuracy when categorizing certain types of texts.
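
For context, the sketch below illustrates the kind of single-epoch fine-tuning run described above. It is a minimal illustration, not the actual training code: the dataset files, label count, and base checkpoint (`facebook/mbart-large-cc25`) are assumptions, and the real training setup may differ.

```python
from datasets import load_dataset
from transformers import (
    MBartForSequenceClassification,
    MBartTokenizer,
    Trainer,
    TrainingArguments,
)

# Hypothetical labeled Azerbaijani CSV files with "text" and "label" columns
dataset = load_dataset("csv", data_files={"train": "az_train.csv", "test": "az_test.csv"})

base_checkpoint = "facebook/mbart-large-cc25"  # assumed base model
tokenizer = MBartTokenizer.from_pretrained(base_checkpoint)
model = MBartForSequenceClassification.from_pretrained(base_checkpoint, num_labels=2)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128)

tokenized = dataset.map(tokenize, batched=True)

training_args = TrainingArguments(
    output_dir="model_bart_saved",
    num_train_epochs=1,  # single epoch, as noted in the limitations above
    per_device_train_batch_size=8,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["test"],
)
trainer.train()
trainer.save_model("model_bart_saved")
tokenizer.save_pretrained("model_bart_saved")
```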

# Ethical Considerations

Users should approach automated text classification and question-answering systems responsibly and remain mindful of the ethical implications of relying on them. These systems, while powerful and useful, are not infallible and should be used as a tool to aid decision-making rather than as the sole source of information, particularly in sensitive or high-stakes contexts.

Here are a few reasons why:

1. Limitations in understanding and knowledge: Language models are trained on a diverse range of texts, but they do not possess human-like understanding, consciousness, or moral judgment. Their knowledge is based on patterns observed in the data, which may not always generalize well or be up to date, leading to potential inaccuracies or biases.

2. Contextual understanding: Even when a system tries to account for the context of an input, nuances may be missed or the context not fully grasped, which can lead to misinterpretations and inappropriate responses.

3. Potential biases: Language models can inadvertently reflect and perpetuate harmful biases present in the training data. While efforts are made to minimize these biases, it is essential for users to be aware of this limitation and approach outputs with a critical mindset.

4. Sensitive information: Users may be inclined to share sensitive or private information with automated systems. It is important to remember that these systems are not confidential, and user data may be used to improve the model or for other purposes, depending on the specific terms of use.

5. Dependence on technology: Over-reliance on automated systems can have unintended consequences, such as reduced critical-thinking skills or a lack of accountability for decision-making. Users should maintain a healthy skepticism and continue to develop their own expertise and judgment.

By using such systems responsibly and being aware of their limitations, users can help ensure that these tools are applied ethically and effectively.

# Citation

Please cite this model as follows:

```
@misc{alasdevcenter_text_classification_2024,
  author    = {Alas Development Center},
  title     = {text classification},
  year      = {2024},
  url       = {https://huggingface.co/alasdevcenter/text classification},
  doi       = {10.57967/hf/2027},
  publisher = {Hugging Face}
}
```