---
language:
- ur
license: mit
tags:
- generated_from_trainer
datasets:
- imdb_urdu_reviews
widget:
- text: میں نے یہ فلم دیکھنے کے لئے بہت احتیاط کی تھی، لیکن اس کی کہانی اور اداکاری
    نے میری توقعات کو پورا کیا۔ بالکل شاندار فلم!
  example_title: Positive Example 1
- text: اس فلم کی کہانی بہت بے معنی اور بے چارہ ہے۔ میں نے اپنا وقت اور پیسہ برباد
    کر دیا۔ براہ کرم اس سے بچیں!
  example_title: Negative Example 1
- text: یہ ناقابل فہم فلم ہے۔ کوئی بھی اسے دیکھ کر توڑ دل ہو جائے گا۔ بلکل بے فائدہ!
  example_title: Negative Example 2
- text: میں نے ہمیشہ کی طرح اس فلم کو بھی بہت مزہ دیا۔ اداکاری، کہانی، اور ڈائریکشن
    سب بہترین تھی۔ دل کھول کر تصویر دیکھنے کا موقع!
  example_title: Positive Example 2
- text: اس فلم میں اتنی بے وقوفی دکھائی گئی ہے کہ آپ بھی اپنے دماغ کو چیک کریں گے۔
    بلکل بکواس!
  example_title: Negative Example 3
base_model: urduhack/roberta-urdu-small
model-index:
- name: UrduClassification
  results: []
---


# UrduClassification

This model is a fine-tuned version of [urduhack/roberta-urdu-small](https://huggingface.co/urduhack/roberta-urdu-small) on the imdb_urdu_reviews dataset.
It achieves the following results on the evaluation set:
- Loss: 0.4703

## Model Details

- Model Name: Urdu Sentiment Classification
- Model Architecture: RobertaForSequenceClassification
- Base Model: urduhack/roberta-urdu-small
- Dataset: IMDB Urdu Reviews
- Task: Sentiment Classification (Positive/Negative)

## Training Procedure
The model was fine-tuned using the transformers library and the Trainer class from Hugging Face. The training process involved the following steps:

1. Tokenization: The input Urdu text was tokenized using the RobertaTokenizerFast from the "urduhack/roberta-urdu-small" pre-trained model. The texts were padded and truncated to a maximum length of 256 tokens.

2. Model Architecture: The "urduhack/roberta-urdu-small" pre-trained model was loaded as the base model for sequence classification using the RobertaForSequenceClassification class.

3. Training Arguments: The training arguments were set, including the number of training epochs, batch size, learning rate, evaluation strategy, logging strategy, and more.

4. Training: The model was trained on the training dataset using the Trainer class, with the Adam optimizer and linear learning-rate schedule listed under "Training hyperparameters", to minimize the cross-entropy loss between predicted and actual sentiment labels.

5. Evaluation: After each epoch, the model was evaluated on the validation dataset to monitor its performance. The evaluation results, including training loss and validation loss, were logged for analysis.

6. Fine-Tuning: The model parameters were fine-tuned during the training process to optimize its performance on the IMDb Urdu movie reviews sentiment analysis task.
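
The steps above can be sketched roughly as follows. This is a minimal illustration and not the author's actual training script: the `build_trainer` helper, the `text_column` default, the `output_dir` name, and the requirement of an integer `label` column are assumptions made for the example. The `evaluation_strategy` argument name matches the Transformers 4.30.2 release listed under "Framework versions"; newer releases may name it differently.

```python
# Values copied from this card; the dict keys are TrainingArguments field names.
BASE_MODEL = "urduhack/roberta-urdu-small"
MAX_LENGTH = 256
HYPERPARAMS = {
    "learning_rate": 5e-5,
    "per_device_train_batch_size": 16,
    "per_device_eval_batch_size": 16,
    "seed": 42,
    "lr_scheduler_type": "linear",
    "warmup_steps": 500,
    "num_train_epochs": 3,
}


def build_trainer(train_dataset, eval_dataset, text_column="sentence",
                  output_dir="urdu-classification"):
    """Return a Trainer configured like the procedure described above.

    Assumptions (not stated in the card): both arguments are `datasets.Dataset`
    objects with a text column named `text_column` and an integer `label` column.
    """
    # Imported here so the constants above can be inspected without
    # having the heavy dependencies installed.
    from transformers import (
        RobertaForSequenceClassification,
        RobertaTokenizerFast,
        Trainer,
        TrainingArguments,
    )

    tokenizer = RobertaTokenizerFast.from_pretrained(BASE_MODEL)

    def tokenize(batch):
        # Step 1: pad and truncate every review to 256 tokens.
        return tokenizer(batch[text_column], padding="max_length",
                         truncation=True, max_length=MAX_LENGTH)

    train_dataset = train_dataset.map(tokenize, batched=True)
    eval_dataset = eval_dataset.map(tokenize, batched=True)

    # Step 2: base model with a two-class (positive/negative) head.
    model = RobertaForSequenceClassification.from_pretrained(BASE_MODEL, num_labels=2)

    # Step 3: evaluate and log once per epoch, as described in step 5.
    args = TrainingArguments(
        output_dir=output_dir,
        evaluation_strategy="epoch",
        logging_strategy="epoch",
        **HYPERPARAMS,
    )
    return Trainer(model=model, args=args,
                   train_dataset=train_dataset, eval_dataset=eval_dataset)
```

Calling `build_trainer(train_ds, eval_ds).train()` would then run step 4. The optimizer is left at the Trainer default here, which is an assumption about how the original run was configured.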

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 500
- num_epochs: 3
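
To make the `linear` scheduler with 500 warmup steps concrete, the sketch below computes the learning rate at a given optimizer step: a linear ramp from 0 to the peak rate during warmup, then a linear decay to 0. The total of 7,500 steps is an assumption taken from the final row of the training results table, and `linear_schedule_lr` is a hypothetical helper written for this illustration, not a library function.

```python
BASE_LR = 5e-5        # learning_rate from the list above
WARMUP_STEPS = 500    # lr_scheduler_warmup_steps from the list above
TOTAL_STEPS = 7500    # assumption: last step in the training results table


def linear_schedule_lr(step):
    """Learning rate at `step` for linear warmup followed by linear decay."""
    if step < WARMUP_STEPS:
        # Warmup: scale the peak rate by the fraction of warmup completed.
        factor = step / max(1, WARMUP_STEPS)
    else:
        # Decay: scale by the fraction of post-warmup steps remaining.
        factor = max(0.0, (TOTAL_STEPS - step) / max(1, TOTAL_STEPS - WARMUP_STEPS))
    return BASE_LR * factor


for step in (0, 250, 500, 4000, 7500):
    print(step, linear_schedule_lr(step))
```

Under these assumptions the rate peaks at step 500, partway through the first epoch, and reaches zero at the end of the third.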

### Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 0.4078        | 1.0   | 2500 | 0.3954          |
| 0.2633        | 2.0   | 5000 | 0.4007          |
| 0.1205        | 3.0   | 7500 | 0.4703          |
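
The table can be read programmatically to see which epoch has the lowest validation loss and how the gap between validation and training loss changes. The snippet below is only an illustration using the numbers from the table; the card does not state which checkpoint was published or whether any checkpoint selection was applied.

```python
# (training loss, epoch, step, validation loss) rows copied from the table above.
RESULTS = [
    (0.4078, 1.0, 2500, 0.3954),
    (0.2633, 2.0, 5000, 0.4007),
    (0.1205, 3.0, 7500, 0.4703),
]

# Epoch with the lowest validation loss.
best_row = min(RESULTS, key=lambda row: row[3])
best_epoch = best_row[1]

# Validation loss minus training loss for each epoch.
gaps = [(epoch, round(val - train, 4)) for train, epoch, _, val in RESULTS]

print("best epoch:", best_epoch)
print("val - train gap per epoch:", gaps)
```

Validation loss is lowest after epoch 1 and rises afterwards while training loss keeps falling, a pattern that is consistent with overfitting in the later epochs.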

## Evaluation Results
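The model was evaluated on its sentiment classification task; the evaluation dataset is not disclosed. The reported results are listed below. Note that the evaluation loss of 0.3954 equals the validation loss after epoch 1 in the training results table, whereas the validation loss after epoch 3 is 0.4703.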

- Evaluation Loss: 0.3954
- Evaluation Runtime: 51.60 seconds
- Average Samples per Second: 96.89
- Average Steps per Second: 6.06
- Epoch: 3.0
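
The runtime and throughput figures can be cross-checked against each other to estimate the size of the evaluation set. The arithmetic below is a rough sanity check, assuming a single device so that one evaluation step processes one batch of 16 samples (the `eval_batch_size` listed in the hyperparameters).

```python
import math

EVAL_RUNTIME_S = 51.60       # Evaluation Runtime
SAMPLES_PER_SECOND = 96.89   # Average Samples per Second
STEPS_PER_SECOND = 6.06      # Average Steps per Second
EVAL_BATCH_SIZE = 16         # eval_batch_size from the hyperparameter list

# Samples and steps implied by runtime x throughput.
approx_samples = round(EVAL_RUNTIME_S * SAMPLES_PER_SECOND)
approx_steps = round(EVAL_RUNTIME_S * STEPS_PER_SECOND)

# Steps needed to cover that many samples at the given batch size.
expected_steps = math.ceil(approx_samples / EVAL_BATCH_SIZE)

print(approx_samples, approx_steps, expected_steps)
```

Both routes give about 313 evaluation steps, which suggests an evaluation set of roughly 5,000 samples, although the card does not confirm this.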

### Framework versions

- Transformers 4.30.2
- PyTorch 2.0.0
- Datasets 2.1.0
- Tokenizers 0.13.3