SheldonSides committed on
Commit: 0358c4e
1 Parent(s): 606aae2

Fine-tuning Complete

Files changed (3)
  1. README.md +16 -30
  2. pytorch_model.bin +1 -1
  3. training_args.bin +1 -1
README.md CHANGED
@@ -15,7 +15,7 @@ model-index:
  name: Text Classification
  type: text-classification
  dataset:
- name: dair-ai/emotion
+ name: emotion
  type: emotion
  config: split
  split: validation
@@ -23,10 +23,10 @@ model-index:
  metrics:
  - name: Accuracy
  type: accuracy
- value: 0.934
+ value: 0.9355
  - name: F1
  type: f1
- value: 0.9340654575276651
+ value: 0.9354396627100748
  ---

  <!-- This model card has been generated automatically according to the information the Trainer had access to. You
@@ -34,29 +34,15 @@ should probably proofread and complete it, then remove this comment. -->

  # distilbert-base-uncased-finetuned-emotion

- This model is a fine-tuned version of [distilbert-base-uncased](https://huggingface.co/distilbert-base-uncased) on the [dair-ai/emotion](https://huggingface.co/datasets/dair-ai/emotion) dataset.
+ This model is a fine-tuned version of [distilbert-base-uncased](https://huggingface.co/distilbert-base-uncased) on the emotion dataset.
  It achieves the following results on the evaluation set:
- - Loss: 0.1526
- - Accuracy: 0.934
- - F1: 0.9341
+ - Loss: 0.1382
+ - Accuracy: 0.9355
+ - F1: 0.9354

  ## Model description

- #### Base Model Info.
- DistilBERT is a transformers model, smaller and faster than BERT, which was pretrained on the same corpus in a
- self-supervised fashion, using the BERT base model as a teacher. This means it was pretrained on the raw texts only,
- with no humans labelling them in any way (which is why it can use lots of publicly available data) with an automatic
- process to generate inputs and labels from those texts using the BERT base model. More precisely, it was pretrained
- with three objectives:
-
- - Distillation loss: the model was trained to return the same probabilities as the BERT base model.
- - Masked language modeling (MLM): this is part of the original training loss of the BERT base model. When taking a
- sentence, the model randomly masks 15% of the words in the input then run the entire masked sentence through the
- model and has to predict the masked words. This is different from traditional recurrent neural networks (RNNs) that
- usually see the words one after the other, or from autoregressive models like GPT which internally mask the future
- tokens. It allows the model to learn a bidirectional representation of the sentence.
- - Cosine embedding loss: the model was also trained to generate hidden states as close as possible as the BERT base
- model.
+ More information needed

  ## Intended uses & limitations

@@ -72,8 +58,8 @@ More information needed

  The following hyperparameters were used during training:
  - learning_rate: 2e-05
- - train_batch_size: 124
- - eval_batch_size: 124
+ - train_batch_size: 75
+ - eval_batch_size: 75
  - seed: 42
  - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  - lr_scheduler_type: linear
@@ -83,16 +69,16 @@ The following hyperparameters were used during training:

  | Training Loss | Epoch | Step | Validation Loss | Accuracy | F1     |
  |:-------------:|:-----:|:----:|:---------------:|:--------:|:------:|
- | 1.0271        | 1.0   | 130  | 0.4635          | 0.863    | 0.8509 |
- | 0.3115        | 2.0   | 260  | 0.2129          | 0.926    | 0.9262 |
- | 0.1756        | 3.0   | 390  | 0.1709          | 0.9325   | 0.9327 |
- | 0.1345        | 4.0   | 520  | 0.1604          | 0.932    | 0.9319 |
- | 0.1183        | 5.0   | 650  | 0.1526          | 0.934    | 0.9341 |
+ | 0.8649        | 1.0   | 214  | 0.3094          | 0.9055   | 0.9049 |
+ | 0.2229        | 2.0   | 428  | 0.1845          | 0.9305   | 0.9311 |
+ | 0.144         | 3.0   | 642  | 0.1556          | 0.941    | 0.9412 |
+ | 0.1111        | 4.0   | 856  | 0.1394          | 0.941    | 0.9409 |
+ | 0.095         | 5.0   | 1070 | 0.1382          | 0.9355   | 0.9354 |


  ### Framework versions

  - Transformers 4.34.1
  - Pytorch 2.0.1
- - Datasets 2.14.5
+ - Datasets 2.14.6
  - Tokenizers 0.14.1
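The card records only the hyperparameter values, not the training script itself. For context, here is a minimal sketch of how those values could be wired into the `transformers` Trainer; the dataset id (`dair-ai/emotion`), the tokenization, the weighted-F1 averaging, and the 5-epoch count (read off the results table above) are assumptions, not part of this commit.

```python
# Hypothetical reconstruction of the fine-tuning setup described in the card.
# Dataset id, tokenization, metric averaging, and epoch count are assumptions.
import numpy as np
from datasets import load_dataset
from sklearn.metrics import accuracy_score, f1_score
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

dataset = load_dataset("dair-ai/emotion")  # 6 emotion labels
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding=True)

encoded = dataset.map(tokenize, batched=True)

model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=6)

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    # Accuracy and (assumed) weighted F1, matching the metrics in the card.
    return {"accuracy": accuracy_score(labels, preds),
            "f1": f1_score(labels, preds, average="weighted")}

args = TrainingArguments(
    output_dir="distilbert-base-uncased-finetuned-emotion",
    learning_rate=2e-5,                  # values listed in the card
    per_device_train_batch_size=75,
    per_device_eval_batch_size=75,
    seed=42,
    lr_scheduler_type="linear",
    num_train_epochs=5,                  # assumed from the results table (epochs 1-5)
    evaluation_strategy="epoch",
)

trainer = Trainer(model=model, args=args,
                  train_dataset=encoded["train"],
                  eval_dataset=encoded["validation"],
                  tokenizer=tokenizer,
                  compute_metrics=compute_metrics)
trainer.train()
```

With `evaluation_strategy="epoch"`, each row of the Validation Loss / Accuracy / F1 table above corresponds to one evaluation pass at the end of an epoch.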
pytorch_model.bin CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:712ca86c29df2d9c5c41e3f7c9c0242abf5c55d5f1a5fa3614aabfeaf7773222
+ oid sha256:47f248d6496e34f4874f7373d122dd49addc2752f3d0f27a79d5f195d85db1cc
  size 267867821
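The two `.bin` entries are Git LFS pointer files: the `oid sha256:` field is the SHA-256 digest of the actual binary and `size` is its byte length, so only the pointer changes in the diff. A quick, hypothetical way to confirm that a locally downloaded `pytorch_model.bin` matches the new pointer above:

```python
# Hypothetical check: the LFS pointer's oid is the SHA-256 of the real file,
# so a downloaded pytorch_model.bin can be compared against the new oid.
import hashlib

def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

expected = "47f248d6496e34f4874f7373d122dd49addc2752f3d0f27a79d5f195d85db1cc"
print(sha256_of("pytorch_model.bin") == expected)
```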
training_args.bin CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:5c71eea1153ffe119a35f2380beece1979faf3cb87452f06b164579c9f049a14
+ oid sha256:fb0a95c0c93affa9ffb136bb0f4f1583d2b5d1b14a35ecd7db549d406bb0129a
  size 4155
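`training_args.bin` is the `TrainingArguments` object the Trainer saves alongside the weights via `torch.save`. If you want to cross-check the hyperparameters listed in the card against what was actually used, a small sketch (assuming the file has been downloaded locally):

```python
# Hypothetical inspection of training_args.bin, which holds the pickled
# TrainingArguments. Works as-is with the Pytorch 2.0.1 pin listed above;
# newer PyTorch releases may require torch.load(..., weights_only=False).
import torch

args = torch.load("training_args.bin")
print(args.learning_rate, args.per_device_train_batch_size, args.seed)
```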