SheldonSides committed
Commit 0358c4e
1 Parent(s): 606aae2

Fine-tuning Complete

Browse files:
- README.md +16 -30
- pytorch_model.bin +1 -1
- training_args.bin +1 -1
README.md CHANGED

@@ -15,7 +15,7 @@ model-index:
       name: Text Classification
       type: text-classification
     dataset:
-      name:
+      name: emotion
       type: emotion
       config: split
       split: validation
@@ -23,10 +23,10 @@ model-index:
     metrics:
     - name: Accuracy
       type: accuracy
-      value: 0.
+      value: 0.9355
     - name: F1
      type: f1
-      value: 0.
+      value: 0.9354396627100748
 ---
 
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
@@ -34,29 +34,15 @@ should probably proofread and complete it, then remove this comment. -->
 
 # distilbert-base-uncased-finetuned-emotion
 
-This model is a fine-tuned version of [distilbert-base-uncased](https://huggingface.co/distilbert-base-uncased) on the
+This model is a fine-tuned version of [distilbert-base-uncased](https://huggingface.co/distilbert-base-uncased) on the emotion dataset.
 It achieves the following results on the evaluation set:
-- Loss: 0.
-- Accuracy: 0.
-- F1: 0.
+- Loss: 0.1382
+- Accuracy: 0.9355
+- F1: 0.9354
 
 ## Model description
 
-
-DistilBERT is a transformers model, smaller and faster than BERT, which was pretrained on the same corpus in a
-self-supervised fashion, using the BERT base model as a teacher. This means it was pretrained on the raw texts only,
-with no humans labelling them in any way (which is why it can use lots of publicly available data) with an automatic
-process to generate inputs and labels from those texts using the BERT base model. More precisely, it was pretrained
-with three objectives:
-
-- Distillation loss: the model was trained to return the same probabilities as the BERT base model.
-- Masked language modeling (MLM): this is part of the original training loss of the BERT base model. When taking a
-  sentence, the model randomly masks 15% of the words in the input then run the entire masked sentence through the
-  model and has to predict the masked words. This is different from traditional recurrent neural networks (RNNs) that
-  usually see the words one after the other, or from autoregressive models like GPT which internally mask the future
-  tokens. It allows the model to learn a bidirectional representation of the sentence.
-- Cosine embedding loss: the model was also trained to generate hidden states as close as possible as the BERT base
-  model.
+More information needed
 
 ## Intended uses & limitations
 
@@ -72,8 +58,8 @@ More information needed
 
 The following hyperparameters were used during training:
 - learning_rate: 2e-05
-- train_batch_size:
-- eval_batch_size:
+- train_batch_size: 75
+- eval_batch_size: 75
 - seed: 42
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: linear
@@ -83,16 +69,16 @@ The following hyperparameters were used during training:
 
 | Training Loss | Epoch | Step | Validation Loss | Accuracy | F1     |
 |:-------------:|:-----:|:----:|:---------------:|:--------:|:------:|
-|
-| 0.
-| 0.
-| 0.
-| 0.
+| 0.8649        | 1.0   | 214  | 0.3094          | 0.9055   | 0.9049 |
+| 0.2229        | 2.0   | 428  | 0.1845          | 0.9305   | 0.9311 |
+| 0.144         | 3.0   | 642  | 0.1556          | 0.941    | 0.9412 |
+| 0.1111        | 4.0   | 856  | 0.1394          | 0.941    | 0.9409 |
+| 0.095         | 5.0   | 1070 | 0.1382          | 0.9355   | 0.9354 |
 
 
 ### Framework versions
 
 - Transformers 4.34.1
 - Pytorch 2.0.1
-- Datasets 2.14.
+- Datasets 2.14.6
 - Tokenizers 0.14.1
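The hyperparameters in this card map directly onto `transformers.TrainingArguments`. Below is a minimal sketch of a training setup consistent with the card, not the committer's actual script: the dataset loading, `num_train_epochs=5` (inferred from the five-epoch results table), and the `compute_metrics` helper (including the weighted F1 averaging) are assumptions.

```python
# Minimal sketch of a run consistent with this card's hyperparameters.
# Assumptions (not in the commit): num_train_epochs=5 is inferred from the
# results table; compute_metrics is a hypothetical helper.
import numpy as np
from datasets import load_dataset
from sklearn.metrics import accuracy_score, f1_score
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

dataset = load_dataset("emotion")  # "split" config is the default
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

def tokenize(batch):
    return tokenizer(batch["text"], padding=True, truncation=True)

encoded = dataset.map(tokenize, batched=True)

model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=6)  # emotion has 6 labels

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    return {"accuracy": accuracy_score(labels, preds),
            "f1": f1_score(labels, preds, average="weighted")}

args = TrainingArguments(
    output_dir="distilbert-base-uncased-finetuned-emotion",
    learning_rate=2e-5,               # from the card
    per_device_train_batch_size=75,   # train_batch_size: 75
    per_device_eval_batch_size=75,    # eval_batch_size: 75
    seed=42,
    num_train_epochs=5,               # inferred from the results table
    lr_scheduler_type="linear",       # Adam(0.9, 0.999, eps=1e-8) is the default optimizer
    evaluation_strategy="epoch",
)

trainer = Trainer(model=model, args=args,
                  train_dataset=encoded["train"],
                  eval_dataset=encoded["validation"],
                  compute_metrics=compute_metrics)
trainer.train()
```

A batch size of 75 over the emotion train split of 16,000 examples gives ceil(16000 / 75) = 214 optimizer steps per epoch, which is consistent with the Step column in the results table.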
pytorch_model.bin CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:47f248d6496e34f4874f7373d122dd49addc2752f3d0f27a79d5f195d85db1cc
 size 267867821
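Only the Git LFS pointer file changes here: the weights keep the same size (267,867,821 bytes) while the `oid` line records the SHA-256 of the new weights. A downloaded copy can be checked against the pointer's oid; a minimal sketch, assuming the file was saved locally as `pytorch_model.bin`:

```python
# Verify a downloaded LFS object against the pointer's oid.
# The local path "pytorch_model.bin" is an assumption about where it was saved.
import hashlib

def sha256_of(path, chunk_size=1 << 20):
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in 1 MiB chunks so large weight files don't load into memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

expected = "47f248d6496e34f4874f7373d122dd49addc2752f3d0f27a79d5f195d85db1cc"
assert sha256_of("pytorch_model.bin") == expected, "checksum mismatch"
```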
training_args.bin CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:fb0a95c0c93affa9ffb136bb0f4f1583d2b5d1b14a35ecd7db549d406bb0129a
 size 4155
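With the fine-tuned weights committed, the checkpoint can be used directly for text classification. A minimal usage sketch; the repo id below is an assumption pieced together from the committer name and the model name in the card, since the exact id is not shown on this page:

```python
# Minimal inference sketch. The repo id is an assumption; adjust as needed.
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="SheldonSides/distilbert-base-uncased-finetuned-emotion",
)

print(classifier("I can't believe how well the fine-tuning run went!"))
# e.g. [{'label': 'joy', 'score': 0.99}] -- labels come from the emotion dataset
```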