MoritzLaurer (HF staff) committed
Commit 8754f63
1 Parent(s): 454637f

Update README.md

Files changed (1)
  1. README.md +23 -55
README.md CHANGED
@@ -1,70 +1,38 @@
  ---
- license: mit
  base_model: microsoft/xtremedistil-l6-h256-uncased
  tags:
- - generated_from_trainer
- metrics:
- - accuracy
- model-index:
- - name: xtremedistil-l6-h256-uncased-zeroshot-v1.1-none
-   results: []
  ---

- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
- should probably proofread and complete it, then remove this comment. -->
-
- # xtremedistil-l6-h256-uncased-zeroshot-v1.1-none
-
- This model is a fine-tuned version of [microsoft/xtremedistil-l6-h256-uncased](https://huggingface.co/microsoft/xtremedistil-l6-h256-uncased) on an unknown dataset.
- It achieves the following results on the evaluation set:
- - Loss: 0.1992
- - F1 Macro: 0.5455
- - F1 Micro: 0.6194
- - Accuracy Balanced: 0.5960
- - Accuracy: 0.6194
- - Precision Macro: 0.5566
- - Recall Macro: 0.5960
- - Precision Micro: 0.6194
- - Recall Micro: 0.6194
-
- ## Model description
-
- More information needed
-
- ## Intended uses & limitations
-
- More information needed
-
- ## Training and evaluation data
-
- More information needed
-
- ## Training procedure
-
- ### Training hyperparameters
-
- The following hyperparameters were used during training:
- - learning_rate: 2e-05
- - train_batch_size: 32
- - eval_batch_size: 128
- - seed: 42
- - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- - lr_scheduler_type: linear
- - lr_scheduler_warmup_ratio: 0.06
- - num_epochs: 3
-
- ### Training results
-
- | Training Loss | Epoch | Step | Validation Loss | F1 Macro | F1 Micro | Accuracy Balanced | Accuracy | Precision Macro | Recall Macro | Precision Micro | Recall Micro |
- |:-------------:|:-----:|:-----:|:---------------:|:--------:|:--------:|:-----------------:|:--------:|:---------------:|:------------:|:---------------:|:------------:|
- | 0.3056 | 1.0 | 30790 | 0.4634 | 0.7791 | 0.8013 | 0.7757 | 0.8013 | 0.7832 | 0.7757 | 0.8013 | 0.8013 |
- | 0.2847 | 2.0 | 61580 | 0.4656 | 0.7826 | 0.8040 | 0.7797 | 0.8040 | 0.7859 | 0.7797 | 0.8040 | 0.8040 |
- | 0.2618 | 3.0 | 92370 | 0.4774 | 0.7848 | 0.8045 | 0.7841 | 0.8045 | 0.7856 | 0.7841 | 0.8045 | 0.8045 |
-
- ### Framework versions
-
- - Transformers 4.33.3
- - Pytorch 2.1.2+cu121
- - Datasets 2.14.7
- - Tokenizers 0.13.3
  ---
  base_model: microsoft/xtremedistil-l6-h256-uncased
+ language:
+ - en
  tags:
+ - text-classification
+ - zero-shot-classification
+ pipeline_tag: zero-shot-classification
+ library_name: transformers
+ license: mit
  ---

+ # xtremedistil-l6-h256-zeroshot-v1.1-all-33
+
+ This model was fine-tuned using the same pipeline as described in
+ the model card for [MoritzLaurer/deberta-v3-large-zeroshot-v1.1-all-33](https://huggingface.co/MoritzLaurer/deberta-v3-large-zeroshot-v1.1-all-33)
+ and in this [paper](https://arxiv.org/pdf/2312.17543.pdf).
+
+ The foundation model is [microsoft/xtremedistil-l6-h256-uncased](https://huggingface.co/microsoft/xtremedistil-l6-h256-uncased).
+ The model has only 22 million parameters and is just 51 MB on disk, providing a significant speedup over larger models.
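To make the size claim above easy to verify, here is a minimal sketch that loads the model and counts its parameters; the repo id is an assumption inferred from the title of this card, not stated in the diff itself.

```python
from transformers import AutoModelForSequenceClassification

# Repo id is an assumption inferred from the model card title above.
model = AutoModelForSequenceClassification.from_pretrained(
    "MoritzLaurer/xtremedistil-l6-h256-zeroshot-v1.1-all-33"
)

# Sum the element counts of all parameter tensors;
# this should come out to roughly 22 million.
num_params = sum(p.numel() for p in model.parameters())
print(f"{num_params / 1e6:.1f}M parameters")
```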
+
+ This model was trained to provide a very small and highly efficient zeroshot option,
+ especially for edge devices or in-browser use-cases with transformers.js.
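As a usage illustration, a minimal sketch with the standard transformers zero-shot classification pipeline; the repo id is again an assumption inferred from the card title, and the example text and labels are invented for demonstration.

```python
from transformers import pipeline

# Repo id is an assumption inferred from the model card title above.
classifier = pipeline(
    "zero-shot-classification",
    model="MoritzLaurer/xtremedistil-l6-h256-zeroshot-v1.1-all-33",
)

text = "The new graphics card delivers excellent performance for its price."
candidate_labels = ["technology", "politics", "sports"]

# The pipeline converts each candidate label into an NLI hypothesis
# and ranks the labels by the model's entailment score.
output = classifier(text, candidate_labels)
print(output["labels"][0], round(output["scores"][0], 3))
```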
+
+ ## Metrics:
+
+ I did not do zeroshot evaluation for this model to save time and compute.
+ The table below shows standard accuracy for all datasets the model was trained on.
+
+ |Datasets|mnli_m|mnli_mm|fevernli|anli_r1|anli_r2|anli_r3|wanli|lingnli|wellformedquery|rottentomatoes|amazonpolarity|imdb|yelpreviews|hatexplain|massive|banking77|emotiondair|emocontext|empathetic|agnews|yahootopics|biasframes_sex|biasframes_offensive|biasframes_intent|financialphrasebank|appreviews|hateoffensive|trueteacher|spam|wikitoxic_toxicaggregated|wikitoxic_obscene|wikitoxic_identityhate|wikitoxic_threat|wikitoxic_insult|manifesto|capsotu|
+ | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
+ |Accuracy|0.894|0.895|0.854|0.629|0.582|0.618|0.772|0.826|0.684|0.794|0.91|0.879|0.935|0.676|0.651|0.521|0.654|0.707|0.369|0.858|0.649|0.876|0.836|0.839|0.849|0.892|0.894|0.525|0.976|0.88|0.901|0.874|0.903|0.886|0.433|0.619|
+ |Inference text/sec (A10G GPU, batch=128)|4117.0|4093.0|1935.0|2984.0|3094.0|2683.0|5788.0|4926.0|9701.0|6359.0|1843.0|692.0|756.0|5561.0|10172.0|9070.0|7511.0|7480.0|2256.0|3942.0|1020.0|4362.0|4034.0|4185.0|5449.0|2606.0|6343.0|931.0|5550.0|864.0|839.0|837.0|832.0|857.0|4418.0|4845.0|