Commit 8754f63
Parent(s): 454637f
Update README.md

README.md CHANGED
@@ -1,70 +1,38 @@
 ---
-license: mit
 base_model: microsoft/xtremedistil-l6-h256-uncased
 tags:
-
-
-
-
-
-results: []
 ---
 
-<!-- This model card has been generated automatically according to the information the Trainer had access to. You
-should probably proofread and complete it, then remove this comment. -->
-
-# xtremedistil-l6-h256-uncased-zeroshot-v1.1-none
-
-This model is a fine-tuned version of [microsoft/xtremedistil-l6-h256-uncased](https://huggingface.co/microsoft/xtremedistil-l6-h256-uncased) on an unknown dataset.
-It achieves the following results on the evaluation set:
-- Loss: 0.1992
-- F1 Macro: 0.5455
-- F1 Micro: 0.6194
-- Accuracy Balanced: 0.5960
-- Accuracy: 0.6194
-- Precision Macro: 0.5566
-- Recall Macro: 0.5960
-- Precision Micro: 0.6194
-- Recall Micro: 0.6194
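The old card reports both F1 Macro (0.5455) and F1 Micro (0.6194). Macro-F1 averages per-class F1 scores equally, while micro-F1 aggregates counts over all predictions; for single-label classification, micro-F1 reduces to plain accuracy, which is why F1 Micro, Accuracy, Precision Micro, and Recall Micro all coincide at 0.6194. A minimal sketch with toy labels (illustrative only, not this model's data):

```python
def f1_report(y_true, y_pred):
    """Per-class F1, macro-F1 (unweighted mean), and micro-F1."""
    labels = sorted(set(y_true) | set(y_pred))
    per_class = {}
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        per_class[c] = 2 * tp / denom if denom else 0.0
    macro = sum(per_class.values()) / len(labels)
    # Single-label micro-F1 reduces to accuracy: total TP / total predictions.
    micro = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    return per_class, macro, micro

# A misclassified rare class drags macro-F1 well below micro-F1/accuracy.
per_class, macro, micro = f1_report(["a", "a", "a", "a", "b"],
                                    ["a", "a", "a", "a", "a"])
# micro = 0.8, macro ≈ 0.444
```

This imbalance between the two averages is exactly the pattern visible in the numbers above.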
-
-## Model description
-
-More information needed
-
-## Intended uses & limitations
-
-More information needed
 
-## Training and evaluation data
 
-More information needed
 
-## Training procedure
 
-### Training hyperparameters
 
-The following hyperparameters were used during training:
-- learning_rate: 2e-05
-- train_batch_size: 32
-- eval_batch_size: 128
-- seed: 42
-- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
-- lr_scheduler_type: linear
-- lr_scheduler_warmup_ratio: 0.06
-- num_epochs: 3
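With `lr_scheduler_type: linear` and `lr_scheduler_warmup_ratio: 0.06`, the learning rate climbs from 0 to the 2e-05 peak over the first 6% of optimizer steps, then decays linearly back to 0. A sketch of that shape (an approximation of the schedule the Trainer builds via `get_linear_schedule_with_warmup`, not the exact implementation):

```python
def linear_warmup_lr(step, total_steps, peak_lr=2e-5, warmup_ratio=0.06):
    """Learning rate at a given optimizer step: linear warmup, then linear decay."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Ramp from 0 up to peak_lr over the warmup window.
        return peak_lr * step / max(1, warmup_steps)
    # Decay from peak_lr down to 0 over the remaining steps.
    return peak_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))

total = 92370  # 3 epochs x 30790 steps/epoch, per the training-results table
```

The peak is reached at roughly step 5542 (6% of 92370), after which the rate falls off linearly until the end of epoch 3.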
 
-### Training results
 
-| Training Loss | Epoch | Step | Validation Loss | F1 Macro | F1 Micro | Accuracy Balanced | Accuracy | Precision Macro | Recall Macro | Precision Micro | Recall Micro |
-|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
-| 0.3056 | 1.0 | 30790 | 0.4634 | 0.7791 | 0.8013 | 0.7757 | 0.8013 | 0.7832 | 0.7757 | 0.8013 | 0.8013 |
-| 0.2847 | 2.0 | 61580 | 0.4656 | 0.7826 | 0.8040 | 0.7797 | 0.8040 | 0.7859 | 0.7797 | 0.8040 | 0.8040 |
-| 0.2618 | 3.0 | 92370 | 0.4774 | 0.7848 | 0.8045 | 0.7841 | 0.8045 | 0.7856 | 0.7841 | 0.8045 | 0.8045 |
 
-### Framework versions
 
-- Transformers 4.33.3
-- Pytorch 2.1.2+cu121
-- Datasets 2.14.7
-- Tokenizers 0.13.3
 ---
 base_model: microsoft/xtremedistil-l6-h256-uncased
+language:
+- en
 tags:
+- text-classification
+- zero-shot-classification
+pipeline_tag: zero-shot-classification
+library_name: transformers
+license: mit
 ---
 
+# xtremedistil-l6-h256-zeroshot-v1.1-all-33
 
+This model was fine-tuned using the same pipeline as described in
+the model card for [MoritzLaurer/deberta-v3-large-zeroshot-v1.1-all-33](https://huggingface.co/MoritzLaurer/deberta-v3-large-zeroshot-v1.1-all-33)
+and in this [paper](https://arxiv.org/pdf/2312.17543.pdf).
+
+The foundation model is [microsoft/xtremedistil-l6-h256-uncased](https://huggingface.co/microsoft/xtremedistil-l6-h256-uncased).
+The model has only 22 million parameters and weighs in at just 51 MB, providing a significant speedup over larger models.
 
+This model was trained to provide a very small and highly efficient zeroshot option,
+especially for edge devices or in-browser use-cases with transformers.js.
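Zeroshot models of this family reformulate classification as NLI, per the linked paper: each candidate label is inserted into a hypothesis template and the model's entailment probability becomes that label's score. A toy sketch of just the scoring arithmetic, with made-up logits (real scores come from the model itself, e.g. through the `zero-shot-classification` pipeline declared in the metadata above):

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def zeroshot_scores(entailment_logits_per_label):
    """entailment_logits_per_label: {label: (entailment_logit, not_entailment_logit)}.
    Returns labels ranked by P(entailment), renormalized across labels."""
    p_entail = {
        label: softmax(logits)[0]
        for label, logits in entailment_logits_per_label.items()
    }
    total = sum(p_entail.values())
    return sorted(((label, p / total) for label, p in p_entail.items()),
                  key=lambda kv: kv[1], reverse=True)

# Hypothetical logits for a template like "This example is about {label}."
ranked = zeroshot_scores({"politics": (2.1, -1.3), "sports": (-0.8, 1.5)})
```

Because each label is scored by an independent NLI forward pass, inference cost grows with the number of candidate labels, which is where a 22M-parameter model pays off.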
 
+## Metrics
 
+I did not do zeroshot evaluation for this model to save time and compute.
+The table below shows standard accuracy for all datasets the model was trained on.
 
+|Datasets|mnli_m|mnli_mm|fevernli|anli_r1|anli_r2|anli_r3|wanli|lingnli|wellformedquery|rottentomatoes|amazonpolarity|imdb|yelpreviews|hatexplain|massive|banking77|emotiondair|emocontext|empathetic|agnews|yahootopics|biasframes_sex|biasframes_offensive|biasframes_intent|financialphrasebank|appreviews|hateoffensive|trueteacher|spam|wikitoxic_toxicaggregated|wikitoxic_obscene|wikitoxic_identityhate|wikitoxic_threat|wikitoxic_insult|manifesto|capsotu|
+| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
+|Accuracy|0.894|0.895|0.854|0.629|0.582|0.618|0.772|0.826|0.684|0.794|0.91|0.879|0.935|0.676|0.651|0.521|0.654|0.707|0.369|0.858|0.649|0.876|0.836|0.839|0.849|0.892|0.894|0.525|0.976|0.88|0.901|0.874|0.903|0.886|0.433|0.619|
+|Inference text/sec (A10G GPU, batch=128)|4117.0|4093.0|1935.0|2984.0|3094.0|2683.0|5788.0|4926.0|9701.0|6359.0|1843.0|692.0|756.0|5561.0|10172.0|9070.0|7511.0|7480.0|2256.0|3942.0|1020.0|4362.0|4034.0|4185.0|5449.0|2606.0|6343.0|931.0|5550.0|864.0|839.0|837.0|832.0|857.0|4418.0|4845.0|