Commit 8754f63
Parent(s): 454637f
Update README.md

README.md CHANGED
@@ -1,70 +1,38 @@
 ---
-license: mit
 base_model: microsoft/xtremedistil-l6-h256-uncased
 tags:
-
-
-
-
-
-results: []
 ---
 
-<!-- This model card has been generated automatically according to the information the Trainer had access to. You
-should probably proofread and complete it, then remove this comment. -->
-
-# xtremedistil-l6-h256-uncased-zeroshot-v1.1-none
-
-This model is a fine-tuned version of [microsoft/xtremedistil-l6-h256-uncased](https://huggingface.co/microsoft/xtremedistil-l6-h256-uncased) on an unknown dataset.
-It achieves the following results on the evaluation set:
-- Loss: 0.1992
-- F1 Macro: 0.5455
-- F1 Micro: 0.6194
-- Accuracy Balanced: 0.5960
-- Accuracy: 0.6194
-- Precision Macro: 0.5566
-- Recall Macro: 0.5960
-- Precision Micro: 0.6194
-- Recall Micro: 0.6194
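The old card reports both F1 Macro (0.5455) and F1 Micro (0.6194). Macro-F1 averages per-class F1 scores equally, while micro-F1 aggregates counts over all predictions; for single-label classification, micro-F1 reduces to plain accuracy, which is why F1 Micro, Accuracy, Precision Micro, and Recall Micro all coincide at 0.6194. A minimal sketch with toy labels (illustrative only, not this model's data):

```python
def f1_report(y_true, y_pred):
    """Per-class F1, macro-F1 (unweighted mean), and micro-F1."""
    labels = sorted(set(y_true) | set(y_pred))
    per_class = {}
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        per_class[c] = 2 * tp / denom if denom else 0.0
    macro = sum(per_class.values()) / len(labels)
    # Single-label micro-F1 reduces to accuracy: total TP / total predictions.
    micro = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    return per_class, macro, micro

# A misclassified rare class drags macro-F1 well below micro-F1/accuracy.
per_class, macro, micro = f1_report(["a", "a", "a", "a", "b"],
                                    ["a", "a", "a", "a", "a"])
# micro = 0.8, macro ≈ 0.444
```

This imbalance between the two averages is exactly the pattern visible in the numbers above.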
-
-## Model description
-
-More information needed
-
-## Intended uses & limitations
-
-More information needed
 
-## Training and evaluation data
 
-More information needed
 
-## Training procedure
 
-### Training hyperparameters
 
-The following hyperparameters were used during training:
-- learning_rate: 2e-05
-- train_batch_size: 32
-- eval_batch_size: 128
-- seed: 42
-- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
-- lr_scheduler_type: linear
-- lr_scheduler_warmup_ratio: 0.06
-- num_epochs: 3
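With `lr_scheduler_type: linear` and `lr_scheduler_warmup_ratio: 0.06`, the learning rate climbs from 0 to the 2e-05 peak over the first 6% of optimizer steps, then decays linearly back to 0. A sketch of that shape (an approximation of the schedule the Trainer builds via `get_linear_schedule_with_warmup`, not the exact implementation):

```python
def linear_warmup_lr(step, total_steps, peak_lr=2e-5, warmup_ratio=0.06):
    """Learning rate at a given optimizer step: linear warmup, then linear decay."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Ramp from 0 up to peak_lr over the warmup window.
        return peak_lr * step / max(1, warmup_steps)
    # Decay from peak_lr down to 0 over the remaining steps.
    return peak_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))

total = 92370  # 3 epochs x 30790 steps/epoch, per the training-results table
```

The peak is reached at roughly step 5542 (6% of 92370), after which the rate falls off linearly until the end of epoch 3.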
 
-### Training results
 
-| Training Loss | Epoch | Step | Validation Loss | F1 Macro | F1 Micro | Accuracy Balanced | Accuracy | Precision Macro | Recall Macro | Precision Micro | Recall Micro |
-|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
-| 0.3056 | 1.0 | 30790 | 0.4634 | 0.7791 | 0.8013 | 0.7757 | 0.8013 | 0.7832 | 0.7757 | 0.8013 | 0.8013 |
-| 0.2847 | 2.0 | 61580 | 0.4656 | 0.7826 | 0.8040 | 0.7797 | 0.8040 | 0.7859 | 0.7797 | 0.8040 | 0.8040 |
-| 0.2618 | 3.0 | 92370 | 0.4774 | 0.7848 | 0.8045 | 0.7841 | 0.8045 | 0.7856 | 0.7841 | 0.8045 | 0.8045 |
 
-### Framework versions
 
-- Transformers 4.33.3
-- Pytorch 2.1.2+cu121
-- Datasets 2.14.7
-- Tokenizers 0.13.3
 ---
 base_model: microsoft/xtremedistil-l6-h256-uncased
+language:
+- en
 tags:
+- text-classification
+- zero-shot-classification
+pipeline_tag: zero-shot-classification
+library_name: transformers
+license: mit
 ---
 
+# xtremedistil-l6-h256-zeroshot-v1.1-all-33
 
+This model was fine-tuned using the same pipeline as described in
+the model card for [MoritzLaurer/deberta-v3-large-zeroshot-v1.1-all-33](https://huggingface.co/MoritzLaurer/deberta-v3-large-zeroshot-v1.1-all-33)
+and in this [paper](https://arxiv.org/pdf/2312.17543.pdf).
+
+The foundation model is [microsoft/xtremedistil-l6-h256-uncased](https://huggingface.co/microsoft/xtremedistil-l6-h256-uncased).
+The model has only 22 million parameters and weighs in at just 51 MB, providing a significant speedup over larger models.
 
+This model was trained to provide a very small and highly efficient zeroshot option,
+especially for edge devices or in-browser use-cases with transformers.js.
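Zeroshot models of this family reformulate classification as NLI, per the linked paper: each candidate label is inserted into a hypothesis template and the model's entailment probability becomes that label's score. A toy sketch of just the scoring arithmetic, with made-up logits (real scores come from the model itself, e.g. through the `zero-shot-classification` pipeline declared in the metadata above):

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def zeroshot_scores(entailment_logits_per_label):
    """entailment_logits_per_label: {label: (entailment_logit, not_entailment_logit)}.
    Returns labels ranked by P(entailment), renormalized across labels."""
    p_entail = {
        label: softmax(logits)[0]
        for label, logits in entailment_logits_per_label.items()
    }
    total = sum(p_entail.values())
    return sorted(((label, p / total) for label, p in p_entail.items()),
                  key=lambda kv: kv[1], reverse=True)

# Hypothetical logits for a template like "This example is about {label}."
ranked = zeroshot_scores({"politics": (2.1, -1.3), "sports": (-0.8, 1.5)})
```

Because each label is scored by an independent NLI forward pass, inference cost grows with the number of candidate labels, which is where a 22M-parameter model pays off.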
 
+## Metrics
 
+I did not do zeroshot evaluation for this model to save time and compute.
+The table below shows standard accuracy for all datasets the model was trained on.
 
+|Datasets|mnli_m|mnli_mm|fevernli|anli_r1|anli_r2|anli_r3|wanli|lingnli|wellformedquery|rottentomatoes|amazonpolarity|imdb|yelpreviews|hatexplain|massive|banking77|emotiondair|emocontext|empathetic|agnews|yahootopics|biasframes_sex|biasframes_offensive|biasframes_intent|financialphrasebank|appreviews|hateoffensive|trueteacher|spam|wikitoxic_toxicaggregated|wikitoxic_obscene|wikitoxic_identityhate|wikitoxic_threat|wikitoxic_insult|manifesto|capsotu|
+| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
+|Accuracy|0.894|0.895|0.854|0.629|0.582|0.618|0.772|0.826|0.684|0.794|0.91|0.879|0.935|0.676|0.651|0.521|0.654|0.707|0.369|0.858|0.649|0.876|0.836|0.839|0.849|0.892|0.894|0.525|0.976|0.88|0.901|0.874|0.903|0.886|0.433|0.619|
+|Inference text/sec (A10G GPU, batch=128)|4117.0|4093.0|1935.0|2984.0|3094.0|2683.0|5788.0|4926.0|9701.0|6359.0|1843.0|692.0|756.0|5561.0|10172.0|9070.0|7511.0|7480.0|2256.0|3942.0|1020.0|4362.0|4034.0|4185.0|5449.0|2606.0|6343.0|931.0|5550.0|864.0|839.0|837.0|832.0|857.0|4418.0|4845.0|