---
datasets:
- multi_nli
- snli
- scitail
metrics:
- accuracy
- f1
pipeline_tag: zero-shot-classification
language:
- en
model-index:
- name: AntoineBlanot/flan-t5-xxl-classif-3way
  results:
  - task:
      type: nli
      name: Natural Language Inference
    dataset:
      type: multi_nli
      name: MultiNLI
      split: validation_matched
    metrics:
    - type: accuracy
      value: 0.9230769230769231
      name: Validation (matched) accuracy
    - type: f1
      value: 0.9225172687920663
      name: Validation (matched) f1
  - task:
      type: nli
      name: Natural Language Inference
    dataset:
      type: multi_nli
      name: MultiNLI
      split: validation_mismatched
    metrics:
    - type: accuracy
      value: 0.9222945484133441
      name: Validation (mismatched) accuracy
    - type: f1
      value: 0.9216699467726924
      name: Validation (mismatched) f1
  - task:
      type: nli
      name: Natural Language Inference
    dataset:
      type: snli
      name: SNLI
      split: validation
    metrics:
    - type: accuracy
      value: 0.9418817313554155
      name: Validation accuracy
    - type: f1
      value: 0.9416213776111287
      name: Validation f1
  - task:
      type: nli
      name: Natural Language Inference
    dataset:
      type: scitail
      name: SciTail
      split: validation
    metrics:
    - type: accuracy
      value: 0.9662576687116564
      name: Validation accuracy
    - type: f1
      value: 0.6471347983817357
      name: Validation f1
---
# T5ForSequenceClassification
**T5ForSequenceClassification** adapts the original [T5](https://github.com/google-research/text-to-text-transfer-transformer) architecture for sequence classification tasks.
T5 was originally built for text-to-text tasks and excels at them.
It can handle any NLP task that has been converted to a text-to-text format, including sequence classification!
You can find [here](https://huggingface.co/google/flan-t5-base?text=Premise%3A++At+my+age+you+will+probably+have+learnt+one+lesson.+Hypothesis%3A++It%27s+not+certain+how+many+lessons+you%27ll+learn+by+your+thirties.+Does+the+premise+entail+the+hypothesis%3F) how the original T5 handles sequence classification as a text-to-text task.
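For illustration, here is a minimal sketch of that text-to-text approach using the `transformers` pipeline; the prompt mirrors the linked example, and the answer shown in the comment is indicative only:

```python
from transformers import pipeline

# The original (Flan-)T5 treats NLI as text generation: the task is written
# as a prompt and the answer is generated as free text.
nli = pipeline("text2text-generation", model="google/flan-t5-base")

prompt = (
    "Premise: At my age you will probably have learnt one lesson. "
    "Hypothesis: It's not certain how many lessons you'll learn by your thirties. "
    "Does the premise entail the hypothesis?"
)
print(nli(prompt)[0]["generated_text"])  # e.g. "it is not possible to tell"
```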
Our motivation for building **T5ForSequenceClassification** is that the full original T5 architecture is not needed for most NLU tasks. NLU tasks generally do not require generating text, so a large decoder is unnecessary.
By removing the decoder we can *halve the original number of parameters* (and thus the computation cost) and *efficiently optimize* the network for the given task.
## Table of Contents
0. [Usage](#usage)
1. [Why use T5ForSequenceClassification?](#why-use-t5forsequenceclassification)
2. [T5ForClassification vs T5](#t5forclassification-vs-t5)
3. [Results](#results)
## Usage
**T5ForSequenceClassification** supports the task of zero-shot classification.
It can directly be used for:
- topic classification
- intent recognition
- boolean question answering
- sentiment analysis
- and any other task whose goal is to classify a text...
Since the *T5ForClassification* class is currently not supported by the transformers library, you cannot use this model directly on the Hub.
To use **T5ForSequenceClassification**, you will have to install additional packages and model weights.
You can find instructions [here](https://github.com/AntoineBlanot/zero-nlp).
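Once set up, zero-shot classification with a 3-way NLI model typically follows the recipe sketched below: each candidate label is rephrased as a hypothesis, and the label whose hypothesis the model finds most entailed by the text wins. This sketch uses a publicly available NLI model (`roberta-large-mnli`) as a stand-in since the exact zero-nlp API is not shown here, and the hypothesis template is an assumption:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Stand-in NLI model for illustration; the actual model requires the zero-nlp package.
name = "roberta-large-mnli"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name)

def zero_shot_classify(text: str, labels: list[str]) -> str:
    """Score each candidate label as an NLI hypothesis; return the most entailed one."""
    entail_id = model.config.label2id["ENTAILMENT"]  # id of the entailment class
    scores = []
    for label in labels:
        # Hypothesis template is an assumption; tune it to your domain.
        inputs = tokenizer(text, f"This text is about {label}.", return_tensors="pt")
        with torch.no_grad():
            probs = model(**inputs).logits.softmax(-1)
        scores.append(probs[0, entail_id].item())
    return labels[scores.index(max(scores))]

print(zero_shot_classify("The stock market fell 3% today.", ["finance", "sports", "weather"]))
# -> "finance"
```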
## Why use T5ForSequenceClassification?
Models based on the [BERT](https://huggingface.co/bert-large-uncased) architecture like [RoBERTa](https://huggingface.co/roberta-large) and [DeBERTa](https://huggingface.co/microsoft/deberta-v2-xxlarge) have shown very strong performance on sequence classification tasks and are still widely used today.
However, those models only scale up to ~1.5B parameters (DeBERTa xxlarge), resulting in limited knowledge compared to bigger models.
On the other hand, models based on the T5 architecture scale up to ~11B parameters (t5-xxl), and innovations with this architecture are recent and keep coming ([mT5](https://huggingface.co/google/mt5-xxl), [Flan-T5](https://huggingface.co/google/flan-t5-xxl), [UL2](https://huggingface.co/google/ul2), [Flan-UL2](https://huggingface.co/google/flan-ul2), and probably more to follow).
## T5ForClassification vs T5
**T5ForClassification** Architecture:
- Encoder: same as original T5
- Decoder: only the first layer (for pooling purposes)
- Classification head: simple Linear layer on top of the decoder
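A minimal PyTorch sketch of this layout follows, assuming the truncated decoder consumes a single start token whose output serves as the pooled representation; this is illustrative only, not the actual zero-nlp implementation:

```python
import torch
import torch.nn as nn
from transformers import T5Model

class T5ForClassificationSketch(nn.Module):
    """Illustrative only: T5 encoder + first decoder layer + linear classification head."""

    def __init__(self, model_name: str = "google/flan-t5-base", num_labels: int = 3):
        super().__init__()
        t5 = T5Model.from_pretrained(model_name)
        self.encoder = t5.encoder
        # Keep only the first decoder block; it serves as a learned pooler.
        t5.decoder.block = t5.decoder.block[:1]
        self.decoder = t5.decoder
        self.classifier = nn.Linear(t5.config.d_model, num_labels)

    def forward(self, input_ids: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
        enc = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        # Feed a single decoder-start token (T5 uses the pad token, id 0);
        # its output acts as the pooled representation of the input.
        start = torch.zeros((input_ids.size(0), 1), dtype=torch.long, device=input_ids.device)
        dec = self.decoder(
            input_ids=start,
            encoder_hidden_states=enc.last_hidden_state,
            encoder_attention_mask=attention_mask,
        )
        return self.classifier(dec.last_hidden_state[:, 0])  # class logits
```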
Benefits and Drawbacks:
- (**+**) Keeps T5 encoding strength
- (**+**) Half the parameter count
- (**+**) Interpretable outputs (class logits)
- (**+**) No generation mistakes and faster prediction (no generation latency)
- (**-**) Loses the text-to-text ability
## Results
Results on the validation data of **training tasks**:
| Dataset | Accuracy | F1 |
|:-------:|:--------:|:--:|
| MNLI (m)| 0.923 | 0.923 |
| MNLI (mm) | 0.922 | 0.922 |
| SNLI | 0.942 | 0.942 |
| SciTail | 0.966 | 0.647 |
Results on validation data of **unseen tasks** (zero-shot):
| Dataset | Accuracy | F1 |
|:-------:|:--------:|:--:|
| ? | ? | ? |
Special thanks to [philschmid](https://huggingface.co/philschmid) for making a Flan-T5-xxl [checkpoint](https://huggingface.co/philschmid/flan-t5-xxl-sharded-fp16) in fp16.