---
language: en
license: apache-2.0
library_name: transformers
tags:
- deberta-v3-large
- text-classification
- nli
- natural-language-inference
- multitask
- multi-task
- pipeline
- extreme-multi-task
- extreme-mtl
- tasksource
- zero-shot
- rlhf
datasets:
- glue
- super_glue
- anli
- metaeval/babi_nli
- sick
- snli
- scitail
- hans
- alisawuffles/WANLI
- metaeval/recast
- sileod/probability_words_nli
- joey234/nan-nli
- pietrolesci/nli_fever
- pietrolesci/breaking_nli
- pietrolesci/conj_nli
- pietrolesci/fracas
- pietrolesci/dialogue_nli
- pietrolesci/mpe
- pietrolesci/dnc
- pietrolesci/gpt3_nli
- pietrolesci/recast_white
- pietrolesci/joci
- martn-nguyen/contrast_nli
- pietrolesci/robust_nli
- pietrolesci/robust_nli_is_sd
- pietrolesci/robust_nli_li_ts
- pietrolesci/gen_debiased_nli
- pietrolesci/add_one_rte
- metaeval/imppres
- pietrolesci/glue_diagnostics
- hlgd
- paws
- quora
- medical_questions_pairs
- conll2003
- Anthropic/hh-rlhf
- Anthropic/model-written-evals
- truthful_qa
- nightingal3/fig-qa
- tasksource/bigbench
- bigbench
- blimp
- cos_e
- cosmos_qa
- dream
- openbookqa
- qasc
- quartz
- quail
- head_qa
- sciq
- social_i_qa
- wiki_hop
- wiqa
- piqa
- hellaswag
- pkavumba/balanced-copa
- 12ml/e-CARE
- art
- tasksource/mmlu
- winogrande
- codah
- allenai/ai2_arc
- definite_pronoun_resolution
- swag
- math_qa
- metaeval/utilitarianism
- mteb/amazon_counterfactual
- SetFit/insincere-questions
- SetFit/toxic_conversations
- turingbench/TuringBench
- trec
- tals/vitaminc
- hope_edi
- strombergnlp/rumoureval_2019
- ethos
- tweet_eval
- discovery
- pragmeval
- silicone
- lex_glue
- papluca/language-identification
- imdb
- rotten_tomatoes
- ag_news
- yelp_review_full
- financial_phrasebank
- poem_sentiment
- dbpedia_14
- amazon_polarity
- app_reviews
- hate_speech18
- sms_spam
- humicroedit
- snips_built_in_intents
- banking77
- hate_speech_offensive
- yahoo_answers_topics
- pacovaldez/stackoverflow-questions
- zapsdcn/hyperpartisan_news
- zapsdcn/sciie
- zapsdcn/citation_intent
- go_emotions
- scicite
- liar
- relbert/lexical_relation_classification
- metaeval/linguisticprobing
- metaeval/crowdflower
- metaeval/ethics
- emo
- google_wellformed_query
- tweets_hate_speech_detection
- has_part
- wnut_17
- ncbi_disease
- acronym_identification
- jnlpba
- species_800
- SpeedOfMagic/ontonotes_english
- blog_authorship_corpus
- launch/open_question_type
- health_fact
- commonsense_qa
- mc_taco
- ade_corpus_v2
- prajjwal1/discosense
- circa
- YaHi/EffectiveFeedbackStudentWriting
- Ericwang/promptSentiment
- Ericwang/promptNLI
- Ericwang/promptSpoke
- Ericwang/promptProficiency
- Ericwang/promptGrammar
- Ericwang/promptCoherence
- PiC/phrase_similarity
- copenlu/scientific-exaggeration-detection
- quarel
- mwong/fever-evidence-related
- numer_sense
- dynabench/dynasent
- raquiba/Sarcasm_News_Headline
- sem_eval_2010_task_8
- demo-org/auditor_review
- medmcqa
- aqua_rat
- RuyuanWan/Dynasent_Disagreement
- RuyuanWan/Politeness_Disagreement
- RuyuanWan/SBIC_Disagreement
- RuyuanWan/SChem_Disagreement
- RuyuanWan/Dilemmas_Disagreement
- lucasmccabe/logiqa
- wiki_qa
- metaeval/cycic_classification
- metaeval/cycic_multiplechoice
- metaeval/sts-companion
- metaeval/commonsense_qa_2.0
- metaeval/lingnli
- metaeval/monotonicity-entailment
- metaeval/arct
- metaeval/scinli
- metaeval/naturallogic
- onestop_qa
- demelin/moral_stories
- corypaik/prost
- aps/dynahate
- metaeval/syntactic-augmentation-nli
- metaeval/autotnli
- lasha-nlp/CONDAQA
- openai/webgpt_comparisons
- Dahoas/synthetic-instruct-gptj-pairwise
- metaeval/scruples
- metaeval/wouldyourather
- sileod/attempto-nli
- metaeval/defeasible-nli
- metaeval/help-nli
- metaeval/nli-veridicality-transitivity
- metaeval/natural-language-satisfiability
- metaeval/lonli
- metaeval/dadc-limit-nli
- ColumbiaNLP/FLUTE
- metaeval/strategy-qa
- openai/summarize_from_feedback
- metaeval/folio
- metaeval/tomi-nli
- metaeval/avicenna
- stanfordnlp/SHP
- GBaker/MedQA-USMLE-4-options-hf
- sileod/wikimedqa
- declare-lab/cicero
- amydeng2000/CREAK
- metaeval/mutual
- inverse-scaling/NeQA
- inverse-scaling/quote-repetition
- inverse-scaling/redefine-math
- metaeval/puzzte
- metaeval/implicatures
- race
- metaeval/spartqa-yn
- metaeval/spartqa-mchoice
- metaeval/temporal-nli
metrics:
- accuracy
pipeline_tag: zero-shot-classification
---

# Model Card for DeBERTa-v3-large-tasksource-nli

DeBERTa-v3-large fine-tuned with multi-task learning on 600 tasks of the [tasksource collection](https://github.com/sileod/tasksource/).

You can further fine-tune this model for any classification or multiple-choice task.

This checkpoint has strong zero-shot validation performance on many tasks (e.g. 77% on WNLI).
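
For zero-shot classification, the standard `transformers` pipeline works out of the box. A minimal sketch (the input text and candidate labels are illustrative):

```python
# Minimal zero-shot classification sketch using the transformers pipeline.
from transformers import pipeline

classifier = pipeline(
    "zero-shot-classification",
    model="sileod/deberta-v3-large-tasksource-nli",
)

result = classifier(
    "The new graphics card doubled our training throughput.",
    candidate_labels=["hardware", "software", "sports"],
)
print(result["labels"][0], result["scores"][0])  # top label and its score
```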

Thanks to the multi-task training, the CLS embedding of the untuned model also gives strong linear probing performance (90% on MNLI).
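
To run this kind of probing yourself, extract CLS features and fit any linear classifier on top. A minimal sketch (the encoder calls are standard `transformers`; the downstream linear classifier is left to your library of choice):

```python
# Sketch: extracting CLS embeddings for linear probing.
import torch
from transformers import AutoModel, AutoTokenizer

name = "sileod/deberta-v3-large-tasksource-nli"
tokenizer = AutoTokenizer.from_pretrained(name)
encoder = AutoModel.from_pretrained(name)

def cls_embeddings(texts):
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = encoder(**batch).last_hidden_state
    return hidden[:, 0]  # first-token (CLS) embedding per input

features = cls_embeddings(["An example sentence.", "Another one."])
# Fit e.g. sklearn's LogisticRegression on `features` and your labels.
```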

This is the shared model with the MNLI classifier on top. Its encoder was trained on many datasets, including bigbench, Anthropic RLHF, and ANLI, alongside many NLI and classification tasks, each with a SequenceClassification head on top of a single shared encoder.

Each task had a task-specific CLS embedding, which was dropped 10% of the time during training so that the model can also be used without it. All multiple-choice tasks used the same classification layers, and classification tasks shared head weights whenever their labels matched.

The number of examples per task was capped at 64k. The model was trained for 80k steps with a batch size of 384 and a peak learning rate of 2e-5.
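
Because the MNLI classifier is kept on top, premise/hypothesis pairs can be scored directly. A minimal sketch (the sentence pair is illustrative; label names are read from the model config rather than assumed):

```python
# Sketch: scoring a premise/hypothesis pair with the MNLI head.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

name = "sileod/deberta-v3-large-tasksource-nli"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name)

inputs = tokenizer(
    "A soccer game with multiple males playing.",  # premise
    "Some men are playing a sport.",               # hypothesis
    return_tensors="pt",
)
with torch.no_grad():
    probs = model(**inputs).logits.softmax(-1).squeeze()

# Read the label names from the config instead of hardcoding an order.
print({model.config.id2label[i]: round(float(p), 3) for i, p in enumerate(probs)})
```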

tasksource training code: https://colab.research.google.com/drive/1iB4Oxl9_B5W3ZDzXoWJN-olUbqLBxgQS?usp=sharing

### Software

https://github.com/sileod/tasksource/ \
https://github.com/sileod/tasknet/ \
Training took 6 days on an Nvidia A100 40GB GPU.

# Citation

More details in this [article](https://arxiv.org/abs/2301.05948):

```bibtex
@article{sileo2023tasksource,
  title={tasksource: Structured Dataset Preprocessing Annotations for Frictionless Extreme Multi-Task Learning and Evaluation},
  author={Sileo, Damien},
  url={https://arxiv.org/abs/2301.05948},
  journal={arXiv preprint arXiv:2301.05948},
  year={2023}
}
```

# Loading a specific classifier

Classifiers for all tasks are available; see https://huggingface.co/sileod/deberta-v3-large-tasksource-adapters.
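
A sketch using the `tasknet` helper listed under Software; the `load_pipeline` call and the task identifier `glue/sst2` are assumptions here, so check the adapters repository for the exact usage and the list of available task names:

```python
# Hedged sketch: loading a task-specific classifier via tasknet.
# The task name "glue/sst2" is illustrative; see the adapters repository
# for the available task identifiers.
import tasknet as tn

pipe = tn.load_pipeline("sileod/deberta-v3-large-tasksource-nli", "glue/sst2")
print(pipe(["That movie was great!", "Awful movie."]))
```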

<img src="https://www.dropbox.com/s/eyfw8i1ekzxj3fa/task_embeddings.png?dl=1" width="1000" height="">

# Model Card Contact

damien.sileo@inria.fr