---
license: mit
tags:
  - generated_from_trainer
model-index:
  - name: deberta-v3-base_MNLI_10_19_v0
    results: []
---

deberta-v3-base_MNLI_10_19_v0

This model is a fine-tuned version of microsoft/deberta-v3-base on an unspecified dataset.
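The checkpoint can be loaded with the standard transformers sequence-classification classes. Below is a minimal sketch for NLI-style inference; it assumes the checkpoint exposes a three-way entailment/neutral/contradiction head, as the MNLI naming suggests, and the premise/hypothesis pair is only an illustrative example:

```python
# Minimal sketch: loading the checkpoint for NLI-style inference.
# Assumes a 3-way sequence-classification head (entailment / neutral /
# contradiction), as the MNLI naming suggests; verify via config.id2label.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "mariolinml/deberta-v3-base_MNLI_10_19_v0"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

premise = "A soccer game with multiple males playing."      # example input
hypothesis = "Some men are playing a sport."                 # example input

inputs = tokenizer(premise, hypothesis, return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits
probs = logits.softmax(dim=-1).squeeze()

# Label names come from the model config, so the order is not hard-coded here.
for idx, p in enumerate(probs.tolist()):
    print(model.config.id2label[idx], round(p, 4))
```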

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a sketch reproducing them with transformers TrainingArguments appears after the list):

  • learning_rate: 2e-05
  • train_batch_size: 16
  • eval_batch_size: 8
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 500
  • num_epochs: 2
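As a hedged sketch, the listed settings map onto a transformers TrainingArguments object as shown below; only the numeric values come from this card, while the output directory is a placeholder and the data/model wiring is omitted:

```python
# Hedged sketch: the hyperparameters above expressed as TrainingArguments.
# Only the numeric settings are taken from the card; output_dir is a placeholder.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="deberta-v3-base_MNLI_10_19_v0",  # placeholder path
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    warmup_steps=500,
    num_train_epochs=2,
)
```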

Training results

Framework versions

  • Transformers 4.23.1
  • Pytorch 1.12.1+cu113
  • Datasets 2.6.1
  • Tokenizers 0.13.1

Model Recycling

Evaluation on 36 datasets using mariolinml/deberta-v3-base_MNLI_10_19_v0 as a base model yields an average score of 79.75, compared to 79.04 for microsoft/deberta-v3-base.

As of 22/01/2023, the model is ranked 3rd among all tested models for the microsoft/deberta-v3-base architecture.

Results:

Dataset                  Score
20_newsgroup             85.8471
ag_news                  90.2333
amazon_reviews_multi     66.74
anli                     60.0625
boolq                    81.8349
cb                       82.1429
cola                     84.8514
copa                     69
dbpedia                  79.4333
esnli                    91.1136
financial_phrasebank     86.9
imdb                     94.372
isear                    71.382
mnli                     89.7172
mrpc                     88.2353
multirc                  64.3771
poem_sentiment           88.4615
qnli                     93.758
qqp                      91.8699
rotten_tomatoes          89.7749
rte                      85.5596
sst2                     95.1835
sst_5bins                57.4661
stsb                     91.7396
trec_coarse              97.6
trec_fine                91.8
tweet_ev_emoji           45.526
tweet_ev_emotion         84.2365
tweet_ev_hate            55.9933
tweet_ev_irony           79.8469
tweet_ev_offensive       84.3023
tweet_ev_sentiment       71.2634
wic                      70.0627
wnli                     74.6479
wsc                      63.4615
yahoo_answers            72.1333

For more information, see: Model Recycling