deberta_finetune

This model is a fine-tuned version of microsoft/deberta-v3-base on an unknown dataset. It achieves the following results on the evaluation set:

  • eval_loss: 0.3943
  • eval_accuracy: 0.8673
  • eval_runtime: 164.2323 (seconds)
  • eval_samples_per_second: 29.178
  • eval_steps_per_second: 1.827
  • epoch: 2.0
  • step: 4164
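
The throughput figures above are consistent with one another, so they can be used to back out the approximate size of the evaluation set. The sketch below makes a few assumptions. The metrics are assumed to come from the Transformers `Trainer`, which reports `eval_runtime` in seconds and computes each per-second rate as a count divided by that runtime. Evaluation is assumed to run on a single device with `eval_batch_size` 16. `estimate_eval_size` is a hypothetical helper name:

```python
import math


def estimate_eval_size(runtime_s: float, samples_per_s: float,
                       steps_per_s: float, batch_size: int) -> tuple[int, int, bool]:
    """Back out (num_samples, num_steps, consistent) from rounded throughput metrics.

    Hypothetical helper. Assumes samples_per_s = num_samples / runtime_s and
    steps_per_s = num_steps / runtime_s, as in the Trainer's speed metrics.
    """
    num_samples = round(samples_per_s * runtime_s)
    num_steps = round(steps_per_s * runtime_s)
    # With a fixed batch size, the step count should equal ceil(samples / batch).
    consistent = num_steps == math.ceil(num_samples / batch_size)
    return num_samples, num_steps, consistent
```

For this card, `estimate_eval_size(164.2323, 29.178, 1.827, 16)` returns `(4792, 300, True)`. That corresponds to roughly 4,792 evaluation examples processed in 300 batches of 16.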

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 2e-05
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 3
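
With `lr_scheduler_type: linear`, the learning rate decays linearly from 2e-05 to zero over the whole run. The sketch below assumes no warmup, since none is listed above and the Transformers default is zero warmup steps. It infers 2,082 optimizer steps per epoch from the evaluation entry (step 4164 at epoch 2.0), which gives 3 × 2,082 = 6,246 total steps. Assuming a single device and no gradient accumulation, that is on the order of 33,300 training examples per epoch at batch size 16. The formula mirrors the one used by `get_linear_schedule_with_warmup` in Transformers, but this is a standalone re-implementation, and `linear_lr` is a hypothetical name:

```python
def linear_lr(step: int, base_lr: float, total_steps: int,
              warmup_steps: int = 0) -> float:
    """Learning rate after `step` optimizer steps under linear warmup + decay.

    Standalone re-implementation; warmup_steps=0 is an assumption, since the
    card lists no warmup.
    """
    if step < warmup_steps:
        # Linear ramp from 0 to base_lr during warmup.
        return base_lr * step / max(1, warmup_steps)
    # Linear decay from base_lr at the end of warmup to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


BASE_LR = 2e-05
STEPS_PER_EPOCH = 4164 // 2          # inferred: step 4164 was reached at epoch 2.0
TOTAL_STEPS = 3 * STEPS_PER_EPOCH    # num_epochs: 3 -> 6246 steps

for step in (0, STEPS_PER_EPOCH, 4164, TOTAL_STEPS):
    print(step, f"{linear_lr(step, BASE_LR, TOTAL_STEPS):.3e}")
# 0 2.000e-05
# 2082 1.333e-05
# 4164 6.667e-06
# 6246 0.000e+00
```

Under these assumptions, the reported evaluation at step 4164 happened when the learning rate had fallen to one third of its initial value.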

Framework versions

  • Transformers 4.25.1
  • Pytorch 1.13.0+cu116
  • Datasets 2.8.0
  • Tokenizers 0.13.2
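
To check whether a local environment matches the versions above, one option is to compare installed distribution versions against these pins. The sketch below uses only the standard-library `importlib.metadata`. It assumes the usual PyPI distribution names (`transformers`, `torch`, `datasets`, `tokenizers`). `PINNED` and `version_mismatches` are hypothetical names. The `+cu116` suffix is a local version label carried by the CUDA 11.6 PyTorch build, so a 1.13.0 install without that suffix is reported as a mismatch:

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Callable, Dict

# Versions listed under "Framework versions" (distribution name -> version).
PINNED: Dict[str, str] = {
    "transformers": "4.25.1",
    "torch": "1.13.0+cu116",
    "datasets": "2.8.0",
    "tokenizers": "0.13.2",
}


def version_mismatches(pinned: Dict[str, str],
                       lookup: Callable[[str], str] = version) -> Dict[str, str]:
    """Return {package: installed version or 'missing'} for every pin that differs."""
    mismatches = {}
    for name, wanted in pinned.items():
        try:
            installed = lookup(name)
        except PackageNotFoundError:
            installed = "missing"
        if installed != wanted:
            mismatches[name] = installed
    return mismatches


if __name__ == "__main__":
    print(version_mismatches(PINNED) or "environment matches the card")
```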

Model Recycling

Evaluation on 36 datasets using nc33/deberta_finetune as a base model yields an average score of 79.51, compared with 79.04 for microsoft/deberta-v3-base.

As of 06/02/2023, the model ranks 3rd among all tested models for the microsoft/deberta-v3-base architecture. Results:

  • 20_newsgroup: 86.1922
  • ag_news: 90.3667
  • amazon_reviews_multi: 67.48
  • anli: 58.5625
  • boolq: 84.3425
  • cb: 73.2143
  • cola: 86.5772
  • copa: 68
  • dbpedia: 79.6667
  • esnli: 91.5717
  • financial_phrasebank: 88.6
  • imdb: 94.472
  • isear: 72.2295
  • mnli: 89.6359
  • mrpc: 90.1961
  • multirc: 63.5314
  • poem_sentiment: 87.5
  • qnli: 93.5567
  • qqp: 91.672
  • rotten_tomatoes: 90.2439
  • rte: 83.0325
  • sst2: 95.1835
  • sst_5bins: 58.371
  • stsb: 90.4054
  • trec_coarse: 97.2
  • trec_fine: 90.8
  • tweet_ev_emoji: 47.122
  • tweet_ev_emotion: 85.0809
  • tweet_ev_hate: 59.3939
  • tweet_ev_irony: 79.0816
  • tweet_ev_offensive: 83.7209
  • tweet_ev_sentiment: 70.197
  • wic: 70.6897
  • wnli: 67.6056
  • wsc: 64.4231
  • yahoo_answers: 72.3333
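
The headline average can be reproduced from the per-dataset results. The sketch below assumes the average is a plain, unweighted arithmetic mean over the 36 datasets, which the listed scores do reproduce to two decimals. The 79.04 baseline is taken from the text, because its per-dataset scores are not listed here:

```python
from statistics import mean

# Per-dataset scores for nc33/deberta_finetune, copied from the results above.
SCORES = {
    "20_newsgroup": 86.1922, "ag_news": 90.3667, "amazon_reviews_multi": 67.48,
    "anli": 58.5625, "boolq": 84.3425, "cb": 73.2143, "cola": 86.5772,
    "copa": 68, "dbpedia": 79.6667, "esnli": 91.5717,
    "financial_phrasebank": 88.6, "imdb": 94.472, "isear": 72.2295,
    "mnli": 89.6359, "mrpc": 90.1961, "multirc": 63.5314,
    "poem_sentiment": 87.5, "qnli": 93.5567, "qqp": 91.672,
    "rotten_tomatoes": 90.2439, "rte": 83.0325, "sst2": 95.1835,
    "sst_5bins": 58.371, "stsb": 90.4054, "trec_coarse": 97.2,
    "trec_fine": 90.8, "tweet_ev_emoji": 47.122, "tweet_ev_emotion": 85.0809,
    "tweet_ev_hate": 59.3939, "tweet_ev_irony": 79.0816,
    "tweet_ev_offensive": 83.7209, "tweet_ev_sentiment": 70.197,
    "wic": 70.6897, "wnli": 67.6056, "wsc": 64.4231, "yahoo_answers": 72.3333,
}

BASELINE_AVG = 79.04  # microsoft/deberta-v3-base, as reported in the text

avg = mean(SCORES.values())
print(f"{len(SCORES)} datasets, average {avg:.2f}, "
      f"gain over base {avg - BASELINE_AVG:+.2f}")
# 36 datasets, average 79.51, gain over base +0.47
```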

For more information, see: Model Recycling

Safetensors

  • Model size: 184M params
  • Tensor types: I64, F32