eladven committed on
Commit
fb2046f
1 Parent(s): a00dda7

Evaluation results for ibm/ColD-Fusion-bert-base-uncased-itr23-seed0 model as a base model for other tasks


As part of a research effort to identify high-quality models on the Hugging Face Hub that can serve as base models for further fine-tuning, we evaluated this model by fine-tuning it on 36 datasets. The model ranks 1st among all tested models for the bert-base-uncased architecture as of 09/01/2023.
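The evaluation described above starts from this checkpoint as the base model for each downstream task. A minimal sketch of that setup with the Hugging Face `transformers` library (the classification head and `num_labels=2` are illustrative assumptions, not part of the evaluation code):

```python
# Minimal sketch: loading ibm/ColD-Fusion-bert-base-uncased-itr23-seed0
# as a base model for downstream fine-tuning. The binary task head
# (num_labels=2) is an illustrative assumption; set it to match your task.
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "ibm/ColD-Fusion-bert-base-uncased-itr23-seed0"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# From here, fine-tune on a downstream dataset, e.g. with transformers.Trainer.
```

The pretrained encoder weights are loaded from the checkpoint while the task head is freshly initialized, which is the standard starting point for the per-dataset fine-tuning runs reported below.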


To share this information with others in your model card, please add the following evaluation results to your README.md page.

For more information, please see https://ibm.github.io/model-recycling/ or contact me.

Best regards,
Elad Venezian
eladv@il.ibm.com
IBM Research AI

Files changed (1)
  1. README.md +14 -0
README.md CHANGED

@@ -51,6 +51,20 @@ output = model(encoded_input)
 ```
 
 ## Evaluation results
+
+## Model Recycling
+
+[Evaluation on 36 datasets](https://ibm.github.io/model-recycling/model_gain_chart?avg=3.44&mnli_lp=nan&20_newsgroup=2.07&ag_news=-0.46&amazon_reviews_multi=0.34&anli=2.14&boolq=5.42&cb=12.41&cola=0.15&copa=8.55&dbpedia=0.04&esnli=1.02&financial_phrasebank=15.57&imdb=0.52&isear=0.22&mnli=0.65&mrpc=5.02&multirc=-0.61&poem_sentiment=18.89&qnli=-0.60&qqp=0.29&rotten_tomatoes=4.55&rte=18.00&sst2=2.18&sst_5bins=2.72&stsb=2.71&trec_coarse=1.14&trec_fine=12.67&tweet_ev_emoji=0.28&tweet_ev_emotion=1.16&tweet_ev_hate=2.20&tweet_ev_irony=0.61&tweet_ev_offensive=-0.37&tweet_ev_sentiment=0.82&wic=2.58&wnli=1.55&wsc=0.38&yahoo_answers=-1.02&model_name=ibm%2FColD-Fusion-bert-base-uncased-itr23-seed0&base_name=bert-base-uncased) using ibm/ColD-Fusion-bert-base-uncased-itr23-seed0 as a base model yields average score of 75.64 in comparison to 72.20 by bert-base-uncased.
+
+The model is ranked 1st among all tested models for the bert-base-uncased architecture as of 09/01/2023
+Results:
+
+| 20_newsgroup | ag_news | amazon_reviews_multi | anli | boolq | cb | cola | copa | dbpedia | esnli | financial_phrasebank | imdb | isear | mnli | mrpc | multirc | poem_sentiment | qnli | qqp | rotten_tomatoes | rte | sst2 | sst_5bins | stsb | trec_coarse | trec_fine | tweet_ev_emoji | tweet_ev_emotion | tweet_ev_hate | tweet_ev_irony | tweet_ev_offensive | tweet_ev_sentiment | wic | wnli | wsc | yahoo_answers |
+|---------------:|----------:|-----------------------:|--------:|--------:|--------:|--------:|-------:|----------:|--------:|-----------------------:|-------:|--------:|--------:|--------:|----------:|-----------------:|--------:|--------:|------------------:|--------:|--------:|------------:|--------:|--------------:|------------:|-----------------:|-------------------:|----------------:|-----------------:|---------------------:|---------------------:|--------:|--------:|------:|----------------:|
+| 85.1168 | 89.1333 | 66.26 | 49.0938 | 74.3731 | 76.7857 | 81.9751 | 58 | 78.2 | 90.7268 | 84.1 | 92.096 | 69.296 | 84.3775 | 87.0098 | 59.3647 | 85.5769 | 89.2733 | 90.5639 | 89.3996 | 77.9783 | 94.1514 | 55.5204 | 88.5727 | 97.2 | 81 | 36.282 | 81.0697 | 55.0505 | 68.3673 | 85 | 70.3028 | 65.8307 | 52.1127 | 62.5 | 71.3 |
+
+
+For more information, see: [Model Recycling](https://ibm.github.io/model-recycling/)
 See full evaluation results of this model and many more [here](https://ibm.github.io/model-recycling/roberta-base_table.html)
 When fine-tuned on downstream tasks, this model achieves the following results: