Evaluation results for ibm/ColD-Fusion model as a base model for other tasks
#1 by eladven - opened
README.md CHANGED
````diff
@@ -51,6 +51,20 @@ output = model(encoded_input)
 ```
 
 ## Evaluation results
+
+## Model Recycling
+
+[Evaluation on 36 datasets](https://ibm.github.io/model-recycling/model_gain_chart?avg=2.25&mnli_lp=nan&20_newsgroup=0.54&ag_news=0.03&amazon_reviews_multi=-0.32&anli=1.59&boolq=2.68&cb=19.73&cola=-0.22&copa=23.30&dbpedia=1.34&esnli=0.15&financial_phrasebank=2.99&imdb=-0.04&isear=1.06&mnli=0.31&mrpc=-0.86&multirc=2.50&poem_sentiment=1.63&qnli=-0.00&qqp=0.40&rotten_tomatoes=3.41&rte=12.80&sst2=1.30&sst_5bins=-0.30&stsb=1.38&trec_coarse=-0.11&trec_fine=2.64&tweet_ev_emoji=0.00&tweet_ev_emotion=1.22&tweet_ev_hate=1.55&tweet_ev_irony=6.37&tweet_ev_offensive=1.38&tweet_ev_sentiment=-0.60&wic=3.17&wnli=-6.90&wsc=-2.69&yahoo_answers=-0.53&model_name=ibm%2FColD-Fusion&base_name=roberta-base) using ibm/ColD-Fusion as a base model yields average score of 78.47 in comparison to 76.22 by roberta-base.
+
+The model is ranked 1st among all tested models for the roberta-base architecture as of 21/12/2022
+Results:
+
+| 20_newsgroup | ag_news | amazon_reviews_multi | anli | boolq | cb | cola | copa | dbpedia | esnli | financial_phrasebank | imdb | isear | mnli | mrpc | multirc | poem_sentiment | qnli | qqp | rotten_tomatoes | rte | sst2 | sst_5bins | stsb | trec_coarse | trec_fine | tweet_ev_emoji | tweet_ev_emotion | tweet_ev_hate | tweet_ev_irony | tweet_ev_offensive | tweet_ev_sentiment | wic | wnli | wsc | yahoo_answers |
+|---------------:|----------:|-----------------------:|--------:|--------:|-----:|--------:|-------:|----------:|--------:|-----------------------:|-------:|--------:|--------:|--------:|----------:|-----------------:|--------:|-------:|------------------:|--------:|--------:|------------:|--------:|--------------:|------------:|-----------------:|-------------------:|----------------:|-----------------:|---------------------:|---------------------:|-------:|--------:|--------:|----------------:|
+| 85.8205 | 89.8 | 66.26 | 51.9375 | 81.3761 | 87.5 | 83.3174 | 72 | 78.6333 | 91.1441 | 88.1 | 93.864 | 73.5332 | 87.2966 | 87.0098 | 63.717 | 85.5769 | 92.4034 | 91.113 | 91.8386 | 85.1986 | 95.4128 | 56.3801 | 91.2964 | 97 | 90.4 | 46.306 | 83.0401 | 54.4444 | 77.9337 | 85.9302 | 70.4331 | 68.652 | 47.8873 | 60.5769 | 71.8667 |
+
+
+For more information, see: [Model Recycling](https://ibm.github.io/model-recycling/)
 See full evaluation results of this model and many more [here](https://ibm.github.io/model-recycling/roberta-base_table.html)
 When fine-tuned on downstream tasks, this model achieves the following results:
 
````
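The average of 78.47 quoted in this PR can be sanity-checked against the per-dataset scores in its results table. The following is a minimal sketch, assuming the average is an unweighted mean over the 36 listed datasets (the PR does not say how it is computed). The `scores` dictionary is transcribed from the table, and the names `mean_score`, `ROBERTA_BASE_AVG`, and `gain` are illustrative only.

```python
# Per-dataset scores transcribed from the results table in this PR.
scores = {
    "20_newsgroup": 85.8205, "ag_news": 89.8, "amazon_reviews_multi": 66.26,
    "anli": 51.9375, "boolq": 81.3761, "cb": 87.5, "cola": 83.3174,
    "copa": 72, "dbpedia": 78.6333, "esnli": 91.1441,
    "financial_phrasebank": 88.1, "imdb": 93.864, "isear": 73.5332,
    "mnli": 87.2966, "mrpc": 87.0098, "multirc": 63.717,
    "poem_sentiment": 85.5769, "qnli": 92.4034, "qqp": 91.113,
    "rotten_tomatoes": 91.8386, "rte": 85.1986, "sst2": 95.4128,
    "sst_5bins": 56.3801, "stsb": 91.2964, "trec_coarse": 97,
    "trec_fine": 90.4, "tweet_ev_emoji": 46.306, "tweet_ev_emotion": 83.0401,
    "tweet_ev_hate": 54.4444, "tweet_ev_irony": 77.9337,
    "tweet_ev_offensive": 85.9302, "tweet_ev_sentiment": 70.4331,
    "wic": 68.652, "wnli": 47.8873, "wsc": 60.5769, "yahoo_answers": 71.8667,
}

# Assumption: the reported average is an unweighted mean over all datasets.
mean_score = sum(scores.values()) / len(scores)

# roberta-base average as quoted in the PR text.
ROBERTA_BASE_AVG = 76.22

# Average gain of ibm/ColD-Fusion over roberta-base.
gain = mean_score - ROBERTA_BASE_AVG

print(f"datasets: {len(scores)}")
print(f"mean score: {mean_score:.2f}")
print(f"gain over roberta-base: {gain:.2f}")
```

Under that assumption the mean comes out at 78.47 and the gain at 2.25, which agrees with the `avg=2.25` parameter in the gain-chart URL.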