Report for lxyuan/distilbert-base-multilingual-cased-sentiments-student
#3, opened by inoki-giskard
Overconfidence issues (1)
Vulnerability | Level | Data slice | Metric | Transformation | Deviation | Description |
---|---|---|---|---|---|---|
Overconfidence | medium | avg_digits(text) < 0.011 | Overconfidence rate = 0.291 | — | +18.82% than global | For records in the dataset where avg_digits(text) < 0.011, we found a significantly higher number of overconfident wrong predictions (183 samples, corresponding to 29.093799682034977% of the wrong predictions in the data slice). |
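To make the metric concrete, here is a minimal sketch of how an overconfidence rate of this kind could be computed. The report only states that the rate is the share of wrong predictions that are overconfident. The record layout, the `overconfidence_rate` helper, and the rule used here (predicted-class probability exceeds the true-class probability by more than `threshold`) are assumptions for illustration, not necessarily what the scan does internally.

```python
from typing import Dict, List, Tuple

Record = Tuple[str, Dict[str, float]]  # (true label, class probabilities)


def overconfidence_rate(records: List[Record], threshold: float = 0.5) -> float:
    """Share of wrong predictions whose probability gap exceeds `threshold`.

    The gap rule and the default threshold are illustrative assumptions.
    """
    wrong = 0
    overconfident = 0
    for true_label, probs in records:
        predicted = max(probs, key=probs.get)
        if predicted == true_label:
            continue  # only wrong predictions enter the denominator
        wrong += 1
        if probs[predicted] - probs[true_label] > threshold:
            overconfident += 1
    return overconfident / wrong if wrong else 0.0


# Toy data (hypothetical): one correct, one confidently wrong, one mildly wrong.
toy_records = [
    ("positive", {"positive": 0.90, "negative": 0.10}),
    ("positive", {"positive": 0.05, "negative": 0.95}),
    ("negative", {"positive": 0.55, "negative": 0.45}),
]
toy_rate = overconfidence_rate(toy_records)

# The figures in the row above imply 629 wrong predictions in the slice,
# since 183 / 629 reproduces the reported percentage.
reported_rate = round(183 / 629, 3)
```

With the toy data, one of the two wrong predictions counts as overconfident, so `toy_rate` is 0.5. `reported_rate` matches the 0.291 shown in the table.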
Robustness issues (5)
Vulnerability | Level | Data slice | Metric | Transformation | Deviation | Description |
---|---|---|---|---|---|---|
Robustness | major | — | Fail rate = 0.393 | Transform to uppercase | 393/1000 tested samples (39.3%) changed prediction after perturbation | When feature “text” is perturbed with the transformation “Transform to uppercase”, the model changes its prediction in 39.3% of the cases. We expected the predictions not to be affected by this transformation. |
Robustness | major | — | Fail rate = 0.307 | Transform to title case | 307/1000 tested samples (30.7%) changed prediction after perturbation | When feature “text” is perturbed with the transformation “Transform to title case”, the model changes its prediction in 30.7% of the cases. We expected the predictions not to be affected by this transformation. |
Robustness | major | — | Fail rate = 0.153 | Add typos | 153/1000 tested samples (15.3%) changed prediction after perturbation | When feature “text” is perturbed with the transformation “Add typos”, the model changes its prediction in 15.3% of the cases. We expected the predictions not to be affected by this transformation. |
Robustness | major | — | Fail rate = 0.144 | Transform to lowercase | 144/1000 tested samples (14.4%) changed prediction after perturbation | When feature “text” is perturbed with the transformation “Transform to lowercase”, the model changes its prediction in 14.4% of the cases. We expected the predictions not to be affected by this transformation. |
Robustness | medium | — | Fail rate = 0.092 | Punctuation Removal | 92/1000 tested samples (9.2%) changed prediction after perturbation | When feature “text” is perturbed with the transformation “Punctuation Removal”, the model changes its prediction in 9.2% of the cases. We expected the predictions not to be affected by this transformation. |
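The fail rates above are the fraction of tested samples whose predicted label changes once the transformation is applied. A minimal sketch of that check follows. `fail_rate` and `toy_predict` are hypothetical names, and the toy classifier is a stand-in for the real model, which would be called in the same place.

```python
from typing import Callable, Sequence


def fail_rate(
    predict: Callable[[str], str],
    texts: Sequence[str],
    transform: Callable[[str], str],
) -> float:
    """Fraction of texts whose predicted label changes after `transform`."""
    if not texts:
        raise ValueError("need at least one sample")
    changed = sum(predict(t) != predict(transform(t)) for t in texts)
    return changed / len(texts)


def toy_predict(text: str) -> str:
    # Deliberately case-sensitive stand-in for a real sentiment model.
    return "positive" if "Good" in text else "negative"


samples = ["Good movie", "Bad movie", "Good food", "meh"]

upper_rate = fail_rate(toy_predict, samples, str.upper)  # "Transform to uppercase"
lower_rate = fail_rate(toy_predict, samples, str.lower)  # "Transform to lowercase"
```

Both transformations flip the two samples containing "Good", giving a fail rate of 0.5 on this toy set. Swapping `toy_predict` for a wrapper around the actual model would show whether the case sensitivity reported above reproduces on your own data.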
Performance issues (1)
Vulnerability | Level | Data slice | Metric | Transformation | Deviation | Description |
---|---|---|---|---|---|---|
Performance | medium | text contains "friday" | Precision = 0.432 | — | -7.05% than global | For records in the dataset where text contains "friday", the Precision is 7.05% lower than the global Precision. |
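The slice comparison can be sketched as follows: compute precision on the records matching the slice, compute it on the whole dataset, and report the relative difference. Several details here are assumptions, since the report does not specify them: precision is computed for a single `target` class, the "friday" match is case-insensitive, and the deviation is relative to the global value. The row layout and function names are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, str]  # keys: "text", "label" (ground truth), "pred" (model output)


def precision(rows: List[Row], target: str = "positive") -> float:
    """Precision for one class: correct `target` predictions / all `target` predictions."""
    predicted = [r for r in rows if r["pred"] == target]
    if not predicted:
        return 0.0
    return sum(r["label"] == target for r in predicted) / len(predicted)


def slice_deviation(
    rows: List[Row], in_slice: Callable[[Row], bool], target: str = "positive"
) -> Tuple[float, float, float]:
    """Return (slice precision, global precision, relative deviation)."""
    global_p = precision(rows, target)
    if global_p == 0:
        raise ValueError("global precision is zero; relative deviation is undefined")
    slice_p = precision([r for r in rows if in_slice(r)], target)
    return slice_p, global_p, (slice_p - global_p) / global_p


# Toy data (hypothetical), not taken from the scanned dataset.
rows = [
    {"text": "See you Friday", "label": "negative", "pred": "positive"},
    {"text": "friday was great", "label": "positive", "pred": "positive"},
    {"text": "love it", "label": "positive", "pred": "positive"},
    {"text": "so good", "label": "positive", "pred": "positive"},
    {"text": "awful", "label": "negative", "pred": "negative"},
]

slice_p, global_p, deviation = slice_deviation(
    rows, lambda r: "friday" in r["text"].lower()
)
```

On the toy rows the slice precision is 0.5 against a global 0.75, a relative deviation of about -33%. A negative deviation has the same meaning as the -7.05% in the table: the slice performs worse than the dataset as a whole.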
inoki-giskard changed discussion status to closed