Update README.md
Focus evaluation data on positive class only.
README.md
CHANGED
@@ -28,17 +28,11 @@ events partially overlap, this is counted as a true positive.
 
 ## Evaluation on ROG corpus
 
-
-no true negatives are present in the data and the behaviour of the negative class (i.e. no filled pause detected) is unpredictable.
+In evaluation, we only evaluate positive events, i.e.
 ```
               precision    recall  f1-score   support
 
-           0      0.531     0.123     0.200       211
            1      0.907     0.987     0.946      1834
-
-    accuracy                          0.898      2045
-   macro avg      0.719     0.555     0.573      2045
-weighted avg      0.868     0.898     0.869      2045
 ```
 
 ## Evaluation on ParlaSpeech [HR](https://huggingface.co/datasets/classla/ParlaSpeech-HR) and [RS](https://huggingface.co/datasets/classla/ParlaSpeech-RS) corpora
@@ -50,23 +44,12 @@ Performance on RS:
 Classification report for human vs model on event level:
               precision    recall  f1-score   support
 
-           0       0.97      0.87      0.92       234
            1       0.95      0.99      0.97       542
-
-    accuracy                           0.95       776
-   macro avg       0.96      0.93      0.94       776
-weighted avg       0.95      0.95      0.95       776
-
 Performance on HR:
 Classification report for human vs model on event level:
               precision    recall  f1-score   support
 
-           0       0.94      0.84      0.89       242
            1       0.93      0.98      0.95       531
-
-    accuracy                           0.93       773
-   macro avg       0.93      0.91      0.92       773
-weighted avg       0.93      0.93      0.93       773
 ```
 The metrics reported are on event level, which means that if true and
 predicted filled pauses at least partially overlap, we count them as a
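
Reviewer note: the event-level criterion quoted in the hunk context (partially overlapping true and predicted filled pauses count as a true positive) can be sketched as below. This is only an illustration under stated assumptions, not the script that produced the numbers in this README: the `(start, end)` event representation, the function names, and the "any overlap counts" matching policy are all hypothetical. It also shows why reporting only the positive class is natural here, since precision and recall need only true and predicted events and no true negatives.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: an event is a (start, end) pair in seconds.
Event = Tuple[float, float]


def overlaps(a: Event, b: Event) -> bool:
    """Return True if the two intervals share any stretch of time."""
    return a[0] < b[1] and b[0] < a[1]


def positive_class_scores(
    true_events: List[Event], pred_events: List[Event]
) -> Dict[str, float]:
    """Event-level precision, recall and F1 for the positive class only.

    Assumption: any partial overlap counts as a match, with no
    one-to-one pairing between true and predicted events.
    """
    # Predicted events that overlap at least one true event.
    matched_pred = sum(
        any(overlaps(p, t) for t in true_events) for p in pred_events
    )
    # True events that are covered by at least one predicted event.
    matched_true = sum(
        any(overlaps(t, p) for p in pred_events) for t in true_events
    )
    precision = matched_pred / len(pred_events) if pred_events else 0.0
    recall = matched_true / len(true_events) if true_events else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    # "support" mirrors the report column: the number of true events.
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "support": float(len(true_events)),
    }


if __name__ == "__main__":
    # Made-up toy events, not taken from any of the corpora above.
    truth = [(0.0, 1.0), (2.0, 3.0), (5.0, 6.0)]
    preds = [(0.5, 1.5), (2.9, 3.5), (8.0, 9.0)]
    print(positive_class_scores(truth, preds))
```

A stricter one-to-one matching policy would give different counts whenever one prediction spans several true events, so the choice of policy should be stated alongside any reported figures.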