Commit 1cdf067 by 5roop (1 parent: 0355aa3)

Update README.md

Focus evaluation data on positive class only.
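
For context, the README scores filled pauses on event level: a true and a predicted event that at least partially overlap count as a true positive, and after this commit only the positive class (`1`) is reported. The sketch below illustrates that kind of scoring. It is not the authors' evaluation script: the `(start, end)` interval representation, the function names `overlaps` and `event_level_scores`, and the choice to count hits separately for precision and recall are all assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical representation: an event is (start_seconds, end_seconds).
Interval = Tuple[float, float]


def overlaps(a: Interval, b: Interval) -> bool:
    """Return True if the two intervals share any non-zero time span."""
    return a[0] < b[1] and b[0] < a[1]


def event_level_scores(
    true_events: List[Interval], pred_events: List[Interval]
) -> Tuple[float, float, float]:
    """Positive-class precision, recall and F1 under partial-overlap matching.

    Assumption: a predicted event is correct if it overlaps any true event,
    and a true event is found if it overlaps any predicted event.
    """
    pred_hits = sum(any(overlaps(p, t) for t in true_events) for p in pred_events)
    true_hits = sum(any(overlaps(t, p) for p in pred_events) for t in true_events)

    precision = pred_hits / len(pred_events) if pred_events else 0.0
    recall = true_hits / len(true_events) if true_events else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy data, invented for this example only.
reference = [(0.0, 1.0), (2.0, 3.0), (5.0, 6.0)]
predicted = [(0.5, 1.5), (2.5, 2.8), (8.0, 9.0)]
p, r, f1 = event_level_scores(reference, predicted)
print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```

Because only events are matched, there is no natural count of "non-events", which is consistent with the commit dropping the negative-class, accuracy and averaged rows from the reports.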

Files changed (1)
  1. README.md +1 -18
README.md CHANGED
@@ -28,17 +28,11 @@ events partially overlap, this is counted as a true positive.

  ## Evaluation on ROG corpus

- The train and test data were obtained by resegmenting ROG corpus and using only segments with filled pauses. As a result,
- no true negatives are present in the data and the behaviour of the negative class (i.e. no filled pause detected) is unpredictable.
+ In evaluation, we only evaluate positive events, i.e.
  ```
  precision recall f1-score support

- 0 0.531 0.123 0.200 211
  1 0.907 0.987 0.946 1834
-
- accuracy 0.898 2045
- macro avg 0.719 0.555 0.573 2045
- weighted avg 0.868 0.898 0.869 2045
  ```

  ## Evaluation on ParlaSpeech [HR](https://huggingface.co/datasets/classla/ParlaSpeech-HR) and [RS](https://huggingface.co/datasets/classla/ParlaSpeech-RS) corpora
@@ -50,23 +44,12 @@ Performance on RS:
  Classification report for human vs model on event level:
  precision recall f1-score support

- 0 0.97 0.87 0.92 234
  1 0.95 0.99 0.97 542
-
- accuracy 0.95 776
- macro avg 0.96 0.93 0.94 776
- weighted avg 0.95 0.95 0.95 776
-
  Performance on HR:
  Classification report for human vs model on event level:
  precision recall f1-score support

- 0 0.94 0.84 0.89 242
  1 0.93 0.98 0.95 531
-
- accuracy 0.93 773
- macro avg 0.93 0.91 0.92 773
- weighted avg 0.93 0.93 0.93 773
  ```
  The metrics reported are on event level, which means that if true and
  predicted filled pauses at least partially overlap, we count them as a