Audio Classification
speechbrain
PyTorch
English
Emotion
Diarization
wavlm
yingzhi committed on
Commit
845a974
1 Parent(s): a2ce4ef

Update README.md

Files changed (1)
  1. README.md +4 -4
README.md CHANGED
@@ -34,12 +34,12 @@ For a better experience, we encourage you to learn more about [SpeechBrain](http
 
 | Release | EDER(%) |
 |:-------------:|:--------------:|
-| 19-10-21 | 29.7 (Avg: 30.2) |
+| 05-07-23 | 29.7 (Avg: 30.2) |
 
 
 ## Pipeline description
 
-This system is composed of an wavlm model. It is a combination of convolutional and residual blocks. The task aimes to predict the correct emotion composants and their boundaries within an utterance. For now, the model was trained with audios that contain only 1 non-neutral emotion event.
+This system is composed of a wavlm encoder and a downstream frame-wise classifier. The task aims to predict the correct emotion components and their boundaries within an utterance. For now, the model was trained with audio files that contain only one non-neutral emotion event.
 
 The system is trained with recordings sampled at 16kHz (single channel).
 The code will automatically normalize your audio (i.e., resampling + mono channel selection) when calling *diarize_file* if needed.
@@ -70,14 +70,14 @@ classifier = foreign_class(
 diary = classifier.diarize_file("speechbrain/emotion-diarization-wavlm-large/example.wav")
 print(diary)
 ```
-The output will contain a dictionary of emotion composants and their boundaries.
+The output will contain a dictionary of emotion components and their boundaries.
 
 ### Inference on GPU
 To perform inference on the GPU, add `run_opts={"device":"cuda"}` when calling the `from_hparams` method.
 
 ### Training
 The model was trained with SpeechBrain (aa018540).
-To train it from scratch follows these steps:
+To train it from scratch, follow these steps:
 1. Clone SpeechBrain:
 ```bash
 git clone https://github.com/speechbrain/speechbrain/
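The updated README says `diarize_file` returns a dictionary of emotion components and their boundaries. As a minimal sketch of post-processing such a result, the helper below extracts the non-neutral segments; the exact segment keys (`start`, `end`, `emotion`) and the one-letter emotion codes are assumptions for illustration, not confirmed by this diff:

```python
# Sketch of consuming the diary returned by classifier.diarize_file().
# ASSUMPTION: the diary maps each input path to a list of segments with
# "start"/"end" boundaries in seconds and a one-letter "emotion" code
# (e.g. "n" for neutral); this shape is illustrative, not confirmed here.

def non_neutral_spans(diary):
    """Return (emotion, start, end) tuples for every non-neutral segment."""
    spans = []
    for path, segments in diary.items():
        for seg in segments:
            if seg["emotion"] != "n":  # keep only non-neutral components
                spans.append((seg["emotion"], seg["start"], seg["end"]))
    return spans

# Hypothetical diary shaped like the README's described output
# (values are made up for the example):
diary = {
    "example.wav": [
        {"start": 0.0, "end": 1.9, "emotion": "n"},
        {"start": 1.9, "end": 4.5, "emotion": "h"},
    ]
}
print(non_neutral_spans(diary))  # [('h', 1.9, 4.5)]
```

Since the model card notes each training utterance contains only one non-neutral event, a diary for a single file would typically yield at most one such span.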