jimregan committed on
Commit
ed0edc4
1 Parent(s): f868fe2
Files changed (1)
  1. README.md +17 -1
README.md CHANGED
@@ -6,11 +6,21 @@ tags:
6
  - automatic-speech-recognition
7
  ---
8
 
9
  ## Augmented TIMIT subset
10
 
11
  | **Augmentation** | **FER** | **PER** | **Git tag** |
12
  | :----------------------------------------------- | :-------- | :--------- | :---------------------------------- |
13
- | unaugmented | 10\.2% | 22\.5% | |
14
  | Gaussian noise | 10\.0% | 22\.1% | gaussian |
15
  | Pitchshift | 9\.6% | 22\.9% | pitchshift |
16
  | RIR | **9\.6%** | **21\.8%** | rir |
@@ -30,7 +40,13 @@ tags:
30
 
31
  ## LM experiments
32
 
33
 
 
34
 
35
  | | **n-gram** | **FER** | **PER** | **Tag** |
36
  | :----------------------------- | :--------- | :--------- | :--------- | :--------- |
 
6
  - automatic-speech-recognition
7
  ---
8
 
9
+ This repository contains a number of experiments for the [PSST Challenge](https://psst.study/).
10
+
11
+ As the test set is unavailable, all numbers are based on the validation set.
12
+
13
+ The overall best performing model was based on
14
+
15
  ## Augmented TIMIT subset
16
 
17
+ Using a subset of TIMIT that could be mapped easily to the phoneset used by the PSST Challenge data (a list of IDs is in the repository), we experimented with augmenting the data to better match the PSST data.
18
+
19
+ The best results were obtained using Room Impulse Response (RIR) augmentation (tag: `rir`).
20
+
21
  | **Augmentation** | **FER** | **PER** | **Git tag** |
22
  | :----------------------------------------------- | :-------- | :--------- | :---------------------------------- |
23
+ | unaugmented | 10\.2% | 22\.5% | huggingface-unaugmented |
24
  | Gaussian noise | 10\.0% | 22\.1% | gaussian |
25
  | Pitchshift | 9\.6% | 22\.9% | pitchshift |
26
  | RIR | **9\.6%** | **21\.8%** | rir |
 
40
 
41
  ## LM experiments
42
 
43
+ We experimented with a number of language model configurations, combining the data from the PSST Challenge, the subset of TIMIT we used, and CMUdict.
44
+
45
+ We tried combining CMUdict data in a number of ways: unmodified, with a silence token added at the start of the pronunciation, at the end, and at both the start and the end.
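The four CMUdict variants described above can be sketched as follows. This is only an illustration, not the code used for the experiments: the silence token name `<sil>`, the helper names `add_silence` and `parse_cmudict_line`, and the sample entry are all assumptions.

```python
# Assumed silence token; the README does not state the actual symbol used.
SILENCE = "<sil>"

POSITIONS = ("none", "start", "end", "both")


def parse_cmudict_line(line):
    """Split a CMUdict-style entry ("WORD  PH1 PH2 ...") into (word, phones)."""
    word, *phones = line.split()
    return word, phones


def add_silence(phones, position):
    """Return a copy of `phones` with the silence token added at `position`."""
    if position not in POSITIONS:
        raise ValueError(f"unknown position: {position}")
    result = list(phones)
    if position in ("start", "both"):
        result.insert(0, SILENCE)
    if position in ("end", "both"):
        result.append(SILENCE)
    return result


# Hypothetical sample entry in CMUdict layout.
word, phones = parse_cmudict_line("HELLO  HH AH0 L OW1")
variants = {pos: " ".join(add_silence(phones, pos)) for pos in POSITIONS}
```

Each value in `variants` is one line of phone-level LM training text; the `"end"` variant corresponds to the configuration reported as best.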
46
+
47
+ The best result came from a 5-gram model, with a silence token added at the end of each CMUdict pronunciation.
48
 
49
+ Evaluation was performed using scripts provided by the PSST Challenge's organisers, so there are no scripts in place to automatically use the LM with the transformers library.
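For readers unfamiliar with the PER column, the sketch below shows the conventional definition of phoneme error rate: total edit distance between reference and hypothesis phone sequences, divided by the total number of reference phones. It is not the organisers' script, which remains authoritative for the numbers in the tables; the function names and the sample sequences are made up for illustration.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two phone sequences (unit costs)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if r == h else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]


def phoneme_error_rate(refs, hyps):
    """Total edit distance divided by total reference length."""
    total_ref = sum(len(r) for r in refs)
    if total_ref == 0:
        raise ValueError("reference set is empty")
    total_errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return total_errors / total_ref


# Illustrative data only: one exact match, one substitution plus one insertion.
refs = [["HH", "AH", "L", "OW"], ["K", "AE", "T"]]
hyps = [["HH", "AH", "L", "OW"], ["K", "AH", "T", "S"]]
per = phoneme_error_rate(refs, hyps)
```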
50
 
51
  | | **n-gram** | **FER** | **PER** | **Tag** |
52
  | :----------------------------- | :--------- | :--------- | :--------- | :--------- |