update
README.md
CHANGED
@@ -6,11 +6,21 @@ tags:
 - automatic-speech-recognition
 ---
 
+This repository contains a number of experiments for the [PSST Challenge](https://psst.study/).
+
+As the test set is unavailable, all numbers are based on the validation set.
+
+The overall best performing model was based on
+
 ## Augmented TIMIT subset
 
+Using a subset of TIMIT that maps easily onto the phoneset used by the PSST Challenge data (a list of the IDs is in the repository), we experimented with augmenting the data to better match the PSST data.
+
+The best results were obtained using Room Impulse Response (tag: `rir`).
+
 | **Augmentation** | **FER**   | **PER**    | **Git tag**             |
 | :--------------- | :-------- | :--------- | :---------------------- |
-| unaugmented      | 10.2%     | 22.5%      |                         |
+| unaugmented      | 10.2%     | 22.5%      | huggingface-unaugmented |
 | Gaussian noise   | 10.0%     | 22.1%      | gaussian                |
 | Pitchshift       | 9.6%      | 22.9%      | pitchshift              |
 | RIR              | **9.6%**  | **21.8%**  | rir                     |
@@ -30,7 +40,13 @@ tags:
 
 ## LM experiments
 
+We experimented with a number of language model configurations, combining the data from the PSST challenge, the subset of TIMIT we used, and CMUdict.
+
+We tried combining the CMUdict data in a number of ways: unmodified, with a silence token added at the start of the pronunciation, at the end, and at both the start and the end.
+
+The best result was from a 5-gram model, with silences added at the end of the CMUdict data.
 
+Evaluation was performed using scripts provided by the PSST Challenge's organisers, so there are no scripts in place to automatically use the LM with the transformers library.
 
 | | **n-gram** | **FER** | **PER** | **Tag** |
 | :--- | :--- | :--- | :--- | :--- |