Update README.md
Adding a results section and adjusting the layout.

This is a model to classify and identify the accent of a UK or Ireland speaker…

The model implements transfer-learning feature extraction, using the [Yamnet](https://tfhub.dev/google/yamnet/1) model to train a classifier.

### Yamnet Model

Yamnet is an audio event classifier trained on the AudioSet dataset to predict audio events from the AudioSet ontology. It is available on TensorFlow Hub.

Yamnet accepts a 1-D tensor of audio samples with a sample rate of 16 kHz.
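The README does not show how recordings are brought into that format. As a minimal sketch (the helper name is hypothetical, and the naive linear-interpolation resampler is for illustration only, not a substitute for a proper filter-based resampler):

```python
import numpy as np

YAMNET_SR = 16000  # Yamnet expects a mono float waveform sampled at 16 kHz


def to_yamnet_input(samples: np.ndarray, sr: int) -> np.ndarray:
    """Convert an audio buffer into the 1-D float32 waveform Yamnet expects.

    `samples` may be int16 PCM and/or multi-channel. The resampling here is
    naive linear interpolation, kept simple for illustration.
    """
    x = samples.astype(np.float32)
    if samples.dtype == np.int16:   # scale 16-bit PCM into [-1.0, 1.0]
        x /= 32768.0
    if x.ndim == 2:                 # (frames, channels) -> mono
        x = x.mean(axis=1)
    if sr != YAMNET_SR:             # naive resample by interpolation
        n_out = int(round(len(x) * YAMNET_SR / sr))
        x = np.interp(np.linspace(0.0, len(x) - 1, n_out),
                      np.arange(len(x)), x).astype(np.float32)
    return x


# Example: one second of 44.1 kHz stereo PCM becomes 16000 mono samples.
wave = to_yamnet_input(np.zeros((44100, 2), dtype=np.int16), 44100)
```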

As output, the model returns a 3-tuple:
…

We will use the embeddings, which are the features extracted from the audio samples, as the input to our dense model.
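Yamnet emits one 1024-dimensional embedding row per audio frame, so a clip yields a matrix rather than a single vector. How this project collapses the frames is not shown in the README; mean-pooling is one common choice, sketched here with hypothetical shapes:

```python
import numpy as np

# Yamnet's documented per-frame outputs: scores (n, 521), embeddings
# (n, 1024), log-mel spectrogram (m, 64). The frame count here is made up.
n_frames = 7
embeddings = np.random.default_rng(1).normal(size=(n_frames, 1024))

# One clip-level feature vector for the dense model (an assumption:
# averaging the per-frame embeddings).
clip_feature = embeddings.mean(axis=0)
```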

### Dense Model

The dense model that we used consists of:

- An input layer, which is the embedding output of the Yamnet classifier.
- 4 dense hidden layers and 4 dropout layers.

…

</details>
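The exact layer sizes are not listed in this excerpt. A minimal NumPy sketch of such a dense head at inference time (hypothetical hidden sizes and class count, random weights standing in for the trained parameters, and dropout omitted because it is inactive at inference):

```python
import numpy as np

rng = np.random.default_rng(0)

EMBED_DIM = 1024               # Yamnet embeddings are 1024-dimensional
HIDDEN = [512, 256, 128, 64]   # hypothetical sizes for the 4 hidden layers
N_CLASSES = 6                  # hypothetical: one class per accent region


def relu(x):
    return np.maximum(x, 0.0)


def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()


# Randomly initialised weights stand in for the trained parameters.
dims = [EMBED_DIM] + HIDDEN + [N_CLASSES]
layers = [(rng.normal(0.0, 0.05, (d_in, d_out)), np.zeros(d_out))
          for d_in, d_out in zip(dims, dims[1:])]


def predict(embedding):
    """Forward pass: 4 hidden Dense+ReLU layers, then a softmax output.
    The dropout layers sit after each hidden layer during training only."""
    h = embedding
    for w, b in layers[:-1]:
        h = relu(h @ w + b)
    w, b = layers[-1]
    return softmax(h @ w + b)


probs = predict(rng.normal(size=EMBED_DIM))
```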

### Results

The model achieved the following results:

| Metric   | Training | Validation |
|----------|----------|------------|
| Accuracy | 55%      | 51%        |
| AUC      | 0.9090   | 0.8911     |
| d-prime  | 1.887    | 1.743      |
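The README does not say how d-prime is computed, but the table values are consistent with the standard conversion from ROC AUC under an equal-variance Gaussian assumption, d′ = √2 · Φ⁻¹(AUC). A stdlib-only check:

```python
from math import sqrt
from statistics import NormalDist


def d_prime(auc: float) -> float:
    """d' from ROC AUC, assuming equal-variance Gaussian score
    distributions: d' = sqrt(2) * inverse_normal_cdf(AUC)."""
    return sqrt(2.0) * NormalDist().inv_cdf(auc)


print(d_prime(0.9090))  # ~1.887, matching the training value
print(d_prime(0.8911))  # ~1.743, matching the validation value
```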

And the confusion matrix for the validation set is:

![Confusion matrix](./confusion_matrix.png)
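For reference, a confusion matrix is tallied by counting (true, predicted) pairs; rows are true classes and columns are predictions, so the diagonal holds the correct ones. A small sketch with hypothetical accent labels (the real class set follows the dataset's regions):

```python
import numpy as np

# Hypothetical labels for illustration only.
CLASSES = ["southern", "midlands", "northern", "welsh", "scottish", "irish"]


def confusion_matrix(y_true, y_pred, n_classes):
    """Rows = true class, columns = predicted class."""
    m = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        m[t, p] += 1
    return m


y_true = [0, 0, 1, 2, 2, 2]
y_pred = [0, 1, 1, 2, 2, 0]
m = confusion_matrix(y_true, y_pred, len(CLASSES))
accuracy = np.trace(m) / m.sum()  # 4 correct out of 6
```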

---

## Dataset

The dataset used is the
…
native speakers of Southern England, Midlands, Northern England, Wales, Scotland…

For more info, please refer to the above link or to the following paper:
[Open-source Multi-speaker Corpora of the English Accents in the British Isles](https://aclanthology.org/2020.lrec-1.804.pdf)

---

## Demo

A demo is available in HuggingFace Spaces ...