
license: apache-2.0

UK & Ireland Accent Classification Model

This model classifies the accent of a speaker from the UK or Ireland as one of the following:

Irish English
Midlands English
Northern English
Scottish English
Southern English
Welsh English

The model uses transfer learning: a pretrained Yamnet model serves as a feature extractor, and a dense classifier is trained on the features it produces.

Yamnet Model

Yamnet is an audio event classifier trained on the AudioSet dataset to predict audio events from the AudioSet ontology. It is available on TensorFlow Hub. Yamnet accepts a 1-D tensor of audio samples with a sample rate of 16 kHz.
As output, the model returns a 3-tuple:

Scores of shape (N, 521) representing the scores of the 521 classes.
Embeddings of shape (N, 1024).
The log-mel spectrogram of the entire input waveform.

We will use the embeddings, which are the features extracted from the audio samples, as the input to our dense model.
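As a rough sketch of this feature-extraction step, the snippet below loads Yamnet from TensorFlow Hub and keeps only the embeddings from the 3-tuple it returns. The hub handle `https://tfhub.dev/google/yamnet/1` and the helper names `load_yamnet` and `extract_embeddings` are assumptions made for illustration; they are not taken from this model's training code.

```python
def load_yamnet(handle="https://tfhub.dev/google/yamnet/1"):
    """Load Yamnet from TensorFlow Hub (handle is an assumption, see above)."""
    # Imported lazily so the helper below can be used with any compatible callable.
    import tensorflow_hub as hub
    return hub.load(handle)


def extract_embeddings(yamnet, waveform):
    """Return the per-frame embeddings for a mono 16 kHz waveform.

    `yamnet` is any callable that returns the 3-tuple described above:
    (scores, embeddings, log-mel spectrogram).
    """
    scores, embeddings, log_mel_spectrogram = yamnet(waveform)
    # Only the (N, 1024) embeddings are passed on to the dense model.
    return embeddings
```

With the real model, `extract_embeddings(load_yamnet(), waveform)` would be called with a 1-D float tensor sampled at 16 kHz; the result would have one 1024-dimensional row per audio frame.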

Dense Model

The dense model that we used consists of:

An input layer that takes the embedding output of the Yamnet classifier.
Four dense hidden layers, each followed by a dropout layer (four dropout layers in total).
An output dense layer.
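The layer list above could be sketched in Keras roughly as follows. This card does not state the hidden-layer widths, dropout rates, or activations, so the values used here (`hidden_units`, `dropout_rates`, ReLU, softmax) are placeholder assumptions; only the 1024-dimensional input, the four dense plus four dropout layers, and the six output classes come from the description.

```python
from tensorflow import keras

NUM_CLASSES = 6       # the six accents listed in this card
EMBEDDING_DIM = 1024  # size of one Yamnet embedding vector


def build_dense_model(hidden_units=(256, 384, 192, 384),
                      dropout_rates=(0.15, 0.2, 0.25, 0.2)):
    """Build a dense classifier on top of Yamnet embeddings.

    The default widths and dropout rates are illustrative assumptions,
    not the values used to train the published model.
    """
    inputs = keras.Input(shape=(EMBEDDING_DIM,), name="embedding")
    x = inputs
    # Four hidden blocks: one dense layer followed by one dropout layer each.
    for i, (units, rate) in enumerate(zip(hidden_units, dropout_rates), start=1):
        x = keras.layers.Dense(units, activation="relu", name=f"hidden_{i}")(x)
        x = keras.layers.Dropout(rate, name=f"dropout_{i}")(x)
    # Output layer: one probability per accent class.
    outputs = keras.layers.Dense(NUM_CLASSES, activation="softmax", name="output")(x)
    return keras.Model(inputs, outputs, name="accent_classifier")
```

Calling `build_dense_model().summary()` prints the resulting layer stack, which can be compared against the model plot.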


Results

The model achieved the following results:

Metric	Training	Validation
Accuracy	55%	51%
AUC	0.9090	0.8911
d-prime	1.887	1.743
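The d-prime values in the table appear consistent with the common conversion from AUC, d' = sqrt(2) * Z(AUC), where Z is the inverse CDF of the standard normal distribution. The sketch below applies that conversion; it assumes this is how d-prime was derived, which the card itself does not state, and the function name `d_prime_from_auc` is chosen here for illustration.

```python
from math import sqrt
from statistics import NormalDist


def d_prime_from_auc(auc):
    """Convert an AUC in (0, 1) to d-prime via d' = sqrt(2) * Z(AUC)."""
    # NormalDist() defaults to the standard normal (mean 0, std 1).
    return sqrt(2.0) * NormalDist().inv_cdf(auc)


# Reported AUC values from the results table.
train_d_prime = d_prime_from_auc(0.9090)
validation_d_prime = d_prime_from_auc(0.8911)
```

Because the AUC values in the table are rounded to four decimals, the computed d-prime values match the reported ones only up to rounding.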

[Figure: confusion matrix for the validation set]

Dataset

The dataset used is the Crowdsourced high-quality UK and Ireland English Dialect speech data set, which consists of 17,877 high-quality WAV audio files.

The dataset includes over 31 hours of recordings from 120 volunteers who self-identify as native speakers of English from Southern England, the Midlands, Northern England, Wales, Scotland and Ireland.

For more information, please refer to the above link or to the following paper: Open-source Multi-speaker Corpora of the English Accents in the British Isles

Demo

A demo is available in HuggingFace Spaces ...