---
metrics:
- wer
- cer
library_name: transformers
pipeline_tag: automatic-speech-recognition
tags:
- Pomak
- Slavic
---
# wav2vec2-xls-r-slavic-pomak
Pomak is an endangered South East Slavic language variety spoken in Northern Greece.
This is the first automatic speech recognition (ASR) model for Pomak.
To train the model, we fine-tuned a Slavic model ([classla/wav2vec2-large-slavic-parlaspeech-hr](https://huggingface.co/classla/wav2vec2-large-slavic-parlaspeech-hr)) on 11h of recorded Pomak speech.
## Recordings
Four native Pomak speakers (2 female and 2 male) agreed to read Pomak texts at the ILSP audio-visual studio in Xanthi, Greece, resulting in a corpus of 14h.
|Speaker|Gender|Total recorded hours|
|---|---|---|
|NK9dIF | F | 4h 44m 45s|
|xoVY9q | M | 4h 36m 12s|
|9G75fk | F | 1h 44m 03s|
|n5WzHj | M | 3h 44m 04s|
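
As a quick check of the raw corpus size, the per-speaker durations in the table can be added up. The sketch below only restates the table values; the helper names (`to_seconds`, `format_hms`) are illustrative and not part of the original pipeline.

```python
import re

# Per-speaker recorded durations, copied from the table above.
durations = {
    "NK9dIF": "4h 44m 45s",
    "xoVY9q": "4h 36m 12s",
    "9G75fk": "1h 44m 03s",
    "n5WzHj": "3h 44m 04s",
}

def to_seconds(text: str) -> int:
    """Parse a duration such as '4h 44m 45s' into seconds."""
    match = re.fullmatch(r"(\d+)h (\d+)m (\d+)s", text)
    if match is None:
        raise ValueError(f"unexpected duration format: {text!r}")
    hours, minutes, seconds = map(int, match.groups())
    return hours * 3600 + minutes * 60 + seconds

def format_hms(total: int) -> str:
    """Format a number of seconds as 'Hh MMm SSs'."""
    hours, remainder = divmod(total, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours}h {minutes:02d}m {seconds:02d}s"

total_seconds = sum(to_seconds(d) for d in durations.values())
print(format_hms(total_seconds))  # 14h 49m 04s
```
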
To fine-tune the model, we split the long recordings into smaller segments of a maximum of 25 seconds each.
This removed the majority of pauses and resulted in a total dataset duration of 11h 8m.
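
The segmentation step can be illustrated with a minimal sketch. It assumes that speech intervals have already been detected (for example by a voice activity detector) and groups them into segments of at most 25 seconds, dropping the pauses between segments. The function name, the `max_gap` threshold, and the example intervals are hypothetical; the actual pipeline is described in the paper cited below.

```python
from typing import List, Optional, Tuple

Interval = Tuple[float, float]  # (start_sec, end_sec)

def group_into_segments(
    speech: List[Interval],
    max_len: float = 25.0,  # maximum segment length stated in the text
    max_gap: float = 0.5,   # assumed: pauses longer than this end a segment
) -> List[Interval]:
    """Merge consecutive speech intervals into segments of at most max_len seconds."""
    segments: List[Interval] = []
    current: Optional[Interval] = None
    for start, end in speech:
        # A single interval longer than max_len is cut into fixed-size pieces.
        while end - start > max_len:
            if current is not None:
                segments.append(current)
                current = None
            segments.append((start, start + max_len))
            start += max_len
        if current is None:
            current = (start, end)
        elif start - current[1] <= max_gap and end - current[0] <= max_len:
            # Short pause and still within the length limit: extend the segment.
            current = (current[0], end)
        else:
            segments.append(current)
            current = (start, end)
    if current is not None:
        segments.append(current)
    return segments

# Made-up speech intervals (in seconds) for illustration only.
speech = [(0.0, 10.0), (10.2, 20.0), (20.3, 30.0), (35.0, 40.0)]
segments = group_into_segments(speech)
print(segments)  # [(0.0, 20.0), (20.3, 30.0), (35.0, 40.0)]
```

The long pause between 30.0 s and 35.0 s falls outside every segment, which is how this kind of splitting shortens the total dataset duration.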
## Metrics
We evaluated the model on the test set split, which consists of 10% of the dataset recordings.
|Model|WER|CER|
|---|---|---|
|pre-trained|87.31%|31.47%|
|fine-tuned|9.06%|3.12%|
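
For reference, both metrics are edit-distance ratios: WER counts word-level substitutions, deletions, and insertions relative to the number of reference words, and CER does the same at the character level. The following is a minimal, dependency-free sketch of that definition, not the evaluation script used for the numbers above; the example sentences are English placeholders. In practice a library such as `jiwer` is normally used instead.

```python
from typing import Sequence

def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance between two sequences (words or characters)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]

def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance over reference word count."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)

def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character-level edit distance over reference length."""
    return edit_distance(reference, hypothesis) / len(reference)

reference = "the cat sat on the mat"
hypothesis = "the cat sit on mat"
print(f"WER: {wer(reference, hypothesis):.2%}")
print(f"CER: {cer(reference, hypothesis):.2%}")
```
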
## Training hyperparameters
We fine-tuned the baseline model (`wav2vec2-large-slavic-parlaspeech-hr`) on an NVIDIA GeForce RTX 3090, using the following hyperparameters:
| arg | value |
|-------------------------------|-------|
| `per_device_train_batch_size` | 8 |
| `gradient_accumulation_steps` | 2 |
| `num_train_epochs` | 35 |
| `learning_rate` | 3e-4 |
| `warmup_steps` | 500 |
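
The keys in the table match keyword arguments of `transformers.TrainingArguments`, so they can be collected in a plain dictionary. The sketch below derives the effective batch size from them and illustrates a linear learning-rate warmup. The single-GPU count follows from the text; the linear warmup shape and the helper name `warmup_lr` are assumptions for illustration, since the scheduler is not stated here.

```python
# Hyperparameters copied from the table above.
hparams = {
    "per_device_train_batch_size": 8,
    "gradient_accumulation_steps": 2,
    "num_train_epochs": 35,
    "learning_rate": 3e-4,
    "warmup_steps": 500,
}

num_gpus = 1  # one NVIDIA GeForce RTX 3090, as stated in the text

# Number of examples that contribute to each optimizer update.
effective_batch_size = (
    hparams["per_device_train_batch_size"]
    * hparams["gradient_accumulation_steps"]
    * num_gpus
)

def warmup_lr(step: int) -> float:
    """Learning rate during an assumed linear warmup (capped at the peak rate)."""
    fraction = min(1.0, step / hparams["warmup_steps"])
    return hparams["learning_rate"] * fraction

print(effective_batch_size)  # 16
print(warmup_lr(250))        # half of the peak learning rate
```

With `transformers` installed, the dictionary could be unpacked as `TrainingArguments(output_dir=..., **hparams)`, where `output_dir` is a path of your choosing.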
## Citation
To cite this work or read more about the training pipeline, see [this paper](https://aclanthology.org/2023.fieldmatters-1.5/):
```
@inproceedings{tsoukala-etal-2023-asr,
title = "{ASR} pipeline for low-resourced languages: A case study on Pomak",
author = "Tsoukala, Chara and
Kritsis, Kosmas and
Douros, Ioannis and
Katsamanis, Athanasios and
Kokkas, Nikolaos and
Arampatzakis, Vasileios and
Sevetlidis, Vasileios and
Markantonatou, Stella and
Pavlidis, George",
booktitle = "Proceedings of the Second Workshop on NLP Applications to Field Linguistics",
month = may,
year = "2023",
address = "Dubrovnik, Croatia",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2023.fieldmatters-1.5",
doi = "10.18653/v1/2023.fieldmatters-1.5",
pages = "40--45",
abstract = "Automatic Speech Recognition (ASR) models can aid field linguists by facilitating the creation of text corpora from oral material. Training ASR systems for low-resource languages can be a challenging task not only due to lack of resources but also due to the work required for the preparation of a training dataset. We present a pipeline for data processing and ASR model training for low-resourced languages, based on the language family. As a case study, we collected recordings of Pomak, an endangered South East Slavic language variety spoken in Greece. Using the proposed pipeline, we trained the first Pomak ASR model.",
}
```