---
metrics:
- wer
- cer
library_name: transformers
pipeline_tag: automatic-speech-recognition
tags:
- Aivaliot
- Greek dialect
---

# xls-r-greek-aivaliot

Aivaliot is a variety of Greek that was spoken in Aivali (known as Ayvalık in Turkish),
located on the Edremit Gulf in western Turkey, until the beginning of the 20th century.
After the end of the war between Greece and Turkey (1919–1922) and the defeat of the Greek army,
those Aivaliots who managed to survive fled to Greece, principally to the nearby island of Lesbos,
where they settled in various dialectal enclaves. Aivaliot resembles Lesbian in many respects.
According to Ralli (2019), Aivaliot and Lesbian belong to the group of Northern Greek Dialects,
sharing unstressed /i/ and /u/ deletion and unstressed /o/ and /e/ raising.
Aivaliot morphology and lexicon are influenced by Turkish, owing to the long Ottoman domination,
as well as by Italo-Romance, due to the pre-Ottoman Genoese rule and trade with Venice (Ralli, 2019b).
However, there are no Turkish or Italo-Romance influences on phonology or syntax.
In 2002, a handful of first-generation Aivaliot speakers could still be found on Lesbos and
elsewhere in Greece and abroad, where they still remembered and practiced their mother tongue (Ralli, 2019).
Nowadays, the dialect is on the way to extinction: second-generation speakers either have only
a passive knowledge of it or, if living on Lesbos, mix it with the parent Lesbian variety.

This is the first automatic speech recognition (ASR) model for Aivaliot.
To train the model, we fine-tuned a Greek XLS-R model ([jonatasgrosman/wav2vec2-large-xlsr-53-greek](https://huggingface.co/jonatasgrosman/wav2vec2-large-xlsr-53-greek)) on roughly 10 hours of recorded Aivaliot speech.
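
The fine-tuned model can be used for inference with the `transformers` ASR pipeline. A minimal sketch; the repository id below is an assumption based on the model name, and `sample.wav` is a placeholder for a 16 kHz mono recording:

```python
from transformers import pipeline

# Hypothetical Hub id inferred from the model name; replace with the actual path.
asr = pipeline(
    "automatic-speech-recognition",
    model="ctsoukala/xls-r-greek-aivaliot",
)

# `sample.wav` stands in for a 16 kHz mono recording of Aivaliot speech.
result = asr("sample.wav")
print(result["text"])
```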

## Resources

To train the model, we used recordings from the Asia Minor Archive (AMiGre). AMiGre was compiled within the
framework of two research projects that ran in 2002–2005 and 2012–2016.
We obtained permission to use it from the studies’ authors. It consists of narratives elicited from
18 elderly speakers (5 male, 13 female), all refugees from Aivali, who had settled in different villages
on the island of Lesbos. The data collection was carried out in 2002–2003, after obtaining the written
consent of the informants as well as the approval of the Ethics Committee of the University of Patras.
The corpus has a total duration of almost 14 hours. It has been transcribed and annotated by
two native speakers of the dialect, using a transcription system based on the Greek alphabet
and orthography, adapted according to SAMPA. The annotations include metadata,
such as the source of the data, the identity and background of the informants, and the conditions of
the data collection. The corpus is stored on the server of the Laboratory of Modern Greek Dialects of
the University of Patras and is [freely accessible online](http://amigredb.philology.upatras.gr).

To prepare the dataset, the texts were normalized (see [greek_dialects_asr/](https://gitlab.com/ilsp-spmd-all/speech/greek_dialects_asr/) for scripts),
and all audio files were converted to 16 kHz mono.
We split the Praat annotations into audio–transcription segments, which resulted in a dataset with a total duration of 10h 14m 44s.
Note that removing music, long pauses, and non-transcribed segments reduces the total audio duration compared to the initial 14h of recordings.
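
The segmentation step can be sketched as follows: a minimal, pure-Python parser for an interval tier of a Praat TextGrid (long text format), keeping only transcribed intervals as candidate audio–transcription segments. This is an illustration, not the project's actual script; a real pipeline would use a TextGrid library and then cut the audio at these boundaries.

```python
import re

def parse_interval_tier(textgrid_text):
    """Extract (xmin, xmax, text) triples from the interval tiers of a
    Praat TextGrid saved in long text format, dropping empty intervals."""
    intervals = []
    # Each interval block carries xmin, xmax, and a quoted text label.
    pattern = re.compile(
        r'intervals \[\d+\]:\s*'
        r'xmin = ([\d.]+)\s*'
        r'xmax = ([\d.]+)\s*'
        r'text = "([^"]*)"'
    )
    for xmin, xmax, text in pattern.findall(textgrid_text):
        text = text.strip()
        if text:  # skip untranscribed stretches (pauses, music, noise)
            intervals.append((float(xmin), float(xmax), text))
    return intervals
```

Summing `xmax - xmin` over the kept intervals gives the usable dataset duration, which is why it falls short of the raw 14h of recordings.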

## Metrics

We evaluated the model on the test split, which consists of 10% of the dataset recordings.

| Model | CER | WER |
|---|---|---|
| pre-trained | 104.80% | 113.67% |
| fine-tuned | 39.55% | 73.83% |
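
Both metrics are edit-distance rates: the substitutions, deletions, and insertions needed to turn the hypothesis into the reference, divided by the reference length (words for WER, characters for CER). Because insertions are counted, scores above 100% are possible, as seen for the pre-trained model. A minimal pure-Python sketch (evaluation toolkits such as `jiwer` or `evaluate` are normally used instead):

```python
def edit_distance(ref, hyp):
    # Classic dynamic-programming Levenshtein distance over any sequences.
    dp = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, dp[0] = dp[0], i
        for j, h in enumerate(hyp, 1):
            # deletion, insertion, or substitution/match
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1, prev + (r != h))
    return dp[-1]

def wer(ref, hyp):
    """Word error rate: word-level edits / number of reference words."""
    ref_words = ref.split()
    return edit_distance(ref_words, hyp.split()) / len(ref_words)

def cer(ref, hyp):
    """Character error rate: character-level edits / reference length."""
    return edit_distance(ref, hyp) / len(ref)
```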

## Training hyperparameters

We fine-tuned the baseline model (`wav2vec2-large-xlsr-53-greek`) on an NVIDIA GeForce RTX 3090, using the following hyperparameters:

| arg | value |
|-------------------------------|-------|
| `per_device_train_batch_size` | 8 |
| `gradient_accumulation_steps` | 2 |
| `num_train_epochs` | 35 |
| `learning_rate` | 3e-4 |
| `warmup_steps` | 500 |
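
With gradient accumulation, the effective batch size is `per_device_train_batch_size × gradient_accumulation_steps = 8 × 2 = 16`. A sketch of how the table maps onto `transformers.TrainingArguments`; the output directory is a placeholder, and any arguments not listed above are left at their defaults:

```python
from transformers import TrainingArguments

# Hypothetical output path; the hyperparameters mirror the table above.
training_args = TrainingArguments(
    output_dir="xls-r-greek-aivaliot",
    per_device_train_batch_size=8,
    gradient_accumulation_steps=2,   # effective batch size: 8 * 2 = 16
    num_train_epochs=35,
    learning_rate=3e-4,
    warmup_steps=500,
)
```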

## Citation

To cite this work or read more about the training pipeline, see:

S. Vakirtzian, C. Tsoukala, S. Bompolas, K. Mouzou, V. Stamou, G. Paraskevopoulos, A. Dimakis, S. Markantonatou, A. Ralli, and A. Anastasopoulos, "Speech Recognition for Greek Dialects: A Challenging Benchmark," Proceedings of the Annual Conference of the International Speech Communication Association (INTERSPEECH), 2024.