---
metrics:
- wer
- cer
library_name: transformers
pipeline_tag: automatic-speech-recognition
tags:
- Aivaliot
- Greek dialect
---

# xls-r-greek-aivaliot

Aivaliot is a variety of Greek that was spoken in Aivali (Ayvalık in Turkish),
on the Edremit Gulf in western Turkey, until the beginning of the 20th century.
After the war between Greece and Turkey (1919–1922) and the defeat of the Greek army,
the Aivaliots who survived fled to Greece, mainly to the nearby island of Lesbos,
where they settled in various dialectal enclaves. Aivaliot resembles Lesbian in many respects.
According to Ralli (2019), Aivaliot and Lesbian belong to the group of Northern Greek Dialects:
both delete unstressed /i/ and /u/ and raise unstressed /o/ and /e/.
Aivaliot morphology and lexicon were influenced by Turkish, because of the long Ottoman domination,
and by Italo-Romance, because of the pre-Ottoman Genoese rule and trade with Venice (Ralli, 2019b).
However, neither Turkish nor Italo-Romance influenced Aivaliot phonology or syntax.
In 2002, a handful of first-generation Aivaliot speakers who still remembered and spoke their mother tongue
could be found in Lesbos, elsewhere in Greece, and abroad (Ralli, 2019).
Today the dialect is nearing extinction, since second-generation speakers either have only
a passive knowledge of it or, if they live in Lesbos, mix their own variety with the parent Lesbian dialect.

This is the first automatic speech recognition (ASR) model for Aivaliot.
To train it, we fine-tuned a Greek wav2vec 2.0 XLSR-53 model ([jonatasgrosman/wav2vec2-large-xlsr-53-greek](https://huggingface.co/jonatasgrosman/wav2vec2-large-xlsr-53-greek)) on recorded Aivaliot speech from the Asia Minor Archive (see Resources).

## Resources

To train the model, we used recordings from the Asia Minor Archive (AMiGre). AMiGre was compiled within the
framework of two research projects that ran in 2002–2005 and 2012–2016.
We obtained permission to use it from the authors of these studies. It consists of narratives elicited from
18 elderly speakers (5 male, 13 female), all refugees from Aivali, who had settled in different villages
on the island of Lesbos. The data were collected in 2002–2003, after obtaining the written
consent of the informants and the approval of the Ethics Committee of the University of Patras.
The corpus has a total duration of almost 14 hours. It was transcribed and annotated by
two native speakers of the dialect, using a transcription system based on the Greek alphabet
and orthography and adapted according to SAMPA. The annotations include metadata
such as the source of the data, the identity and background of the informants, and the conditions of
data collection. The corpus is stored on the server of the Laboratory of Modern Greek Dialects of
the University of Patras and is [freely accessible online](http://amigredb.philology.upatras.gr).

To prepare the dataset, we normalized the transcriptions (see [greek_dialects_asr/](https://gitlab.com/ilsp-spmd-all/speech/greek_dialects_asr/) for the scripts)
and converted all audio files to 16 kHz mono.
We then used the Praat annotations to split the recordings into audio-transcription segments, which resulted in a dataset with a total duration of 10h 14m 44s.
This is shorter than the almost 14 hours of original recordings because music, long pauses, and untranscribed segments were removed.

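The preparation steps above can be sketched in standard-library Python. This is a hypothetical illustration, not the authors' pipeline (their scripts are in `greek_dialects_asr/`): it assumes each recording has a Praat TextGrid in the default long text format with a single interval tier holding the transcription, and the `normalize` rules (NFC, lowercasing, dropping punctuation) are placeholders that may differ from the real normalization. The names `parse_segments`, `sample_span`, and `normalize` are illustrative.

```python
import re
import unicodedata
from typing import List, Tuple

Segment = Tuple[float, float, str]  # (start seconds, end seconds, transcription)

# One interval of a Praat TextGrid in the default ("long") text format.
# Praat writes a literal double quote inside a label as two quotes ("").
_INTERVAL_RE = re.compile(
    r'intervals \[\d+\]:\s*'
    r'xmin = ([-+\d.eE]+)\s*'
    r'xmax = ([-+\d.eE]+)\s*'
    r'text = "((?:[^"]|"")*)"'
)


def parse_segments(textgrid: str) -> List[Segment]:
    """Return every interval that has a non-empty label.

    Assumes one IntervalTier holding the transcription; empty intervals
    (pauses, music, untranscribed stretches) are dropped.
    """
    segments = []
    for xmin, xmax, label in _INTERVAL_RE.findall(textgrid):
        label = label.replace('""', '"').strip()
        if label:
            segments.append((float(xmin), float(xmax), label))
    return segments


def sample_span(segment: Segment, sample_rate: int = 16000) -> Tuple[int, int]:
    """Map a segment's start/end times to sample indices in the 16 kHz mono audio."""
    start, end, _ = segment
    return round(start * sample_rate), round(end * sample_rate)


def normalize(text: str) -> str:
    """Placeholder normalizer (assumption): NFC, lowercase, drop punctuation, squeeze spaces."""
    text = unicodedata.normalize("NFC", text).lower()
    text = "".join(ch for ch in text if not unicodedata.category(ch).startswith("P"))
    return " ".join(text.split())
```

Intervals with empty labels are skipped, which mirrors why the segmented dataset is shorter than the raw recordings; cutting the audio is then a matter of slicing each 16 kHz mono file at the indices returned by `sample_span` (for example, a segment from 1.5 s to 4.25 s maps to samples 24000 to 68000).
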
## Metrics

We evaluated the model on the test split, which consists of 10% of the dataset recordings. For both CER (character error rate) and WER (word error rate), lower is better.

|Model|CER|WER|
|---|---|---|
|pre-trained (`wav2vec2-large-xlsr-53-greek`, no Aivaliot fine-tuning)|104.80%|113.67%|
|fine-tuned (this model)|39.55%|73.83%|

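Both metrics are edit distances (substitutions + deletions + insertions) divided by the length of the reference, counted over words for WER and over characters for CER. Because insertions are counted, an error rate can exceed 100%, as it does for the pre-trained model. The sketch below computes corpus-level rates in plain Python; it is an illustration, not the evaluation code behind the table (libraries such as `jiwer` or Hugging Face `evaluate` are common choices), and details such as text normalization or whether spaces count as characters can shift the numbers.

```python
from typing import Callable, List, Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance: minimum number of substitutions, deletions, and insertions."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion: reference token missing in hypothesis
                curr[j - 1] + 1,         # insertion: extra token in hypothesis
                prev[j - 1] + (r != h),  # substitution (or match when r == h)
            )
        prev = curr
    return prev[-1]


def error_rate(refs: List[str], hyps: List[str],
               tokenize: Callable[[str], Sequence[str]]) -> float:
    """Corpus-level rate: total edits divided by total reference length."""
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must have the same length")
    errors = total = 0
    for ref, hyp in zip(refs, hyps):
        ref_tokens = tokenize(ref)
        errors += edit_distance(ref_tokens, tokenize(hyp))
        total += len(ref_tokens)
    if total == 0:
        raise ValueError("references are empty")
    return errors / total


def wer(refs: List[str], hyps: List[str]) -> float:
    return error_rate(refs, hyps, str.split)  # words separated by whitespace


def cer(refs: List[str], hyps: List[str]) -> float:
    return error_rate(refs, hyps, list)  # individual characters, spaces included
```

For instance, `wer(["a b"], ["a x y z"])` returns 1.5: one substitution plus two insertions against a two-word reference, which is how a rate above 100% arises.
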
## Training hyperparameters

We fine-tuned the baseline model (`wav2vec2-large-xlsr-53-greek`) on an NVIDIA GeForce RTX 3090, using the following hyperparameters:

| arg | value |
|-------------------------------|-------|
| `per_device_train_batch_size` | 8 |
| `gradient_accumulation_steps` | 2 |
| `num_train_epochs` | 35 |
| `learning_rate` | 3e-4 |
| `warmup_steps` | 500 |

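The argument names in the table are keyword arguments of Hugging Face `transformers.TrainingArguments`. A small sketch of what the values imply, assuming a single GPU (as stated above) and a hypothetical training-set size: each optimizer update sees 8 × 2 = 16 segments. The helper names and the step estimate are illustrative; the exact step count also depends on how the Trainer rounds partial batches.

```python
import math

# The five values from the hyperparameter table. Keys match keyword arguments of
# transformers.TrainingArguments; all other settings (optimizer, scheduler,
# mixed precision, ...) are not specified in the card and are omitted here.
HYPERPARAMS = {
    "per_device_train_batch_size": 8,
    "gradient_accumulation_steps": 2,
    "num_train_epochs": 35,
    "learning_rate": 3e-4,
    "warmup_steps": 500,
}


def effective_batch_size(hp: dict, num_devices: int = 1) -> int:
    """Training examples consumed per optimizer update."""
    return hp["per_device_train_batch_size"] * hp["gradient_accumulation_steps"] * num_devices


def approx_total_steps(hp: dict, num_train_examples: int, num_devices: int = 1) -> int:
    """Rough number of optimizer updates (ignores per-epoch rounding details)."""
    steps_per_epoch = math.ceil(num_train_examples / effective_batch_size(hp, num_devices))
    return steps_per_epoch * hp["num_train_epochs"]
```

With `transformers` installed, the same dictionary can be passed as `TrainingArguments(output_dir="out", **HYPERPARAMS)`. For a hypothetical 8,000 training segments, this gives 500 updates per epoch, so the 500 warmup steps would cover roughly the first epoch.
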
## Citation

To cite this work or read more about the training pipeline, see:

S. Vakirtzian, C. Tsoukala, S. Bompolas, K. Mouzou, V. Stamou, G. Paraskevopoulos, A. Dimakis, S. Markantonatou, A. Ralli, A. Anastasopoulos, Speech Recognition for Greek Dialects: A Challenging Benchmark, Proceedings of the Annual Conference of the International Speech Communication Association (INTERSPEECH), 2024.