---
metrics:
- wer
- cer
library_name: transformers
pipeline_tag: automatic-speech-recognition
tags:
- Aivaliot
- Greek dialect
---

# xls-r-greek-aivaliot

Aivaliot is a variety of Greek that was spoken in Aivali (known as Ayvalık in Turkish),
located on the Edremit Gulf in western Turkey, until the beginning of the 20th century.
After the end of the war between Greece and Turkey (1919–1922) and the defeat of the Greek army,
those Aivaliots who managed to survive fled to Greece, principally to the nearby island of Lesbos,
where they settled in various dialectal enclaves. Aivaliot resembles Lesbian in many respects.
According to Ralli (2019), Aivaliot and Lesbian belong to the group of Northern Greek Dialects,
sharing unstressed /i/ and /u/ deletion and unstressed /o/ and /e/ raising.
Aivaliot morphology and lexicon are influenced by Turkish, owing to the long Ottoman domination,
as well as by Italo-Romance, due to the pre-Ottoman Genoese rule and trade with Venice (Ralli, 2019b).
However, there are no Turkish or Italo-Romance influences on phonology or syntax.
In 2002, a handful of first-generation Aivaliot speakers could still be found on Lesbos and
elsewhere in Greece and abroad, where they still remembered and practiced their mother tongue (Ralli, 2019).
Nowadays, the dialect is on the way to extinction: second-generation speakers either have only
a passive knowledge of it or, if living on Lesbos, mix it with the parent Lesbian variety.

This is the first automatic speech recognition (ASR) model for Aivaliot.
To train the model, we fine-tuned a Greek XLS-R model ([jonatasgrosman/wav2vec2-large-xlsr-53-greek](https://huggingface.co/jonatasgrosman/wav2vec2-large-xlsr-53-greek)) on roughly 10 hours of recorded Aivaliot speech.
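
The fine-tuned model can be used for inference with the `transformers` ASR pipeline. A minimal sketch; the repository id below is an assumption based on the model name, and `sample.wav` is a placeholder for a 16 kHz mono recording:

```python
from transformers import pipeline

# Hypothetical Hub id inferred from the model name; replace with the actual path.
asr = pipeline(
    "automatic-speech-recognition",
    model="ctsoukala/xls-r-greek-aivaliot",
)

# `sample.wav` stands in for a 16 kHz mono recording of Aivaliot speech.
result = asr("sample.wav")
print(result["text"])
```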

## Resources

To train the model, we used recordings from the Asia Minor Archive (AMiGre). AMiGre was compiled within the
framework of two research projects that ran in 2002–2005 and 2012–2016.
We obtained permission to use it from the studies’ authors. It consists of narratives elicited from
18 elderly speakers (5 male, 13 female), all refugees from Aivali, who had settled in different villages
on the island of Lesbos. The data collection was carried out in 2002–2003, after obtaining the written
consent of the informants as well as the approval of the Ethics Committee of the University of Patras.
The corpus has a total duration of almost 14 hours. It has been transcribed and annotated by
two native speakers of the dialect, using a transcription system based on the Greek alphabet
and orthography, adapted according to SAMPA. The annotations include metadata,
such as the source of the data, the identity and background of the informants, and the conditions of
the data collection. The corpus is stored on the server of the Laboratory of Modern Greek Dialects of
the University of Patras and is [freely accessible online](http://amigredb.philology.upatras.gr).

To prepare the dataset, the texts were normalized (see [greek_dialects_asr/](https://gitlab.com/ilsp-spmd-all/speech/greek_dialects_asr/) for scripts),
and all audio files were converted to 16 kHz mono.
We split the Praat annotations into audio–transcription segments, which resulted in a dataset with a total duration of 10h 14m 44s.
Note that removing music, long pauses, and non-transcribed segments reduces the total audio duration compared to the initial 14h of recordings.
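
The segmentation step can be sketched as follows: a minimal, pure-Python parser for an interval tier of a Praat TextGrid (long text format), keeping only transcribed intervals as candidate audio–transcription segments. This is an illustration, not the project's actual script; a real pipeline would use a TextGrid library and then cut the audio at these boundaries.

```python
import re

def parse_interval_tier(textgrid_text):
    """Extract (xmin, xmax, text) triples from the interval tiers of a
    Praat TextGrid saved in long text format, dropping empty intervals."""
    intervals = []
    # Each interval block carries xmin, xmax, and a quoted text label.
    pattern = re.compile(
        r'intervals \[\d+\]:\s*'
        r'xmin = ([\d.]+)\s*'
        r'xmax = ([\d.]+)\s*'
        r'text = "([^"]*)"'
    )
    for xmin, xmax, text in pattern.findall(textgrid_text):
        text = text.strip()
        if text:  # skip untranscribed stretches (pauses, music, noise)
            intervals.append((float(xmin), float(xmax), text))
    return intervals
```

Summing `xmax - xmin` over the kept intervals gives the usable dataset duration, which is why it falls short of the raw 14h of recordings.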

## Metrics

We evaluated the model on the test split, which consists of 10% of the dataset recordings.

| Model | CER | WER |
|---|---|---|
| pre-trained | 104.80% | 113.67% |
| fine-tuned | 39.55% | 73.83% |
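
Both metrics are edit-distance rates: the substitutions, deletions, and insertions needed to turn the hypothesis into the reference, divided by the reference length (words for WER, characters for CER). Because insertions are counted, scores above 100% are possible, as seen for the pre-trained model. A minimal pure-Python sketch (evaluation toolkits such as `jiwer` or `evaluate` are normally used instead):

```python
def edit_distance(ref, hyp):
    # Classic dynamic-programming Levenshtein distance over any sequences.
    dp = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, dp[0] = dp[0], i
        for j, h in enumerate(hyp, 1):
            # deletion, insertion, or substitution/match
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1, prev + (r != h))
    return dp[-1]

def wer(ref, hyp):
    """Word error rate: word-level edits / number of reference words."""
    ref_words = ref.split()
    return edit_distance(ref_words, hyp.split()) / len(ref_words)

def cer(ref, hyp):
    """Character error rate: character-level edits / reference length."""
    return edit_distance(ref, hyp) / len(ref)
```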

## Training hyperparameters

We fine-tuned the baseline model (`wav2vec2-large-xlsr-53-greek`) on an NVIDIA GeForce RTX 3090, using the following hyperparameters:

| arg | value |
|-------------------------------|-------|
| `per_device_train_batch_size` | 8 |
| `gradient_accumulation_steps` | 2 |
| `num_train_epochs` | 35 |
| `learning_rate` | 3e-4 |
| `warmup_steps` | 500 |
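
With gradient accumulation, the effective batch size is `per_device_train_batch_size × gradient_accumulation_steps = 8 × 2 = 16`. A sketch of how the table maps onto `transformers.TrainingArguments`; the output directory is a placeholder, and any arguments not listed above are left at their defaults:

```python
from transformers import TrainingArguments

# Hypothetical output path; the hyperparameters mirror the table above.
training_args = TrainingArguments(
    output_dir="xls-r-greek-aivaliot",
    per_device_train_batch_size=8,
    gradient_accumulation_steps=2,   # effective batch size: 8 * 2 = 16
    num_train_epochs=35,
    learning_rate=3e-4,
    warmup_steps=500,
)
```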

## Citation

To cite this work or read more about the training pipeline, see:

S. Vakirtzian, C. Tsoukala, S. Bompolas, K. Mouzou, V. Stamou, G. Paraskevopoulos, A. Dimakis, S. Markantonatou, A. Ralli, and A. Anastasopoulos, "Speech Recognition for Greek Dialects: A Challenging Benchmark," Proceedings of the Annual Conference of the International Speech Communication Association (INTERSPEECH), 2024.