Commit 48ed62f by QubitPi (verified, 0 parents)

Everything in one commit

Files changed (4)
  1. .gitignore +1 -0
  2. README.md +49 -0
  3. ancient-greek-phonemes.txt +274 -0
  4. convert.py +13 -0
.gitignore ADDED
@@ -0,0 +1 @@
+ .idea/
README.md ADDED
@@ -0,0 +1,49 @@
+ Ancient Greek Reader NLP
+ ========================
+
+ __A lack of audio content is a major hurdle to learning Ancient Greek__, so I decided to tackle this problem with NLP.
+
+ How Does It Work
+ ----------------
+
+ OpenAI will read Ancient Greek text with a Modern Greek pronunciation. Unlike other Text-to-Speech tools, OpenAI is
+ unaffected by the different accents and breathing marks in Ancient Greek: it simply reads the text in modern
+ pronunciation with the accents in the right places. Studying Ancient Greek with modern pronunciation is not
+ satisfactory for me, though, so I started experimenting with the text to see if I could get the pronunciation closer
+ to Erasmian/Attic/whatever we want to call it. The approach is to replace letters in the Greek words with Latin
+ letters until the reading sounds the way we want.
+
+ Here is an example sentence.
+
+ This is the original text, which OpenAI will read in Modern Greek with no problem:
+
+ > Σόλων ἦν συνετώτατος πάντων τῶν Ἀθηναίων, τὴν γὰρ σοφίαν αὐτοῦ οὐ μόνον οἱ πολῖται ἐθαύμαζον, ἀλλὰ καὶ οἱ ἄλλοι
+ > Ἕλληνες πάντες, πολλοὶ δὲ καὶ τῶν βαρβάρων.
+
+ And here is the same text with letters replaced to get OpenAI to read it in an "Attic" pronunciation:
+
+ > sόλωn en sunetώtαtος πάntωn tón aθenáiωn, tén γáρ sοφίan autu u μόnon hoi πολítαi eθáuμαζon, aλλá kái hoi áλλοi
+ > Héλλeneς πάnteς, πολλói δé kái tón βαρβάρωn.
+
+ A large list of [letter replacements](./ancient-greek-phonemes.txt) has been compiled to imitate Attic pronunciation
+ as closely as possible. The result is close enough to be useful for creating audio files for texts that have no
+ audio recordings.
+ [Check out this "Attic" pronunciation example](https://qubitpi.github.io/ancient-greek-reader/).
+
+ How to Use It (WIP)
+ -------------------
+
+ Here are the __WIP__ instructions:
+
+ 1. Get an accurate Ancient Greek text and save it to a file named __original.txt__.
+    [Perseus Digital Library](https://www.perseus.tufts.edu/hopper/) is a great source.
+ 2. Run the replacement, which writes the result to __converted.txt__:
+
+    - `python3 convert.py`
+
+ 3. Convert __converted.txt__ to EPUB. [Calibre](https://calibre-ebook.com/) works pretty well for this.
+ 4. Use https://github.com/p0n1/epub_to_audiobook to generate the audio.
+
+ > [!CAUTION]
+ >
+ > OpenAI [billing](https://platform.openai.com/settings/organization/billing/overview) will apply to the last step.
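
As a rough illustration of the substitution idea the README describes, the sketch below applies four rules copied from ancient-greek-phonemes.txt, in file order, to the phrase οἱ πολῖται from the example sentence. The function name `apply_rules` and the `RULES` subset are hypothetical, chosen only for this illustration. The NFC normalization step is an assumption added here and is not part of the commit.

```python
import unicodedata

# Four rules copied from ancient-greek-phonemes.txt, kept in file order.
RULES = [("ι", "i"), ("τ", "t"), ("οἱ", "hoi"), ("ῖ", "í")]


def nfc(text):
    """Normalize to NFC so equivalent spellings of a letter compare equal."""
    return unicodedata.normalize("NFC", text)


def apply_rules(text, rules):
    """Apply each (greek, latin) rule to the whole text, one rule at a time."""
    # Normalization is an assumption of this sketch, not part of convert.py.
    text = nfc(text)
    for greek, latin in rules:
        text = text.replace(nfc(greek), nfc(latin))
    return text


print(apply_rules("οἱ πολῖται", RULES))
```

With these four rules the phrase should come out as it does in the README's converted sentence (hoi πολítαi). The α stays Greek because the list has no rule for a bare, unaccented α.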
ancient-greek-phonemes.txt ADDED
@@ -0,0 +1,274 @@
1
+ κ->k
2
+ ε->e
3
+ ι->i
4
+ οῖ->ói
5
+ τ->t
6
+ σ->s
7
+ Τ->t
8
+ Σ->s
9
+ οὕ->hú
10
+ οῦ->u
11
+ αύ->áu
12
+ οἴ->ói
13
+ αν->an
14
+ ον->on
15
+ ν->n
16
+ εἱ->hei
17
+ ού->ú
18
+ οὖ->u
19
+ οὐ->u
20
+ αῖ->ai
21
+ αι->ai
22
+ αί->ái
23
+ αί->ái
24
+ αὶ->ái
25
+ αἱ->ai
26
+ αυ->au
27
+ αύ->áu
28
+ αὺ->áu
29
+ αῦ->áu
30
+ ει->e
31
+ εί->éi
32
+ εὶ->éi
33
+ εί->ey
34
+ ευ->eu
35
+ εύ->éu
36
+ εύ->éu
37
+ εὺ->éu
38
+ οι->oi
39
+ οί->ói
40
+ οί->ói
41
+ οὶ->ói
42
+ οἱ->hoi
43
+ ου->u
44
+ ού->ú
45
+ οὺ->ú
46
+ ηυ->eu
47
+ ηύ->óu
48
+ ηὺ->óu
49
+ υι->ui
50
+ υί->úi
51
+ υὶ->úi
52
+ υ->u
53
+ ἀ->a
54
+ ἁ->ha
55
+ ἂ->á
56
+ ἃ->há
57
+ ἄ->á
58
+ ἅ->há
59
+ ἆ->a
60
+ ἇ->ha
61
+ Ἀ->a
62
+ Ἁ->Ha
63
+ Ἂ->á
64
+ Ἃ->Ha
65
+ Ἄ->á
66
+ Ἅ->Ha
67
+ Ἆ->á
68
+ Ἇ->Ha
69
+ ἐ->e
70
+ ἑ->hey
71
+ ἒ->é
72
+ ἓ->hé
73
+ ἔ->é
74
+ ἕ->hé
75
+ Ἐ->é
76
+ Ἑ->he
77
+ Ἒ->é
78
+ Ἓ->Hé
79
+ Ἔ->é
80
+ Ἕ->Hé
81
+ ἠ->E
82
+ ἡ->he
83
+ ἢ->é
84
+ ἣ->hé
85
+ ἤ->é
86
+ ἥ->hé
87
+ ἦ->e
88
+ ἧ->hé
89
+ Ἠ->E
90
+ Ἡ->he
91
+ Ἢ->é
92
+ Ἣ->hé
93
+ Ἤ->é
94
+ Ἥ->hé
95
+ Ἦ->E
96
+ Ἧ->hé
97
+ ἰ->i
98
+ ἱ->hi
99
+ ἲ->í
100
+ ἳ->hí
101
+ ἴ->í
102
+ ἵ->hí
103
+ ἶ->i
104
+ ἷ->hí
105
+ Ἰ->i
106
+ Ἱ->Hi
107
+ Ἲ->í
108
+ Ἳ->Hí
109
+ Ἴ->í
110
+ Ἵ->Hí
111
+ Ἶ->i
112
+ Ἷ->Hí
113
+ ὀ->o
114
+ ὁ->ho
115
+ ὂ->ó
116
+ ὃ->ho
117
+ ὄ->ó
118
+ ὅ->ho
119
+ Ὀ->o
120
+ Ὁ->ho
121
+ Ὂ->ó
122
+ Ὃ->ho
123
+ Ὄ->ó
124
+ Ὅ->ho
125
+ ὐ->u
126
+ ὑ->hu
127
+ ὒ->ú
128
+ ὓ->hu
129
+ ὔ->ú
130
+ ὕ->hu
131
+ ὖ->u
132
+ ὗ->hu
133
+ Ὑ->Hu
134
+ Ὓ->ú
135
+ Ὕ->ú
136
+ Ὗ->Hu
137
+ ὠ->o
138
+ ὡ->ho
139
+ ὢ->ó
140
+ ὣ->hó
141
+ ὤ->ó
142
+ ὥ->hó
143
+ ὦ->o
144
+ ὧ->hó
145
+ Ὠ->o
146
+ Ὡ->ho
147
+ Ὢ->ó
148
+ Ὣ->ho
149
+ Ὤ->ó
150
+ Ὥ->hó
151
+ Ὦ->o
152
+ Ὧ->hó
153
+ ὰ->á
154
+ ά->á
155
+ ὲ->é
156
+ έ->é
157
+ ὴ->é
158
+ ή->é
159
+ ὶ->i
160
+ ί->í
161
+ ὸ->ó
162
+ ό->ó
163
+ ὺ->ú
164
+ ύ->ú
165
+ ὼ->ó
166
+ ώ->ó
167
+ ᾀ->ai
168
+ ᾁ->hai
169
+ ᾂ->ái
170
+ ᾃ->hái
171
+ ᾄ->ái
172
+ ᾅ->hái
173
+ ᾆ->ai
174
+ ᾇ->hái
175
+ ᾈ->Ai
176
+ ᾉ->hai
177
+ ᾊ->ái
178
+ ᾋ->hái
179
+ ᾌ->ái
180
+ ᾍ->hái
181
+ ᾎ->ái
182
+ ᾏ->hái
183
+ ᾐ->ei
184
+ ᾑ->hai
185
+ ᾒ->éi
186
+ ᾓ->hái
187
+ ᾔ->éi
188
+ ᾕ->hái
189
+ ᾖ->ei
190
+ ᾗ->hái
191
+ ᾘ->ei
192
+ ᾙ->he
193
+ ᾚ->éi
194
+ ᾛ->hé
195
+ ᾜ->éi
196
+ ᾝ->hé
197
+ ᾞ->ei
198
+ ᾟ->hé
199
+ ᾠ->oi
200
+ ᾡ->hoi
201
+ ᾢ->ói
202
+ ᾣ->hói
203
+ ᾤ->ói
204
+ ᾥ->hói
205
+ ᾦ->oi
206
+ ᾧ->hói
207
+ ᾨ->oi
208
+ ᾩ->hoi
209
+ ᾪ->oi
210
+ ᾫ->hói
211
+ ᾬ->ói
212
+ ᾭ->hói
213
+ ᾮ->ói
214
+ ᾯ->hói
215
+ ᾰ->a
216
+ ᾱ->a
217
+ ᾲ->ái
218
+ ᾳ->ai
219
+ ᾴ->ái
220
+ ᾶ->a
221
+ ᾷ->ái
222
+ Ᾰ->á
223
+ Ᾱ->A
224
+ Ὰ->A
225
+ Ά->Ha
226
+ ᾼ->Ai
227
+ ῂ->éy
228
+ ῃ->ey
229
+ ῄ->éy
230
+ ῆ->éy
231
+ ῇ->éy
232
+ Ὲ->He
233
+ Έ->E
234
+ Ὴ->Hay
235
+ Ή->Ei
236
+ ῌ->Ei
237
+ ῐ->í
238
+ ῑ->i
239
+ ῒ->í
240
+ ΐ->i
241
+ ῖ->í
242
+ ῗ->í
243
+ Ῐ->í
244
+ Ῑ->I
245
+ Ὶ->Hee
246
+ Ί->I
247
+ ῠ->U
248
+ ῡ->u
249
+ ῢ->ú
250
+ ΰ->ú
251
+ ῤ->r
252
+ ῥ->r
253
+ ῦ->ú
254
+ ῧ->ú
255
+ Ῠ->ú
256
+ Ῡ->U
257
+ Ὺ->U
258
+ Ύ->Hu
259
+ Ῥ->R
260
+ ῲ->ó
261
+ ῳ->oi
262
+ ῴ->ói
263
+ ῶ->ó
264
+ ῷ->ói
265
+ Ὸ->O
266
+ Ό->Ho
267
+ Ὼ->Ho
268
+ Ώ->O
269
+ ῼ->Oi
270
+ ή->é
271
+ η->e
272
+ αu->au
273
+ ύ->u
274
+ ηὐ->eu
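
Several rows in this list look identical as displayed, for example lines 22 and 23 (αί->ái) and lines 35 and 36 (εύ->éu). One plausible explanation, which is an assumption because the displayed text does not reveal code points, is that they use different Unicode code points for the same glyph: Greek has both a "tonos" and an "oxia" form of each acute-accented vowel, and the two are canonically equivalent. The sketch below groups rule keys by their NFC form to surface such pairs. The helper name `lookalike_keys` and the sample rows are hypothetical.

```python
import unicodedata
from collections import defaultdict


def lookalike_keys(lines):
    """Return rule keys that differ only by Unicode normalization."""
    groups = defaultdict(set)
    for line in lines:
        line = line.strip()
        if "->" not in line:
            continue  # skip blank or malformed lines
        key, _ = line.split("->", 1)
        groups[unicodedata.normalize("NFC", key)].add(key)
    # Keep only the groups that contain more than one raw spelling.
    return {form: sorted(raw) for form, raw in groups.items() if len(raw) > 1}


# Hypothetical sample: iota with tonos (U+03AF) versus iota with oxia (U+1F77).
sample = ["\u03b1\u03af->ái", "\u03b1\u1f77->ái", "\u03ba->k"]
for form, variants in lookalike_keys(sample).items():
    print(form, [[hex(ord(ch)) for ch in variant] for variant in variants])
```

If the real file does contain such pairs, keeping both spellings is reasonable, since source texts may use either form. Normalizing both the rules and the input text to NFC would be another way to cover them.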
convert.py ADDED
@@ -0,0 +1,13 @@
+ if __name__ == "__main__":
+     with open("ancient-greek-phonemes.txt", "r", encoding="utf-8") as mapping_file:
+         # Each rule line is "greek->latin"; a later duplicate key overrides the earlier value.
+         mapping = dict(line.rstrip().split("->", 1) for line in mapping_file if line.strip())
+     with open("original.txt", "r", encoding="utf-8") as book_txt:
+         book = book_txt.read()
+
+     # Apply the rules one by one, in the order they appear in the mapping file.
+     for key, value in mapping.items():
+         book = book.replace(key, value)
+
+     with open("converted.txt", "w", encoding="utf-8") as book_txt:
+         book_txt.write(book)
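
One consequence of replacing rule by rule in file order is that an early single-letter rule (ι->i is on line 3 of the mapping file) rewrites the text before a later two-letter rule (αι->ai on line 21) gets a chance to match. The README's converted sentence, where πολῖται becomes πολítαi with a Greek α, appears consistent with this. If that shadowing is not intended, one possible alternative, which is not what convert.py does, is a single pass that tries the longest keys first. The names `sequential` and `transliterate` below are hypothetical.

```python
import re


def sequential(text, rules):
    """Mimic the loop in convert.py: one str.replace per rule, in order."""
    for greek, latin in rules:
        text = text.replace(greek, latin)
    return text


def transliterate(text, rules):
    """Single pass over the text; at each position the longest key wins."""
    mapping = dict(rules)
    if not mapping:
        return text
    keys = sorted(mapping, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(key) for key in keys))
    return pattern.sub(lambda match: mapping[match.group(0)], text)


rules = [("ι", "i"), ("τ", "t"), ("αι", "ai")]
print(sequential("ται", rules))     # the two-letter rule never fires
print(transliterate("ται", rules))  # the two-letter rule wins over ι
```

A single pass would change the result for any rule that relies on the output of an earlier rule. For instance, αu->au on line 272 mixes a Greek α with a Latin u, which suggests it targets intermediate text, so the list would likely need review before switching.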