Update README.md
README.md
CHANGED
@@ -38,10 +38,10 @@ bnb_optimizer=false
 # Pre processing
 Data extracted from the data source has had its transcriptions preprocessed.
-As I understand it, punctuation matters because it teaches the model pauses and proper intonation, so it has been preserved.
+As I understand it, punctuation matters because it teaches the model pauses and proper intonation, so it has been preserved.
 The original Italian "text" field also contained direct-dialogue markers (long dashes), which have been preserved as well, but it also contained
 a hyphen used to split a word across a line break (I don't know which process was used on the original dataset to create the text transcription),
-and so I removed those hyphens, merging the two parts of the word; otherwise training would have been done on artifacts that have no counterpart in the speech.
+and so I removed those hyphens, merging the two parts of the word; otherwise training would have been done on artifacts that have no counterpart in the speech.
 I'm only talking about the Italian data in cml-tts; I don't know whether other languages are affected too.
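The hyphen cleanup described in this README change could be sketched roughly as follows. This is a minimal Python sketch, not the author's actual script: the function name `merge_split_words` is hypothetical, and it assumes the artifact appears as a letter, an ASCII hyphen, some whitespace (a space or a newline), and then another letter. Long dashes used for direct dialogue are different characters and are left untouched.

```python
import re

# Assumption: a word split at a line break shows up as "letter-<whitespace>letter".
# [^\W\d_] matches any Unicode letter, including accented Italian ones.
_SPLIT_WORD = re.compile(r"(?<=[^\W\d_])-\s+(?=[^\W\d_])")


def merge_split_words(text: str) -> str:
    """Join the two halves of a word that was hyphenated across a line break."""
    return _SPLIT_WORD.sub("", text)


if __name__ == "__main__":
    sample = "\u2014 Buon- giorno, disse."
    print(merge_split_words(sample))
```

Requiring a letter directly before the hyphen and whitespace directly after it keeps ordinary compounds such as `dolce-amaro` and spaced dashes such as `a - b` unchanged. If the real artifacts look different, the pattern would need adjusting.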