alien79 committed on
Commit
6582a16
1 Parent(s): fe33f26

Update README.md

Files changed (1)
  1. README.md +2 -2
README.md CHANGED
@@ -38,10 +38,10 @@ bnb_optimizer=false

  # Preprocessing
  The transcriptions of the data extracted from the data source have been preprocessed.
- As far as I understand, punctuation matters because the model learns pauses and proper intonation from it, so it has been preserved.
  The original Italian "text" field also contained direct-dialogue markers (long dashes), which have been preserved as well, but it also contained
  hyphens used to split a word across a line break (I don't know which process was used on the original dataset to create the text transcription),
- so I removed those hyphens and merged the two parts of each word; otherwise the training would have run on artifacts that did not affect the speech.
  I'm only talking about the Italian data in cml-tts; I don't know whether other languages are affected by this too.


  # Preprocessing
  The transcriptions of the data extracted from the data source have been preprocessed.
+ As far as I understand, punctuation matters because the model learns pauses and proper intonation from it, so it has been preserved.
  The original Italian "text" field also contained direct-dialogue markers (long dashes), which have been preserved as well, but it also contained
  hyphens used to split a word across a line break (I don't know which process was used on the original dataset to create the text transcription),
+ so I removed those hyphens and merged the two parts of each word; otherwise the training would have run on artifacts that did not affect the speech.
  I'm only talking about the Italian data in cml-tts; I don't know whether other languages are affected by this too.

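
The hyphen cleanup described in the README text above could be sketched in Python roughly as follows. This is only an illustration, not the script actually used for this dataset: the name `merge_split_words` is hypothetical, and the sketch assumes a line-break artifact appears as an ASCII hyphen followed by whitespace between two letters (for example `cammi- nare`), which the README does not confirm. Dialogue dashes are left untouched under the further assumption that they are long dashes such as U+2014, or at least are not glued to the end of a word.

```python
import re

# Assumption: a line-break artifact looks like "<letter>-<whitespace><letter>".
# [^\W\d_] matches any Unicode letter, so accented Italian vowels are covered.
_LINE_BREAK_HYPHEN = re.compile(r"(?<=[^\W\d_])-\s+(?=[^\W\d_])")


def merge_split_words(text: str) -> str:
    """Join word halves separated by a line-break hyphen.

    Punctuation, dialogue dashes and ordinary compounds (hyphen with no
    following whitespace) are returned unchanged.
    """
    return _LINE_BREAK_HYPHEN.sub("", text)


if __name__ == "__main__":
    # "\u2014" is the long dash used here to stand in for a dialogue marker.
    sample = "\u2014 Voglio cammi- nare, disse."
    print(merge_split_words(sample))
```

Compounds such as `nord-est` are not modified because no whitespace follows the hyphen. A suspended hyphen that is followed by a space would be merged incorrectly by this pattern, so it is worth checking a sample of the output before training.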