---
language:
- no
- nb
---

Warm-started from the "Chills" single-speaker male model (not available on Hugging Face at the time of writing), then trained for 25 (de facto 50) epochs. Batch size 16; learning rate √2 × 10⁻³ for the first 15(?) epochs and 5√2 × 10⁻⁴ for the next 10.

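For reference, the two learning rates above work out to roughly 1.414e-3 and 7.07e-4, so the second is exactly half the first. A minimal sketch of that schedule follows. The names `learning_rate`, `SWITCH_EPOCH` and `TOTAL_EPOCHS` are hypothetical, epochs are assumed to be 0-indexed, and the switch at epoch 15 is taken at face value even though it is marked as uncertain above.

```python
import math

# Values as stated in the training notes above.
LR_INITIAL = math.sqrt(2) * 1e-3    # (√2)e-3, about 1.414e-3
LR_LATE = 5 * math.sqrt(2) * 1e-4   # (5√2)e-4, about 7.07e-4

SWITCH_EPOCH = 15   # assumption: the notes mark this boundary as uncertain
TOTAL_EPOCHS = 25


def learning_rate(epoch: int) -> float:
    """Return the learning rate for a 0-based epoch index."""
    if not 0 <= epoch < TOTAL_EPOCHS:
        raise ValueError(f"epoch must be in [0, {TOTAL_EPOCHS}), got {epoch}")
    return LR_INITIAL if epoch < SWITCH_EPOCH else LR_LATE
```

This is only an illustration of the numbers, not the training script that was actually used.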
Dataset: [NST Norwegian Speech Synthesis](https://www.nb.no/sprakbanken/en/resource-catalogue/oai-nb-no-sbr-15/) (CC0), augmented as follows:

1. Make a copy of the dataset.
2. In the copy, join the two shortest clips with 100 ms of silence between them and replace both with the joined clip. Repeat until the shortest clip is at least 6 seconds long.
3. Shuffle the original together with the copy.
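
The augmentation steps above could be sketched roughly as follows. This is not the script that was actually used. The `Clip` type, the function names, the 22050 Hz sample rate, the order in which the two clips are joined (shorter first) and the way transcripts are concatenated (with a single space) are all assumptions made for illustration.

```python
import heapq
import random
from dataclasses import dataclass
from itertools import count

SAMPLE_RATE = 22050   # assumption: the source does not state the sample rate
SILENCE_SEC = 0.1     # 100 ms of silence between joined clips
MIN_LEN_SEC = 6.0     # stop once the shortest clip is at least this long


@dataclass
class Clip:
    samples: list[float]   # mono audio samples
    text: str              # transcript


def join_short_clips(clips, sample_rate=SAMPLE_RATE):
    """Step 2: repeatedly merge the two shortest clips of a copy."""
    silence = [0.0] * round(SILENCE_SEC * sample_rate)
    min_samples = round(MIN_LEN_SEC * sample_rate)
    tiebreak = count()  # keeps the heap from ever comparing Clip objects
    # Min-heap keyed on clip length; building it is the "copy" of step 1.
    heap = [(len(c.samples), next(tiebreak), c) for c in clips]
    heapq.heapify(heap)
    while len(heap) >= 2 and heap[0][0] < min_samples:
        _, _, a = heapq.heappop(heap)   # shortest clip
        _, _, b = heapq.heappop(heap)   # second shortest clip
        # Assumption: shorter clip first, transcripts joined with a space.
        joined = Clip(a.samples + silence + b.samples, f"{a.text} {b.text}")
        heapq.heappush(heap, (len(joined.samples), next(tiebreak), joined))
    return [c for _, _, c in heap]


def augment(clips, sample_rate=SAMPLE_RATE, seed=0):
    """Steps 1 to 3: original clips plus the joined copy, shuffled together."""
    combined = list(clips) + join_short_clips(clips, sample_rate)
    random.Random(seed).shuffle(combined)
    return combined
```

Using a heap keeps each merge at logarithmic cost, which matters little for a dataset of this size but avoids re-sorting after every join. If a single clip shorter than 6 seconds is all that remains, the loop stops and leaves it as is.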