GitMylo committed on
Commit
28dc103
1 Parent(s): dbd5e29

Add squeeze info

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -27,7 +27,7 @@ Voice cloning is creating a new voice for text-to-speech.
 
  Process:
  1. Load your wav audio file into your pytorch application
- 2. For the fine prompt extract [discrete representations](https://github.com/facebookresearch/encodec#extracting-discrete-representations). (These are used by bark to know about the voice)
+ 2. For the fine prompt extract [discrete representations](https://github.com/facebookresearch/encodec#extracting-discrete-representations). (These are used by bark to know about the voice), **make sure to `.squeeze()` the resulting codes.**
  3. For the coarse prompt do `fine_prompt[:2, :]`, to get the coarse prompt from a fine prompt.
  4. For the semantics, load a HuBERT model without Kmeans (I personally use the [audiolm-pytorch](https://github.com/lucidrains/audiolm-pytorch) implementation's hubertwithkmeans, but i edited it to skip kmeans.)
  5. Next, to get the actual semantic tokens, run the tokens through this model. Your output will be compatible with bark.
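
Note on steps 2 and 3: the shape handling that the added `.squeeze()` addresses can be sketched as below. This is only an illustration, not code from the repository. It uses a random stand-in tensor instead of real EnCodec output, and it assumes the codes arrive with shape `[B, n_q, T]` (as in the linked EnCodec README) with a batch size of 1 and 8 codebooks. The values `8`, `150`, and `1024` are illustrative assumptions.

```python
import torch

# Stand-in for the concatenated EnCodec codes (assumed shape [B, n_q, T]).
# B=1 clip, n_q=8 codebooks, T=150 frames are assumed values for this sketch.
torch.manual_seed(0)
codes = torch.randint(0, 1024, (1, 8, 150))

# Step 2: drop the batch dimension so the fine prompt is [n_q, T].
fine_prompt = codes.squeeze()

# Step 3: the coarse prompt is the first two codebooks of the fine prompt.
coarse_prompt = fine_prompt[:2, :]

print(tuple(fine_prompt.shape), tuple(coarse_prompt.shape))
```

Without the squeeze, `codes[:2, :]` would slice the batch dimension rather than the codebooks, and the coarse prompt would keep all 8 codebooks. A bare `.squeeze()` removes every size-1 dimension, so `codes.squeeze(0)` may be the safer spelling if a clip could ever be a single frame long.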