ModaVerse: Efficiently Transforming Modalities with LLMs
Paper • 2401.06395
Note: Multimodal LLM that uses agent binding to a generation model to produce its outputs.
Note: Approach A did not work well, so the authors conclude that speech tokens cannot be treated as a new language.
Note: Uses HuBERT speech tokens directly in the LLM and trains a GAN vocoder (HiFi-GAN) to decode them into audio.
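One common way to feed discrete speech tokens to an LLM is to append them to the text vocabulary as extra token ids. The sketch below illustrates only that id mapping; it is not the authors' code. The vocabulary size, the number of speech units, and both function names are hypothetical values chosen for illustration, and the HuBERT unit extraction and HiFi-GAN decoding steps are left out.

```python
from typing import List

# Assumed values for illustration only; neither comes from the notes above.
TEXT_VOCAB_SIZE = 32000   # hypothetical size of the LLM's text vocabulary
NUM_SPEECH_UNITS = 1000   # hypothetical number of discrete HuBERT units


def units_to_token_ids(units: List[int]) -> List[int]:
    """Map discrete speech unit indices to ids placed after the text vocabulary."""
    for u in units:
        if not 0 <= u < NUM_SPEECH_UNITS:
            raise ValueError(f"unit {u} out of range")
    return [TEXT_VOCAB_SIZE + u for u in units]


def token_ids_to_units(token_ids: List[int]) -> List[int]:
    """Recover speech units from generated ids, dropping any text tokens."""
    upper = TEXT_VOCAB_SIZE + NUM_SPEECH_UNITS
    return [t - TEXT_VOCAB_SIZE for t in token_ids if TEXT_VOCAB_SIZE <= t < upper]
```

In a setup like this, the units recovered by `token_ids_to_units` would be what a unit vocoder such as HiFi-GAN turns back into a waveform.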