Multimodal Ichigo Llama 3.1 - Real Time Voice AI

> WhisperSpeech X Llama 3.1 8B
> Trained on 50K hours of speech (7 languages)
> Continually trained on 45hrs 10x A1000s
> MLS -> WhisperVQ tokens -> Llama 3.1
> Instruction tuned on 1.89M samples
> 70% speech, 20% transcription, 10% text
> Apache 2.0 licensed

Architecture:
> WhisperSpeech/VQ for Semantic Tokens
> Llama 3.1 8B Instruct for Text backbone
> Early fusion (Chameleon)

I'm super bullish on HomeBrew/Jan and early fusion, audio and text, multimodal models!

(P.S. Play with the demo on Hugging Face: jan-hq/Ichigo-llama3.1-s-instruct)
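For readers unfamiliar with early fusion, here is a rough sketch of the idea behind "WhisperVQ tokens -> Llama 3.1": discrete audio codes get their own entries in the same vocabulary as text, so one model reads a single mixed sequence. Everything below is a toy assumption for illustration. The token names, the codebook size, and the helper functions `build_vocab` and `fuse` are hypothetical and not taken from the Ichigo code.

```python
# Toy illustration of early fusion: audio codes and text share one token sequence.
# All names and sizes are hypothetical, not taken from the Ichigo codebase.

TEXT_VOCAB = ["<pad>", "<bos>", "<eos>", "hello", "world"]  # stand-in text vocabulary
NUM_AUDIO_CODES = 512  # assumed codebook size, for illustration only


def build_vocab(text_vocab, num_audio_codes):
    """Append start/end markers and one token per audio code to the text vocab."""
    audio_tokens = [f"<|sound_{i:04d}|>" for i in range(num_audio_codes)]
    vocab = list(text_vocab) + ["<|sound_start|>", "<|sound_end|>"] + audio_tokens
    return {tok: idx for idx, tok in enumerate(vocab)}


def fuse(audio_codes, text_tokens, token_to_id):
    """Concatenate an audio segment and text into a single ID sequence."""
    audio_part = (
        ["<|sound_start|>"]
        + [f"<|sound_{c:04d}|>" for c in audio_codes]
        + ["<|sound_end|>"]
    )
    sequence = ["<bos>"] + audio_part + list(text_tokens) + ["<eos>"]
    return [token_to_id[tok] for tok in sequence]


token_to_id = build_vocab(TEXT_VOCAB, NUM_AUDIO_CODES)
ids = fuse([3, 17, 3], ["hello", "world"], token_to_id)
print(ids)
```

The point of the sketch is that the language model never sees a separate audio encoder output. It only sees extra token IDs, which is why the text backbone's embedding table has to grow to cover them.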
This is the week of small AI language models!