@victor on Hugging Face: "Let's play a little game, how would you build the Rabbit R1 with open source…"

Join the conversation

Join the community of Machine Learners and AI enthusiasts.

victor

posted an update Jan 12

Post

Let's play a little game, how would you build the Rabbit R1 with open source tech? Here is my stack:

- openai/whisper-small for awesome Speech-to-Text with low latency
- mistralai/Mixtral-8x7B-Instruct-v0.1 for an awesome super powerful LLM Brain
- coqui/XTTS-v2 for a nice and clean voice

Which stack will you personally choose?

chansung

Jan 12

Not exactly replicatable since the absence of LAM

victor

Jan 12

Not wrong, but you could probably build some nice web browsing agents on top of Mixtral 👀

osanseviero

Jan 12

I would also use a https://huggingface.co/microsoft/phi-2 model, we need a smaller model for quick inference for easy queries

chansung

Jan 12

I would use ph-2 on device for daily conversation and detecting users intention, then pass to cloud hosted much larger LLM to do more complicated stuff (such as web browsing and more).

sayakpaul

Jan 12

How would you aim for the cheapest latency using existing tooling?

radames

Jan 12

Are you thinking of running it on a device or in the cloud?

julien-c

Jan 12

i would do the LLM in the cloud and the ASR+TTS on device

wdyt @victor ?

thomwolf

Jan 12

or a https://huggingface.co/distil-whisper/distil-large-v2 for even faster speech-to-text

Tonic

Jan 13

facebook's Seamless M4T for the audio and voice to make multilingual , there's an ondevice model demo here: https://huggingface.co/spaces/Tonic/SeamlessOnDevice