bitsoko
3 followers · 0 following
https://gumzo.bitsoko.org
AI & ML interests
None yet
Recent Activity
Liked a model 26 days ago: Goekdeniz-Guelmez/J.O.S.I.E.v4o
Reacted to reach-vb's post with 🧠 about 2 months ago:
Less than two days ago Kyutai Labs open sourced Moshi - an ~7.6B on-device speech-to-speech foundation model - and Mimi - a SoTA streaming speech codec! 🔥

The release includes:
1. Moshiko & Moshika - Moshi fine-tuned on synthetic data (CC-BY license) (https://huggingface.co/collections/kyutai/moshi-v01-release-66eaeaf3302bef6bd9ad7acd)
2. Mimi - streaming audio codec; processes 24 kHz audio down to a 12.5 Hz representation with a bandwidth of 1.1 kbps (CC-BY license) (https://huggingface.co/kyutai/mimi)
3. Model checkpoints & inference codebase written in Rust (Candle), PyTorch & MLX (Apache license) (https://github.com/kyutai-labs/moshi)

How does Moshi work?
1. Moshi processes two audio streams: one for itself and one for the user. The user's stream comes from audio input; Moshi's stream is generated by the model.
2. Alongside these audio streams, Moshi predicts text tokens for its own speech, which improves generation quality.
3. The model uses a small Depth Transformer for codebook dependencies and a large 7B-parameter Temporal Transformer for temporal dependencies.
4. The theoretical latency is 160 ms, with a practical latency of around 200 ms on an L4 GPU.

Model size & inference: Moshiko/ka are 7.69B-parameter models.
bf16: ~16 GB VRAM
8-bit: ~8 GB VRAM
4-bit: ~4 GB VRAM

You can run inference via Candle 🦀, PyTorch, or MLX, depending on your hardware.

The Kyutai team, @adefossez @lmz and team, are cracked AF - they're bringing some serious firepower to the open source/science AI scene. Looking forward to what's next! 🚀
Reacted to reach-vb's post with 🔥 about 2 months ago (same post as above).
Organizations
None yet
Spaces (1)
Gumzo (no application file)
Models (12), sorted by recently updated:
bitsoko/gumzo · Updated May 20
bitsoko/outputs · Updated May 17
bitsoko/gumzo-gemma · Updated Mar 21
bitsoko/gumzo-gemma-7b · Updated Mar 15
bitsoko/gumzo-tiny · Updated Mar 13
bitsoko/gumzo-mistral · Text Generation · Updated Feb 25 · 2
bitsoko/gumzo-llama-00 · Text Generation · Updated Feb 23 · 29
bitsoko/gumzo-tiny-01 · Updated Feb 17 · 16
bitsoko/gumzo-llama-01 · Updated Feb 15
bitsoko/gumzo-rpj · Updated Feb 11
Datasets (1)
bitsoko/AfroNative · Viewer · Updated Feb 15 · 130k · 35