|
--- |
|
license: mit |
|
language: |
|
- en |
|
base_model: Qwen/Qwen2-0.5B |
|
--- |
|
|
|
|
|
<p align="center"><strong style="font-size: 18px;"> |
|
Mini-Omni: Language Models Can Hear, Talk While Thinking in Streaming |
|
</strong> |
|
</p> |
|
|
|
<p align="center"> |
|
🤗 <a href="">Hugging Face</a> | 📖 <a href="https://github.com/gpt-omni/mini-omni">Github</a>
|
| 📑 <a href="https://arxiv.org/abs/2408.16725">Technical report</a>
|
</p> |
|
|
|
**This is a safetensors conversion of `gpt-omni/mini-omni`.** |
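As a quick sanity check of the converted weights, here is a minimal sketch that downloads the `.safetensors` checkpoint with `huggingface_hub` and lists its tensors. The repo id and filename below are placeholders, not confirmed names from this repository; for actual speech-to-speech inference, follow the setup instructions in the GitHub repository.

```python
# Minimal sketch: fetch and inspect the safetensors checkpoint.
# The repo id and filename are assumptions -- replace them with the real ones.
from huggingface_hub import hf_hub_download
from safetensors import safe_open

repo_id = "your-namespace/mini-omni-safetensors"  # hypothetical repo id
weights_path = hf_hub_download(repo_id=repo_id, filename="model.safetensors")

# Open the checkpoint lazily and print each tensor's name, shape, and dtype.
with safe_open(weights_path, framework="pt") as f:
    for name in f.keys():
        tensor = f.get_tensor(name)
        print(name, tuple(tensor.shape), tensor.dtype)
```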
|
|
|
Mini-Omni is an open-source multimodal large language model that can **hear and talk while thinking**. It provides real-time, end-to-end speech input and **streaming audio output** for conversational use.
|
|
|
<p align="center"> |
|
<img src="frameworkv3.jpg" width="100%"/> |
|
</p> |
|
|
|
|
|
## Features |
|
|
|
✅ **Real-time speech-to-speech** conversational capabilities. No extra ASR or TTS models required.
|
|
|
✅ **Talking while thinking**, with the ability to generate text and audio at the same time.
|
|
|
✅ **Streaming audio output** capabilities.
|
|
|
✅ "Audio-to-Text" and "Audio-to-Audio" **batch inference** to further boost performance.
|
|
|
**NOTE**: Please refer to https://github.com/gpt-omni/mini-omni for more details.