|
--- |
|
license: mit |
|
language: |
|
- en |
|
base_model: Qwen/Qwen2-0.5B |
|
--- |
|
|
|
|
|
<p align="center"><strong style="font-size: 18px;"> |
|
Mini-Omni: Language Models Can Hear, Talk While Thinking in Streaming |
|
</strong> |
|
</p> |
|
|
|
<p align="center"> |
|
🤗 <a href="">Hugging Face</a> | 📖 <a href="https://github.com/gpt-omni/mini-omni">Github</a>
|
| 📑 <a href="https://arxiv.org/abs/2408.16725">Technical report</a>
|
</p> |
|
|
|
**This is a safetensors conversion of `gpt-omni/mini-omni`.** |
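As a quick sanity check of the converted weights, here is a minimal sketch that downloads the `.safetensors` checkpoint with `huggingface_hub` and lists its tensors. The repo id and filename below are placeholders, not confirmed names from this repository; for actual speech-to-speech inference, follow the setup instructions in the GitHub repository.

```python
# Minimal sketch: fetch and inspect the safetensors checkpoint.
# The repo id and filename are assumptions -- replace them with the real ones.
from huggingface_hub import hf_hub_download
from safetensors import safe_open

repo_id = "your-namespace/mini-omni-safetensors"  # hypothetical repo id
weights_path = hf_hub_download(repo_id=repo_id, filename="model.safetensors")

# Open the checkpoint lazily and print each tensor's name, shape, and dtype.
with safe_open(weights_path, framework="pt") as f:
    for name in f.keys():
        tensor = f.get_tensor(name)
        print(name, tuple(tensor.shape), tensor.dtype)
```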
|
|
|
Mini-Omni is an open-source multimodal large language model that can **hear and talk while thinking**. It provides real-time, end-to-end speech input and **streaming audio output** for conversational use.
|
|
|
<p align="center"> |
|
<img src="frameworkv3.jpg" width="100%"/> |
|
</p> |
|
|
|
|
|
## Features |
|
|
|
✅ **Real-time speech-to-speech** conversational capabilities. No extra ASR or TTS models required.
|
|
|
✅ **Talking while thinking**, with the ability to generate text and audio at the same time.
|
|
|
✅ **Streaming audio output** capabilities.
|
|
|
✅ "Audio-to-Text" and "Audio-to-Audio" **batch inference** to further boost performance.
|
|
|
**NOTE**: Please refer to https://github.com/gpt-omni/mini-omni for more details.