Create app.py (#1), opened by zainmushtaq54

app.py CHANGED
@@ -1,22 +1,3 @@
-'''
-+----------------------+        +-------------------------+        +-------------------------------+        +-------------------------+
-|   Step 1: Set Up     |        |  Step 2: Set Up Gradio  |        |  Step 3: Speech-to-Text       |        |  Step 4: Text-to-Speech |
-|   Environment        |        |  Interface              |        |  & Language Model Processing  |        |  Output                 |
-+----------------------+        +-------------------------+        +-------------------------------+        +-------------------------+
-|                      |        |                         |        |                               |        |                         |
-|  - Import Python     |        |  - Define interface     |        |  - Transcribe audio           |        |  - XTTS model generates |
-|    libraries         |        |    components           |        |    to text using              |        |    spoken response from |
-|  - Initialize models:|-------->  - Configure audio and  |------->|    Faster Whisper ASR         |------->|    LLM's text response  |
-|    Whisper, Mistral, |        |    text interaction     |        |  - Transcribed text           |        |                         |
-|    XTTS              |        |  - Launch interface     |        |    is added to                |        |                         |
-|                      |        |                         |        |    chatbot's history          |        |                         |
-|                      |        |                         |        |  - Mistral LLM                |        |                         |
-|                      |        |                         |        |    processes chatbot          |        |                         |
-|                      |        |                         |        |    history to generate        |        |                         |
-|                      |        |                         |        |    response                   |        |                         |
-+----------------------+        +-------------------------+        +-------------------------------+        +-------------------------+
-'''
-
 ###### Set Up Environment ######
 
 import os
@@ -205,7 +186,6 @@ with gr.Blocks(title="Voice chat with LLM") as demo:
 - Speech to Text Model: [Faster-Whisper-large-v3](https://huggingface.co/Systran/faster-whisper-large-v3) an ASR model, to transcribe recorded audio to text.
 - Large Language Model: [Mistral-7b-instruct-v0.1-quantized](https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.1-GGUF) a LLM to generate the chatbot responses.
 - Text to Speech Model: [XTTS-v2](https://huggingface.co/spaces/coqui/xtts) a TTS model, to generate the voice of the chatbot.
-
 Note:
 - Responses generated by chat model should not be assumed correct or taken serious, as this is a demonstration example only
 - iOS (Iphone/Ipad) devices may not experience voice due to autoplay being disabled on these devices by Vendor"""