Update README.md
README.md
@@ -101,6 +101,8 @@ model-index:
# SpeechLLM

![](./speechllm.png)

SpeechLLM is a multi-modal LLM trained to predict the metadata of the speaker's turn in a conversation. The speechllm-2B model is based on the HubertX audio encoder and the TinyLlama LLM. The model predicts the following:
1. **SpeechActivity**: whether the audio signal contains speech (True/False)
2. **Transcript**: the ASR transcript of the audio
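
For illustration only, here is a minimal inference sketch that is not part of this README's documented API: it assumes the checkpoint is published on the Hugging Face Hub with custom modeling code (loaded via `trust_remote_code`) and that the model class exposes a `generate_meta()` helper taking an audio path and an instruction string. The repo id, the helper name, and its parameters are assumptions.

```python
# Minimal usage sketch. Assumptions (not confirmed by this README):
# - the checkpoint is hosted on the Hub under "skit-ai/speechllm-2B"
# - the custom model class provides a generate_meta() helper
from transformers import AutoModel

model = AutoModel.from_pretrained(
    "skit-ai/speechllm-2B",   # assumed repo id
    trust_remote_code=True,   # load the model's custom SpeechLLM code from the repo
)

# Ask the model for the turn-level metadata described above.
output = model.generate_meta(
    audio_path="sample.wav",  # hypothetical argument: path to a mono speech clip
    instruction="Give me the following information about the audio [SpeechActivity, Transcript]",
    max_new_tokens=200,
)
print(output)  # expected shape of the result: {"SpeechActivity": "True", "Transcript": "..."}
```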