Update README.md
README.md CHANGED
@@ -58,12 +58,12 @@ model-index:
 # SpeechLLM
 
 SpeechLLM is a multi-modal LLM trained to predict the metadata of the speaker's turn in a conversation. SpeechLLM model is based on HubertX acoustic encoder and TinyLlama LLM. The model predicts the following:
-1.
-2. ASR
-3. Gender of the speaker
-4. Age of the speaker
-5. Accent of the speaker
-6. Emotion of the speaker
+1. **SpeechActivity** : if the audio signal contains speech (True/False)
+2. **Transcript** : ASR transcript of the audio
+3. **Gender** of the speaker (Female/Male)
+4. **Age** of the speaker (Young/Middle-Age/Senior)
+5. **Accent** of the speaker (Africa/America/Celtic/Europe/Oceania/South-Asia/South-East-Asia)
+6. **Emotion** of the speaker (Happy/Sad/Anger/Neutral/Frustrated)
 
 ## Usage
 ```python
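
The `## Usage` code block is cut off at the hunk boundary, so the actual example is not visible in this diff. For orientation only, below is a hedged sketch of how a Hub checkpoint that ships custom remote code is typically loaded and queried for the metadata fields listed above; the repository id placeholder, the `generate_meta` method, and its arguments are assumptions about the model's remote code, not something confirmed by this diff.

```python
# Sketch only: MODEL_ID is a placeholder and generate_meta() plus its
# arguments are assumed to be provided by the model's remote code.
from transformers import AutoModel

MODEL_ID = "your-org/speechllm-checkpoint"  # placeholder: substitute the actual Hub repo id

# Checkpoints that ship custom code on the Hub are loaded with trust_remote_code=True.
model = AutoModel.from_pretrained(MODEL_ID, trust_remote_code=True)

# Hypothetical call: ask the model for the fields described in the README
# (SpeechActivity, Transcript, Gender, Age, Accent, Emotion) for one audio clip.
output = model.generate_meta(
    audio_path="sample.wav",  # assumed input: a 16 kHz mono speech recording
    instruction="Give me the following information about the audio "
                "[SpeechActivity, Transcript, Gender, Age, Accent, Emotion]",
    max_new_tokens=500,
)

# The model is expected to return the metadata as a structured string/dict, e.g.
# {"SpeechActivity": "True", "Transcript": "...", "Gender": "Female", ...}
print(output)
```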