Update README.md
README.md
CHANGED
@@ -53,4 +53,39 @@ model-index:
 - name: Test WER
   type: wer
   value: 25.01
----
+---
+
+# SpeechLLM
+
+## Usage
+```python
+# Load the model directly from the Hugging Face Hub
+from transformers import AutoModel
+model = AutoModel.from_pretrained("skit-ai/SpeechLLM", trust_remote_code=True)
+
+model.generate_meta(
+    audio_path="path-to-audio.wav",
+    instruction="Give me the following information about the audio [SpeechActivity, Transcript, Gender, Emotion, Age, Accent]",
+    max_new_tokens=500,
+    return_special_tokens=False
+)
+
+# Example model output
+'''
+{ "SpeechActivity" : "True",
+"Transcript": "Yes, I got it. I'll make the payment now.",
+"Gender": "Female",
+"Emotion": "Neutral",
+"Age": "Young",
+"Accent" : "America",
+}
+'''
+```
+
+## Checkpoint Results
+
+| Dataset                | Word Error Rate (%) | Gender Accuracy (%) |
+|:----------------------:|:-------------------:|:-------------------:|
+| librispeech-test-clean | 12.30               | 87.78               |
+| librispeech-test-other | 18.90               | 89.08               |
+| CommonVoice test       | 25.01               | 87.53               |
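
The sample generation in the usage snippet ends with a trailing comma before the closing brace, which strict JSON parsers reject. The sketch below shows one way a caller might turn such a string into a dictionary. It assumes `generate_meta` returns its output as a string shaped like the sample; the README does not state the return type. `parse_meta` is a hypothetical helper, not part of the model's API.

```python
import json
import re


def parse_meta(raw: str) -> dict:
    """Parse a JSON-like metadata string into a dict (hypothetical helper)."""
    # Keep only the outermost {...} span in case extra text surrounds it.
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in model output")
    body = raw[start : end + 1]
    # Drop a trailing comma before a closing brace, which strict JSON rejects.
    # Assumption: no field value itself contains a comma followed by "}".
    body = re.sub(r",\s*}", "}", body)
    return json.loads(body)


# Sample string copied from the README's example generation.
sample = '''
{ "SpeechActivity" : "True",
"Transcript": "Yes, I got it. I'll make the payment now.",
"Gender": "Female",
"Emotion": "Neutral",
"Age": "Young",
"Accent" : "America",
}
'''

meta = parse_meta(sample)
print(meta["Transcript"])
```

Every value in the sample is a string, including `"True"` for `SpeechActivity`, so a caller that needs a boolean would have to convert it explicitly.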
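
For readers comparing the results table with the `value: 25.01` metadata entry: word error rate is conventionally the word-level edit distance between reference and hypothesis, divided by the number of reference words, so a fraction of 0.2501 is the same quantity as 25.01%. The following is a minimal sketch of that standard definition, not the evaluation script behind the reported numbers. It assumes plain whitespace tokenization, and any text normalization used for the real evaluation is unknown. `word_error_rate` is a name chosen here for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the reference prefix seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(
                min(
                    prev[j] + 1,         # deletion of a reference word
                    curr[j - 1] + 1,     # insertion of a hypothesis word
                    prev[j - 1] + cost,  # substitution or exact match
                )
            )
        prev = curr
    return prev[-1] / len(ref)


wer = word_error_rate("i will make the payment now", "i will make payment now")
print(f"{wer:.4f} as a fraction, {100 * wer:.2f} as a percentage")
```

One dropped word out of six reference words gives a rate of one sixth, which illustrates how the same result can be reported either as a fraction or as a percentage.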