skit-ai/speechllm-2B

shangeth committed on Jun 4

Commit 1b27420 • 1 Parent(s): d9290bd

Update README.md

Files changed (1)

README.md +36 -1

@@ -53,4 +53,39 @@ model-index:
           - name: Test WER
             type: wer
             value: 25.01
----
+---
+# SpeechLLM
+## Usage
+```python
+# Load model directly from huggingface
+from transformers import AutoModel
+model = AutoModel.from_pretrained("skit-ai/SpeechLLM", trust_remote_code=True)
+model.generate_meta(
+    audio_path="path-to-audio.wav",
+    instruction="Give me the following information about the audio [SpeechActivity, Transcript, Gender, Emotion, Age, Accent]",
+    max_new_tokens=500,
+    return_special_tokens=False
+)
+# Model Generation
+'''
+{ "SpeechActivity" : "True",
+  "Transcript": "Yes, I got it. I'll make the payment now.",
+  "Gender": "Female",
+  "Emotion": "Neutral",
+  "Age": "Young",
+  "Accent" : "America",
+}
+'''
+```
+## Checkpoint Results
+|         Dataset        | Word Error Rate (%) | Gender Accuracy (%) |
+|:----------------------:|:-------------------:|:-------------------:|
+| librispeech-test-clean | 12.30               | 87.78               |
+| librispeech-test-other | 18.90               | 89.08               |
+| CommonVoice test       | 25.01               | 87.53               |
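
Note on the usage example in this change: the sample output in the README is a dict-like string with a trailing comma, which `json.loads` would reject. The following is a minimal sketch of one way to parse it, assuming `generate_meta` returns text shaped like the sample; the diff does not confirm the return type. `parse_generation` is a hypothetical helper, not part of the model's remote code.

```python
import ast


def parse_generation(text: str) -> dict:
    """Parse the outermost {...} span of generated text into a dict.

    Hypothetical helper: assumes the model output looks like the README sample.
    """
    start = text.find("{")
    end = text.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no dict-like span found in model output")
    # ast.literal_eval accepts a trailing comma, which json.loads would reject.
    parsed = ast.literal_eval(text[start : end + 1])
    if not isinstance(parsed, dict):
        raise ValueError("parsed value is not a dict")
    return parsed


# Sample output copied from the README usage example in this diff.
sample = '''
{ "SpeechActivity" : "True",
  "Transcript": "Yes, I got it. I'll make the payment now.",
  "Gender": "Female",
  "Emotion": "Neutral",
  "Age": "Young",
  "Accent" : "America",
}
'''

result = parse_generation(sample)
# All values in the sample are strings, so convert the flag explicitly.
has_speech = result["SpeechActivity"] == "True"
print(result["Transcript"], has_speech)
```

`ast.literal_eval` only evaluates Python literals, so it is a safer choice than `eval` for text produced by a model. If the real output turns out to be strict JSON, `json.loads` would be the simpler option.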