Update README.md
README.md
CHANGED
@@ -4,6 +4,17 @@ language:
|
|
4 |
- en
|
5 |
- hi
|
6 |
---
7 |
```
|
8 |
import transformers
|
9 |
import librosa
|
@@ -21,4 +32,6 @@ turns = [
|
|
21 |
]
|
22 |
|
23 |
pipe({'audio': audio, 'turns': turns, 'sampling_rate': sr}, max_new_tokens=512)
|
24 |
-
```
|
|
|
|
|
|
4 |
- en
|
5 |
- hi
|
6 |
---
|
7 |
+
|
8 |
+
`Shuka v1` is a language model which natively understands audio in Indic languages. It is an encoder-decoder model built by combining two models:
|
9 |
+
- Our state-of-the-art in-house audio encoder: Saaras v1
|
10 |
+
- Meta’s Llama3-8B-Instruct as the decoder
|
11 |
+
|
12 |
+
The encoder and decoder are connected by a small projector with ~60M parameters. During training, only the projector weights are finetuned while the rest of the network is frozen. Following our tradition of training models frugally, we train `Shuka v1` on less than 100 hours of audio.
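
To give a rough sense of how small the trainable part is, here is a minimal sketch of the "freeze everything except the projector" bookkeeping. It is not the actual training code: the `Component` class and `trainable_fraction` helper are hypothetical names introduced for illustration, the parameter counts are the approximate figures quoted above (~8B for the decoder, ~60M for the projector), and the audio encoder is left out because its size is not stated here.

```python
from dataclasses import dataclass


@dataclass
class Component:
    """Hypothetical stand-in for one part of the model."""
    name: str
    num_params: int   # approximate parameter count
    trainable: bool   # False means the weights stay frozen


def trainable_fraction(components):
    """Return the share of parameters that receive gradient updates."""
    total = sum(c.num_params for c in components)
    trainable = sum(c.num_params for c in components if c.trainable)
    return trainable / total


# Assumed, rounded sizes; the encoder is omitted because its size is not given.
components = [
    Component("decoder (Llama3-8B-Instruct)", 8_000_000_000, trainable=False),
    Component("projector", 60_000_000, trainable=True),
]

fraction = trainable_fraction(components)
print(f"Trainable share of decoder + projector: {fraction:.2%}")
```

Under these assumptions, well under 1% of the counted parameters are updated, and including the frozen encoder in the total would only lower that share further.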
|
13 |
+
|
14 |
+
Though we finetune the projector only on English and Hindi data, the multilingual nature of our encoder makes `Shuka v1` perform well on zero-shot QA in other Indic languages as well. We have tested the model on Bengali, English, Gujarati, Hindi, Kannada, Malayalam, Marathi, Oriya, Punjabi, Tamil, and Telugu.
|
15 |
+
|
16 |
+
You can get started by using the Hugging Face `transformers` pipeline, as follows:
|
17 |
+
|
18 |
```
|
19 |
import transformers
|
20 |
import librosa
|
|
|
32 |
]
|
33 |
|
34 |
pipe({'audio': audio, 'turns': turns, 'sampling_rate': sr}, max_new_tokens=512)
|
35 |
+
```
|
36 |
+
|
37 |
+
For more details, please see our blog (link coming soon).
|