sarvamai
/

shuka_v1

rahular commited on Aug 12

Commit

8c0524d

•

1 Parent(s): 11f413e

Update README.md

Files changed (1) hide show

README.md CHANGED Viewed

@@ -4,6 +4,17 @@ language:
 - en
 - hi
 ---
 ```
 import transformers
 import librosa
@@ -21,4 +32,6 @@ turns = [
         ]
 pipe({'audio': audio, 'turns': turns, 'sampling_rate': sr}, max_new_tokens=512)
-```

 - en
 - hi
 ---
+`Shuka v1` is a language model which natively understands audio in Indic languages. It is an encoder-decoder model built by combining two models:
+- Our state-of-the-art, in-house, audio encoder: Saaras v1
+- Meta’s Llama3-8B-Instruct as the decoder
+The encoder and decoder are connected by a small projector with ~60M parameters. During training, only the projector weights are finetuned while the rest of the network is frozen. Following our tradition of training models frugally, we train `Shuka v1` on less than 100 hours of audio.
+Though we only finetune the projector on English and Hindi data, the multilingual nature of our encoder makes `Shuka v1` perform well on zero-shot QA in other Indic languages as well. We have tested on the model on Bengali, English, Gujarati, Hindi, Kannada, Malayalam, Marathi, Oriya, Punjabi, Tamil, and Telugu.
+You can get started by using huggingface pipeline, as follows:
 ```
 import transformers
 import librosa
         ]
 pipe({'audio': audio, 'turns': turns, 'sampling_rate': sr}, max_new_tokens=512)
+```
+For more details, please see our blog (link coming soon).