## DeSTA2

[📑 Paper](https://arxiv.org/pdf/2409.20007) | [🌐 Website](https://kehanlu.github.io/DeSTA2/) | [👩‍💻 Github](https://github.com/kehanlu/DeSTA2) | [🤗 Model](https://huggingface.co/DeSTA-ntu/DeSTA2-8B-beta) | [🤗 Dataset](https://huggingface.co/datasets/DeSTA-ntu/DeSTA2-Llama3-8B-Instruct)

## Quickstart

```python
from transformers import AutoModel

HF_TOKEN = "hf_..."  # your Hugging Face token, needed to download Llama3 from the official Meta repo

model = AutoModel.from_pretrained(
    "DeSTA-ntu/DeSTA2-8B-beta",
    trust_remote_code=True,
    token=HF_TOKEN,
)

messages = [
    {"role": "system", "content": "You are a helpful voice assistant."},
    {"role": "audio", "content": ""},  # path to the input audio file
    {"role": "user", "content": "Describe the audio."},
]

generated_ids = model.chat(
    messages,
    max_new_tokens=128,
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
)

response = model.tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```

## Citation

If you find our work useful, please consider citing the papers:

```bibtex
@article{lu2024developing,
  title={Developing Instruction-Following Speech Language Model Without Speech Instruction-Tuning Data},
  author={Lu, Ke-Han and Chen, Zhehuai and Fu, Szu-Wei and Yang, Chao-Han Huck and Balam, Jagadeesh and Ginsburg, Boris and Wang, Yu-Chiang Frank and Lee, Hung-yi},
  journal={arXiv preprint arXiv:2409.20007},
  year={2024}
}

@inproceedings{lu24c_interspeech,
  title={DeSTA: Enhancing Speech Language Models through Descriptive Speech-Text Alignment},
  author={Ke-Han Lu and Zhehuai Chen and Szu-Wei Fu and He Huang and Boris Ginsburg and Yu-Chiang Frank Wang and Hung-yi Lee},
  year={2024},
  booktitle={Interspeech 2024},
  pages={4159--4163},
  doi={10.21437/Interspeech.2024-457},
  issn={2958-1796}
}
```