## DeSTA2

[Paper](https://arxiv.org/pdf/2409.20007) | [Website](https://kehanlu.github.io/DeSTA2/) | [Github](https://github.com/kehanlu/DeSTA2) | [🤗 Model](https://huggingface.co/DeSTA-ntu/DeSTA2-8B-beta) | [🤗 Dataset](https://huggingface.co/datasets/DeSTA-ntu/DeSTA2-Llama3-8B-Instruct)

## Quickstart

```python
from transformers import AutoModel

HF_TOKEN = "hf_..."  # your Hugging Face token, required to download Llama3 from the official Meta repo

model = AutoModel.from_pretrained("DeSTA-ntu/DeSTA2-8B-beta", trust_remote_code=True, token=HF_TOKEN)

messages = [
    {"role": "system", "content": "You are a helpful voice assistant."},
    {"role": "audio", "content": "<path_to_audio_file>"},
    {"role": "user", "content": "Describe the audio."}
]

generated_ids = model.chat(
    messages,
    max_new_tokens=128,
    do_sample=True,
    temperature=0.6,
    top_p=0.9
)

response = model.tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```
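Since `model.chat` takes a list of role/content messages, prompts for several clips can be built programmatically. The sketch below only constructs the message lists, assuming the same role schema (`system`, `audio`, `user`) shown in the quickstart; the file names and the `build_messages` helper are hypothetical, for illustration only.

```python
# Build one DeSTA2-style message list per audio clip; the "audio" role
# carries a file path, following the schema from the quickstart above.
def build_messages(audio_path, instruction,
                   system_prompt="You are a helpful voice assistant."):
    return [
        {"role": "system", "content": system_prompt},
        {"role": "audio", "content": audio_path},
        {"role": "user", "content": instruction},
    ]

# Hypothetical local files and tasks, for illustration only.
tasks = [
    ("clip1.wav", "Describe the audio."),
    ("clip2.wav", "Transcribe the speech."),
]

batches = [build_messages(path, prompt) for path, prompt in tasks]
print(batches[1][2]["content"])  # -> Transcribe the speech.
```

Each element of `batches` can then be passed to `model.chat` in turn.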

## Citation

If you find our work useful, please consider citing our papers:

```bibtex
@article{lu2024developing,
  title={Developing Instruction-Following Speech Language Model Without Speech Instruction-Tuning Data},
  author={Lu, Ke-Han and Chen, Zhehuai and Fu, Szu-Wei and Yang, Chao-Han Huck and Balam, Jagadeesh and Ginsburg, Boris and Wang, Yu-Chiang Frank and Lee, Hung-yi},
  journal={arXiv preprint arXiv:2409.20007},
  year={2024}
}

@inproceedings{lu24c_interspeech,
  title={DeSTA: Enhancing Speech Language Models through Descriptive Speech-Text Alignment},
  author={Ke-Han Lu and Zhehuai Chen and Szu-Wei Fu and He Huang and Boris Ginsburg and Yu-Chiang Frank Wang and Hung-yi Lee},
  year={2024},
  booktitle={Interspeech 2024},
  pages={4159--4163},
  doi={10.21437/Interspeech.2024-457},
  issn={2958-1796}
}
```