
	---
	license: apache-2.0
	---

	This repository hosts the AV_MossFormer2_TSE_16K model weights for 16 kHz audio-visual target speaker extraction, for use with the [ClearerVoice-Studio](https://github.com/modelscope/ClearerVoice-Studio/tree/main) repo.
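If you want a local copy of the weights, a minimal sketch using the `huggingface_hub` library might look like the following. The repo id is the one this model card lives under; the `download_weights` helper and the `local_dir` default are hypothetical names chosen for illustration, and ClearerVoice-Studio may expect the files in a different location.

```python
# Repo id of this model card; everything else below is an illustrative assumption.
REPO_ID = "alibabasglab/AV_MossFormer2_TSE_16K"


def download_weights(local_dir="checkpoints/AV_MossFormer2_TSE_16K"):
    """Download all files in the model repo and return the local folder path.

    `local_dir` is a hypothetical default, not a path required by any tool.
    """
    # Imported lazily so this module loads even without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=REPO_ID, local_dir=local_dir)


# Example call (requires network access and `pip install huggingface_hub`):
# path = download_weights()
```

Fetching the whole snapshot avoids guessing individual checkpoint file names, which this card does not list.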

	The model was trained on large-scale open-source datasets.

	Given a multi-speaker video, it extracts each speaker's voice as a separate output, using facial recognition to identify the target speaker.
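As a rough sketch of how extraction could be run, the snippet below assumes the `clearvoice` package from ClearerVoice-Studio exposes a `ClearVoice` class that takes `task` and `model_names` and is then called with `input_path`, `online_write`, and `output_path`. These names are assumptions not stated in this card, so check the ClearerVoice-Studio repo before relying on them. The `extract_speakers` helper is hypothetical.

```python
MODEL_NAME = "AV_MossFormer2_TSE_16K"   # model name from this card
TASK = "target_speaker_extraction"      # assumed task identifier


def extract_speakers(input_path, output_path):
    """Run target speaker extraction on the videos under `input_path`.

    Assumes the `clearvoice` package is installed and offers the interface
    described above; both are assumptions, not guarantees from this card.
    """
    # Imported lazily so this module loads even without clearvoice installed.
    from clearvoice import ClearVoice

    model = ClearVoice(task=TASK, model_names=[MODEL_NAME])
    # online_write=True is assumed to write results straight to output_path.
    model(input_path=input_path, online_write=True, output_path=output_path)


# Example call (paths are placeholders):
# extract_speakers("path/to/input_videos", "path/to/output_videos")
```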