Things that might be relevant
Trained models
ESPnet model for Yoloxochitl Mixtec
- Huggingface Hub page https://huggingface.co/espnet/ftshijt_espnet2_asr_yolo_mixtec_transformer
- Model source code https://github.com/espnet/espnet/tree/master/egs/yoloxochitl_mixtec/asr1
- Colab notebook to set up and apply the model https://colab.research.google.com/drive/1ieoW2b3ERydjaaWuhVPBP_v2QqqWsC1Q?usp=sharing
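A minimal sketch of loading and applying this model outside the notebook, following the usual ESPnet model-zoo pattern. The audio filename is a placeholder, and `Speech2Text.from_pretrained` is assumed to be available in the installed ESPnet version.

```python
# Sketch: download the Mixtec ESPnet model from the Hub and transcribe one file.
# Requires: pip install espnet espnet_model_zoo soundfile
import soundfile
from espnet2.bin.asr_inference import Speech2Text

speech2text = Speech2Text.from_pretrained(
    "espnet/ftshijt_espnet2_asr_yolo_mixtec_transformer"
)

speech, rate = soundfile.read("example_mixtec.wav")  # placeholder; 16 kHz mono
nbests = speech2text(speech)
text, tokens, token_ids, hyp = nbests[0]             # best hypothesis first
print(text)
```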
Coqui model for Yoloxochitl Mixtec
- Huggingface Hub page
- Coqui page https://coqui.ai/mixtec/jemeyer/v1.0.0
- Colab notebook to set up and apply the model https://colab.research.google.com/drive/1b1SujEGC_F3XhvUCuUyZK_tyUkEaFZ7D?usp=sharing#scrollTo=6IvRFke4Ckpz
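A rough sketch with the Coqui STT Python package (`stt`). The model and scorer filenames are placeholders for whatever files the release linked above provides, and 16 kHz mono input is assumed.

```python
# Sketch: run the Coqui STT Mixtec model on one 16 kHz mono WAV file.
# Requires: pip install stt numpy
import wave

import numpy as np
import stt

model = stt.Model("mixtec_v1.0.0.tflite")      # placeholder filename from the release
# model.enableExternalScorer("mixtec.scorer")  # optional external scorer, if provided

with wave.open("example_mixtec.wav", "rb") as w:   # placeholder audio file
    audio = np.frombuffer(w.readframes(w.getnframes()), dtype=np.int16)

print(model.stt(audio))
```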
Spanish ASR models
- XLS-R (1B) model fine-tuned on Common Voice 8, with LM https://huggingface.co/jonatasgrosman/wav2vec2-xls-r-1b-spanish
- XLSR-53 model fine-tuned on Common Voice 6, with LM https://huggingface.co/jonatasgrosman/wav2vec2-large-xlsr-53-spanish
- XLSR model fine-tuned on Multilingual LibriSpeech (Spanish) https://huggingface.co/IIC/wav2vec2-spanish-multilibrispeech
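All three checkpoints can be applied with the transformers ASR pipeline; here is a sketch with the first one, using a placeholder path to a 16 kHz recording. LM-based decoding additionally needs `pyctcdecode` and `kenlm` installed.

```python
# Sketch: Spanish transcription with the transformers ASR pipeline.
# Requires: pip install transformers torch (plus pyctcdecode/kenlm for LM decoding)
from transformers import pipeline

asr_spanish = pipeline(
    "automatic-speech-recognition",
    model="jonatasgrosman/wav2vec2-xls-r-1b-spanish",
)
print(asr_spanish("example_spanish.wav")["text"])  # placeholder 16 kHz audio file
```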
SpeechBrain language identification on Common Language (derived from Common Voice 6/7?)
- source code https://github.com/speechbrain/speechbrain/tree/develop/recipes/CommonLanguage
- HF Hub model page https://huggingface.co/speechbrain/lang-id-commonlanguage_ecapa
- HF Hub space https://huggingface.co/spaces/akhaliq/Speechbrain-audio-classification
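The usage pattern from the model card, sketched with a placeholder audio file. `classify_file` also returns scores over all Common Language classes, which is what a demo would surface as per-label confidence.

```python
# Sketch: language ID with the SpeechBrain ECAPA model trained on Common Language.
# Requires: pip install speechbrain torchaudio
from speechbrain.pretrained import EncoderClassifier

lang_id = EncoderClassifier.from_hparams(
    source="speechbrain/lang-id-commonlanguage_ecapa",
    savedir="pretrained_models/lang-id-commonlanguage_ecapa",
)

# out_prob holds scores for every class; text_lab is the best label, e.g. ["Spanish"].
out_prob, score, index, text_lab = lang_id.classify_file("example.wav")  # placeholder
print(text_lab[0], float(score))
```

The VoxLingua107 model listed next uses the same `EncoderClassifier` interface; only the `source` (and the label set) changes.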
SpeechBrain language identification on VoxLingua107
- source code https://github.com/speechbrain/speechbrain/tree/develop/recipes/VoxLingua107/lang_id
- HF Hub model page https://huggingface.co/speechbrain/lang-id-voxlingua107-ecapa
Corpora
OpenSLR 89 (Yoloxóchitl Mixtec speech corpus) https://www.openslr.org/89/
Common Language https://huggingface.co/datasets/common_language
VoxLingua107 http://bark.phon.ioc.ee/voxlingua107/
Multilingual LibriSpeech https://huggingface.co/datasets/multilingual_librispeech
Possible demos
Simple categorization of utterances
A few example files are provided for each language, and the user can record their own. The predicted confidence of each class label is shown.
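A rough Gradio sketch of this demo, reusing the Common Language lang-id model. The example filenames are placeholders, the label decoding via `hparams.label_encoder` is an assumption about the pretrained model's internals, and the exact `gr.Audio` arguments depend on the installed Gradio version.

```python
# Sketch: Gradio Space for simple categorization of utterances.
# Requires: pip install gradio speechbrain torchaudio
import gradio as gr
from speechbrain.pretrained import EncoderClassifier

lang_id = EncoderClassifier.from_hparams(
    source="speechbrain/lang-id-commonlanguage_ecapa",
    savedir="pretrained_models/lang-id-commonlanguage_ecapa",
)

def classify(audio_path):
    out_prob, score, index, text_lab = lang_id.classify_file(audio_path)
    # Assumes out_prob holds log-probabilities over all classes and that the label
    # encoder is exposed as hparams.label_encoder (as in SpeechBrain's classifier
    # interface); exp() turns the scores into confidences for gr.Label.
    probs = out_prob.exp().squeeze()
    labels = lang_id.hparams.label_encoder.decode_ndim(range(len(probs)))
    return {label: float(p) for label, p in zip(labels, probs)}

demo = gr.Interface(
    fn=classify,
    inputs=gr.Audio(source="microphone", type="filepath"),  # uploads also work
    outputs=gr.Label(num_top_classes=5),
    examples=["examples/mixtec_1.wav", "examples/spanish_1.wav"],  # placeholder files
)
demo.launch()
```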
Segmentation and identification
Recordings with alternating languages in a single audio file, either provided as examples or recorded by the user. Voice activity detection splits the audio into segments, and the language of each segment is then predicted.
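A sketch of this flow, assuming the SpeechBrain CRDNN VAD model (speechbrain/vad-crdnn-libriparty) as the voice activity detector, since the notes leave the VAD choice open. The audio filename is a placeholder and 16 kHz mono input is assumed.

```python
# Sketch: split a mixed-language recording with VAD, then identify each segment.
# Requires: pip install speechbrain torchaudio
import torchaudio
from speechbrain.pretrained import VAD, EncoderClassifier

vad = VAD.from_hparams(
    source="speechbrain/vad-crdnn-libriparty",
    savedir="pretrained_models/vad-crdnn-libriparty",
)
lang_id = EncoderClassifier.from_hparams(
    source="speechbrain/lang-id-commonlanguage_ecapa",
    savedir="pretrained_models/lang-id-commonlanguage_ecapa",
)

audio_file = "mixed_languages.wav"                 # placeholder; 16 kHz mono expected
boundaries = vad.get_speech_segments(audio_file)   # rows of [start, end] in seconds

signal, sr = torchaudio.load(audio_file)
for start, end in boundaries.tolist():
    segment = signal[:, int(start * sr):int(end * sr)]
    _, _, _, text_lab = lang_id.classify_batch(segment)
    print(f"{start:6.2f}-{end:6.2f}s  {text_lab[0]}")
```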
Identification and transcription
Example files are provided for each language separately. The lang-id model predicts which language is spoken, and the corresponding ASR model produces a transcript.
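A sketch of the routing logic, reusing the lang-id model and the Spanish pipeline above. `transcribe_mixtec` is a hypothetical stub standing in for either Mixtec model, and the "Spanish" label string is an assumption about how the Common Language classes are named.

```python
# Sketch: identify the language of a file, then route it to the matching ASR model.
# Requires: pip install speechbrain torchaudio transformers torch
from speechbrain.pretrained import EncoderClassifier
from transformers import pipeline

lang_id = EncoderClassifier.from_hparams(
    source="speechbrain/lang-id-commonlanguage_ecapa",
    savedir="pretrained_models/lang-id-commonlanguage_ecapa",
)
asr_spanish = pipeline(
    "automatic-speech-recognition",
    model="jonatasgrosman/wav2vec2-xls-r-1b-spanish",
)

def transcribe_mixtec(audio_path):
    # Hypothetical stub: plug in the ESPnet or Coqui Mixtec model sketched above.
    raise NotImplementedError

def identify_and_transcribe(audio_path):
    _, _, _, text_lab = lang_id.classify_file(audio_path)
    language = text_lab[0]
    if language == "Spanish":                      # assumed Common Language label name
        transcript = asr_spanish(audio_path)["text"]
    else:
        # Common Language has no Mixtec class, so anything non-Spanish is treated
        # as Mixtec here; a real demo would need a more careful rule.
        transcript = transcribe_mixtec(audio_path)
    return language, transcript

print(identify_and_transcribe("example.wav"))      # placeholder file
```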
Segmentation, identification and transcription
Recordings with alternating languages in a single audio file. Voice activity detection splits the audio into segments, the language of each segment is predicted, and the corresponding ASR model produces a transcript of each segment for display.
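A sketch combining the pieces from the previous demos, under the same assumptions: SpeechBrain CRDNN VAD, an assumed "Spanish" label name, a hypothetical `transcribe_mixtec` stub, and placeholder filenames.

```python
# Sketch: VAD segmentation -> language ID per segment -> per-language ASR.
# Requires: pip install speechbrain torchaudio transformers torch
import torchaudio
from speechbrain.pretrained import VAD, EncoderClassifier
from transformers import pipeline

vad = VAD.from_hparams(
    source="speechbrain/vad-crdnn-libriparty",
    savedir="pretrained_models/vad-crdnn-libriparty",
)
lang_id = EncoderClassifier.from_hparams(
    source="speechbrain/lang-id-commonlanguage_ecapa",
    savedir="pretrained_models/lang-id-commonlanguage_ecapa",
)
asr_spanish = pipeline(
    "automatic-speech-recognition",
    model="jonatasgrosman/wav2vec2-xls-r-1b-spanish",
)

def transcribe_mixtec(waveform, sample_rate):
    # Hypothetical stub: plug in the ESPnet or Coqui Mixtec model sketched above.
    raise NotImplementedError

def transcribe_mixed(audio_file):
    """Return (start, end, language, transcript) for each speech segment."""
    boundaries = vad.get_speech_segments(audio_file)   # [start, end] in seconds
    signal, sr = torchaudio.load(audio_file)
    results = []
    for start, end in boundaries.tolist():
        segment = signal[:, int(start * sr):int(end * sr)]
        _, _, _, text_lab = lang_id.classify_batch(segment)
        language = text_lab[0]
        if language == "Spanish":                      # assumed label name
            transcript = asr_spanish(
                {"raw": segment.squeeze().numpy(), "sampling_rate": sr}
            )["text"]
        else:
            transcript = transcribe_mixtec(segment, sr)
        results.append((start, end, language, transcript))
    return results

for row in transcribe_mixed("mixed_languages.wav"):    # placeholder file
    print(row)
```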