---
license: cc-by-4.0
language:
- bn
- hi
- pa
- mr
- gu
base_model:
- parthiv11/stt_hi_conformer_ctc_large_v2
tags:
- speech_recognition
- entity_tagging
- dialect_prediction
- gender
- age
- intent
library_name: nemo
datasets:
- WhissleAI/indicvoices_hi_tagged_transcripts
- WhissleAI/indicvoices_pa_tagged_transcripts
- WhissleAI/indicvoices_mr_tagged_transcripts
---

# Indo-Aryan Speech Tagger - Conformer CTC Model

This speech tagger transcribes speech in five Indian languages: Hindi, Punjabi, Marathi, Bengali, and Gujarati. Alongside the transcript, it annotates key entities and predicts the speaker's age, gender, dialect, and intent.

## Model Details

- **Model Type**: NeMo ASR
- **Architecture**: Conformer CTC
- **Languages**: Bengali, Hindi, Punjabi, Marathi, Gujarati
- **Training Data**: AI4Bharat IndicVoices Bengali V1 and V2 datasets
- **Task**: Speech Recognition with Entity Tagging

## Usage

```python
import nemo.collections.asr as nemo_asr

# Load the pretrained model from the Hugging Face Hub
asr_model = nemo_asr.models.EncDecCTCModel.from_pretrained('WhissleAI/speech-tagger_indo-aryan_ctc_meta')

# Transcribe one or more audio files; the result has one entry per input path
transcription = asr_model.transcribe(['path/to/audio.wav'])
print(transcription[0])
```

## Model Training

- Base model: Conformer CTC (`parthiv11/stt_hi_conformer_ctc_large_v2`)
- Fine-tuned on AI4Bharat IndicVoices Marathi dataset
- Optimized for real-time transcription

## License & Attribution

Please cite AI4Bharat when using this model: https://indicvoices.ai4bharat.org/
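
## Checking Input Audio

The usage snippet passes a WAV path straight to `transcribe`. This card does not state the expected audio format, so the sketch below assumes 16 kHz mono input, which is typical for NeMo Conformer CTC models but is not confirmed here. The helper `check_wav` and both constants are hypothetical names introduced for illustration; they are not part of NeMo.

```python
import wave

# Assumptions (not stated in this model card): 16 kHz, single-channel audio.
EXPECTED_SAMPLE_RATE = 16000
EXPECTED_CHANNELS = 1


def check_wav(path, sample_rate=EXPECTED_SAMPLE_RATE, channels=EXPECTED_CHANNELS):
    """Return a list of format problems for a WAV file; an empty list means it matches."""
    problems = []
    with wave.open(path, "rb") as wav:
        actual_rate = wav.getframerate()
        actual_channels = wav.getnchannels()
    if actual_rate != sample_rate:
        problems.append(f"sample rate is {actual_rate} Hz, expected {sample_rate} Hz")
    if actual_channels != channels:
        problems.append(f"found {actual_channels} channel(s), expected {channels}")
    return problems
```

Calling `check_wav('path/to/audio.wav')` before `asr_model.transcribe(...)` returns an empty list when the file matches the assumed format. If it reports problems, resampling or downmixing the file first may help, depending on what the model actually expects.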