ArTST-v2 (ASR task)

An ArTST-v2 model fine-tuned for automatic speech recognition (ASR, speech-to-text) on the QASR corpus to improve generalization across Arabic dialects.
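Dialectal generalization in ASR is usually reported as word error rate (WER) on each dialect or test set. This card does not describe the evaluation protocol or text normalization used for the model, so the following is only a generic WER sketch. The helper names `normalize_arabic` and `word_error_rate` are hypothetical. The normalization rules (dropping diacritics and tatweel, mapping أ/إ/آ to ا) are illustrative assumptions and not necessarily the ones used to evaluate ArTST-v2.

```python
import re

# Illustrative assumption: Arabic diacritics U+064B..U+0652, superscript alef U+0670, tatweel U+0640.
_DIACRITICS = re.compile(r"[\u064B-\u0652\u0670\u0640]")
# Illustrative assumption: alef with madda / hamza above / hamza below are mapped to bare alef.
_ALEF_VARIANTS = re.compile(r"[\u0622\u0623\u0625]")


def normalize_arabic(text: str) -> str:
    """Hypothetical normalization: strip diacritics and tatweel, unify alef forms, collapse whitespace."""
    text = _DIACRITICS.sub("", text)
    text = _ALEF_VARIANTS.sub("\u0627", text)
    return " ".join(text.split())


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = normalize_arabic(reference).split()
    hyp = normalize_arabic(hypothesis).split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # At the start of iteration i, prev[j] is the word-level edit distance
    # between the first i-1 reference words and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,              # deletion of a reference word
                curr[j - 1] + 1,          # insertion of a hypothesis word
                prev[j - 1] + (r != h),   # substitution (free when the words match)
            )
        prev = curr
    return prev[-1] / len(ref)
```

For example, `word_error_rate("ذهبت إلى السوق", "ذهبت الى السوق")` is 0.0 because both spellings of the middle word normalize to the same form. Without normalization it would count as one substitution. Established packages such as jiwer compute the same edit-distance metric. The main choice left to the user is which normalization to apply before scoring.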

Model Description

  • Developed by: Speech Lab, MBZUAI
  • Model type: SpeechT5
  • Language: Arabic
  • Fine-tuned from: ArTST-v2 (pretrained checkpoint)

How to Get Started with the Model

```python
import soundfile as sf
import torch
from transformers import SpeechT5ForSpeechToText, SpeechT5Processor, SpeechT5Tokenizer

# torch device strings are lowercase: "cuda" or "cpu".
device = "cuda" if torch.cuda.is_available() else "cpu"

model_id = "MBZUAI/artst-v2-asr"

tokenizer = SpeechT5Tokenizer.from_pretrained(model_id)
processor = SpeechT5Processor.from_pretrained(model_id, tokenizer=tokenizer)
model = SpeechT5ForSpeechToText.from_pretrained(model_id).to(device)

# The feature extractor expects mono audio sampled at 16 kHz;
# downmix and resample other recordings before calling the processor.
audio, sr = sf.read("audio.wav")

inputs = processor(audio=audio, sampling_rate=sr, return_tensors="pt")
# max_length caps the number of generated tokens.
predicted_ids = model.generate(**inputs.to(device), max_length=150)

transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)
print(transcription[0])
```
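The example above passes `sr` through unchanged. However, the SpeechT5 feature extractor rejects audio whose sampling rate differs from its configured rate (16 kHz), and it treats a 2-D stereo array as a batch. The following is a minimal preprocessing sketch that assumes NumPy and SciPy are installed. `prepare_audio` downmixes multi-channel input and resamples it with `scipy.signal.resample_poly`. `chunk_audio` splits long recordings into fixed windows so that each transcript stays short relative to `max_length`. Both helper names and the 20-second default window are hypothetical choices, not part of the model's documented usage.

```python
from math import gcd

import numpy as np
from scipy.signal import resample_poly

TARGET_SR = 16_000  # sampling rate expected by the SpeechT5 feature extractor


def prepare_audio(audio, sr: int, target_sr: int = TARGET_SR) -> np.ndarray:
    """Hypothetical helper: downmix to mono and resample to target_sr."""
    audio = np.asarray(audio, dtype=np.float32)
    if audio.ndim == 2:
        # soundfile returns (frames, channels) for multi-channel files; average the channels.
        audio = audio.mean(axis=1)
    if sr != target_sr:
        # Polyphase resampling by the rational factor target_sr / sr.
        g = gcd(sr, target_sr)
        audio = resample_poly(audio, target_sr // g, sr // g).astype(np.float32)
    return audio


def chunk_audio(audio: np.ndarray, sr: int = TARGET_SR, max_seconds: float = 20.0) -> list:
    """Hypothetical helper: split a 1-D signal into consecutive windows of at most max_seconds."""
    size = int(max_seconds * sr)
    if size <= 0:
        raise ValueError("max_seconds * sr must be positive")
    return [audio[start:start + size] for start in range(0, len(audio), size)]
```

To use the helpers with the model, run `audio = prepare_audio(*sf.read("audio.wav"))`. Then call `processor(audio=chunk, sampling_rate=16000, return_tensors="pt")` on each element of `chunk_audio(audio)` and join the decoded chunks with spaces. Fixed windows can cut a word at a boundary. Overlapping windows or voice-activity-based segmentation avoid this but require more bookkeeping.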

Citation

BibTeX:

@misc{djanibekov2024dialectalcoveragegeneralizationarabic,
      title={Dialectal Coverage And Generalization in Arabic Speech Recognition}, 
      author={Amirbek Djanibekov and Hawau Olamide Toyin and Raghad Alshalan and Abdullah Alitr and Hanan Aldarmaki},
      year={2024},
      eprint={2411.05872},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2411.05872}, 
}

@inproceedings{toyin-etal-2023-artst,
    title = "{A}r{TST}: {A}rabic Text and Speech Transformer",
    author = "Toyin, Hawau  and
      Djanibekov, Amirbek  and
      Kulkarni, Ajinkya  and
      Aldarmaki, Hanan",
    booktitle = "Proceedings of ArabicNLP 2023",
    month = dec,
    year = "2023",
    address = "Singapore (Hybrid)",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.arabicnlp-1.5",
    doi = "10.18653/v1/2023.arabicnlp-1.5",
    pages = "41--51",
}
Model Size

  • Parameters: 155M
  • Tensor type: F32 (Safetensors weights)