Model Description
- Developed by: Neura company
- Funded by: Neura
- Model type: Whisper Base
- Language(s) (NLP): Persian
Model Architecture
Whisper is a Transformer-based encoder-decoder model, also referred to as a sequence-to-sequence model. It is pre-trained for automatic speech recognition (ASR) and speech translation.
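To make the encoder side of this architecture more concrete, the sketch below works out the size of the log-Mel spectrogram that the encoder receives for one input window. The constants are assumptions based on the standard Whisper Base front end (16 kHz audio, 30-second windows, a hop of 160 samples, 80 Mel bins) and are not taken from the NeuraSpeech configuration; `encoder_input_shape` is a hypothetical helper used only for illustration.

```python
# Assumed defaults of the standard Whisper Base feature extractor;
# check the model's own preprocessor config before relying on them.
SAMPLE_RATE = 16_000    # samples per second
WINDOW_SECONDS = 30     # length of one input window
HOP_LENGTH = 160        # samples between consecutive spectrogram frames
N_MELS = 80             # number of Mel frequency bins


def encoder_input_shape(sample_rate=SAMPLE_RATE,
                        window_seconds=WINDOW_SECONDS,
                        hop_length=HOP_LENGTH,
                        n_mels=N_MELS):
    """Return (mel_bins, frames) for one window of audio."""
    samples = sample_rate * window_seconds   # samples in one window
    frames = samples // hop_length           # one frame per hop
    return (n_mels, frames)


print(encoder_input_shape())  # (80, 3000)
```

Under these assumptions, each 30-second window becomes an 80 x 3000 feature matrix, which the decoder then attends to while generating text tokens.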
Uses
Check out the Google Colab demo to run NeuraSpeech ASR on a free-tier Google Colab instance:
Make sure the packages used below (transformers and librosa) are installed:
# play the input audio inside the notebook
from IPython.display import Audio, display
display(Audio('persian_audio.mp3', rate=32_000, autoplay=True))
from transformers import WhisperProcessor, WhisperForConditionalGeneration
import librosa
# load model and processor
processor = WhisperProcessor.from_pretrained("Neurai/NeuraSpeech_WhisperBase")
model = WhisperForConditionalGeneration.from_pretrained("Neurai/NeuraSpeech_WhisperBase")
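# prompt ids that force Persian transcription instead of language detection or translation
forced_decoder_ids = processor.get_decoder_prompt_ids(language="fa", task="transcribe")
# load the audio (librosa.load returns mono audio by default) and resample it to 16 kHz
sr = 16000
array, sample_rate = librosa.load('persian_audio.mp3')
array = librosa.resample(array, orig_sr=sample_rate, target_sr=sr)
# convert the waveform to log-Mel input features
input_features = processor(array, sampling_rate=sr, return_tensors="pt").input_features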
# generate token ids, passing the prompt ids so decoding is forced to Persian transcription
predicted_ids = model.generate(input_features, forced_decoder_ids=forced_decoder_ids)
# decode token ids to text, dropping special tokens
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)
print(transcription)
Transcribed text:
او خواهان آزاد کردن بردگان بود
(English: "He wanted to free the slaves.")
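Whisper processes audio in windows of at most 30 seconds, so a longer recording generally has to be split before it is passed to the processor. The following is a minimal sketch of one way to do that, not part of the NeuraSpeech code: `chunk_bounds` is a hypothetical helper that only computes sample indices, and it assumes the audio has already been resampled to 16 kHz as in the snippet above.

```python
def chunk_bounds(num_samples, sample_rate=16000, chunk_seconds=30):
    """Return (start, end) sample indices that cover the audio in fixed-size windows."""
    if num_samples < 0 or sample_rate <= 0 or chunk_seconds <= 0:
        raise ValueError("num_samples must be >= 0; sample_rate and chunk_seconds must be > 0")
    chunk_size = sample_rate * chunk_seconds
    # the last window is shortened so it never runs past the end of the audio
    return [(start, min(start + chunk_size, num_samples))
            for start in range(0, num_samples, chunk_size)]


# Example: a 70-second recording at 16 kHz is split into 30 s + 30 s + 10 s
print(chunk_bounds(70 * 16000))
# [(0, 480000), (480000, 960000), (960000, 1120000)]
```

Each slice `array[start:end]` can then be run through the processor and `model.generate` as shown above, and the resulting texts joined in order. Hard cuts like these can split a word at a window boundary, so overlapping windows or cutting at silences may give cleaner transcripts.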
More Information
Model Card Authors
Esmaeil Zahedi, Mohsen Yazdinejad
Model Card Contact