Using WhisperTokenizer to generate input for ONNX inference
#118 by SantoshHF - opened
I have downloaded an ONNX model for Whisper using optimum-cli export onnx --model openai/whisper-large-v3 whisper_onnx
The exported model expects input of shape [batch_size, 128, 3000], but the processor produces [batch_size, 80, 3000].
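To confirm, the expected shape can be read off the exported encoder directly (a minimal check, assuming the export wrote whisper_onnx/encoder_model.onnx, which I believe is the default file name optimum uses for the encoder):

import onnxruntime as ort

# Print the declared input shapes of the exported encoder.
session = ort.InferenceSession("whisper_onnx/encoder_model.onnx")
for inp in session.get_inputs():
    print(inp.name, inp.shape)  # e.g. input_features ['batch_size', 128, 3000]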
Here is the code I am using to build the inputs:

from transformers import AutoProcessor
from optimum.onnxruntime import ORTModelForSpeechSeq2Seq
from datasets import load_dataset

processor = AutoProcessor.from_pretrained("optimum/whisper-tiny.en")
model = ORTModelForSpeechSeq2Seq.from_pretrained("optimum/whisper-tiny.en")

# Compute log-mel features for one sample of the dummy dataset.
ds = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
inputs = processor.feature_extractor(ds[0]["audio"]["array"], sampling_rate=16_000, return_tensors="pt")
print(inputs["input_features"].shape)  # torch.Size([1, 80, 3000])
Is there a way to pad these features, or to generate them with a mel dimension of 128 directly?
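One thing I have not tried yet is loading the processor for the checkpoint that was actually exported. A sketch, assuming the mismatch is just the mel-bin count (whisper-large-v3's feature extractor is configured with 128 mel bins, versus 80 for the older checkpoints), reusing ds from the snippet above:

from transformers import AutoProcessor

# Load the processor that matches the exported checkpoint; its feature
# extractor should already emit 128 mel bins.
processor = AutoProcessor.from_pretrained("openai/whisper-large-v3")
inputs = processor.feature_extractor(
    ds[0]["audio"]["array"], sampling_rate=16_000, return_tensors="pt"
)
print(inputs["input_features"].shape)  # expected: torch.Size([1, 128, 3000])

If that is not an option, I assume zero-padding the 80 bins to 128 would not be equivalent, since the mel filter bank itself differs between the two feature extractors.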