Using WhisperTokenizer to generate input for ONNX inference

#118
by SantoshHF - opened

I exported Whisper to ONNX with: optimum-cli export onnx --model openai/whisper-large-v3 whisper_onnx
The exported model expects input of shape [batch_size, 128, 3000], but the processor below produces [batch_size, 80, 3000]:

from transformers import AutoProcessor
from optimum.onnxruntime import ORTModelForSpeechSeq2Seq
from datasets import load_dataset

processor = AutoProcessor.from_pretrained("optimum/whisper-tiny.en")
#import pdb; pdb.set_trace()
model = ORTModelForSpeechSeq2Seq.from_pretrained("optimum/whisper-tiny.en")

ds = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
inputs = processor.feature_extractor(ds[0]["audio"]["array"], return_tensors="pt")

Is there a way to pad the features, or to generate them directly, so that the second dimension is 128 instead of 80?
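For reference, the second dimension is the number of mel bins, which the feature extractor controls through its feature_size setting. The snippet above loads the processor from optimum/whisper-tiny.en, which is where the 80 comes from. The sketch below is one possible way to get 128 bins, not a confirmed fix: it builds a WhisperFeatureExtractor with feature_size=128 and leaves every other argument at the library default, on the assumption that this matches what the exported large-v3 encoder expects. The synthetic sine wave is only a stand-in for ds[0]["audio"]["array"]. Loading the processor with AutoProcessor.from_pretrained("openai/whisper-large-v3") should be equivalent if that checkpoint's preprocessor config specifies 128 mel bins. Zero-padding 80-bin features up to 128 is unlikely to work, because the two sizes come from different mel filter banks.

```python
import numpy as np
from transformers import WhisperFeatureExtractor

# Assumption: 128 mel bins is what the exported large-v3 encoder expects.
# All other arguments stay at the library defaults (16 kHz audio, 30 s chunks).
feature_extractor = WhisperFeatureExtractor(feature_size=128)

# One second of synthetic 16 kHz audio, standing in for ds[0]["audio"]["array"].
audio = np.sin(2 * np.pi * 440.0 * np.arange(16000) / 16000).astype(np.float32)

# The extractor pads the clip to 30 s before computing the log-mel spectrogram.
inputs = feature_extractor(audio, sampling_rate=16000, return_tensors="np")
features = inputs.input_features
print(features.shape)
```

If the printed shape is [1, 128, 3000], the same extractor can replace processor.feature_extractor in the snippet above, with return_tensors="pt" if PyTorch tensors are needed.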
