Allow single quotes "'" and hyphens "-"

#2
by sanchit-gandhi HF staff - opened

Remove the single quote `'` (id 6) and the hyphen `-` (id 12) from `suppress_tokens`. These tokens should not be suppressed during generation. The official Whisper repo accepts them as valid generated tokens:
https://github.com/openai/whisper/blob/eff383b27b783e280c089475852ba83f20f64998/whisper/tokenizer.py#L258
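
As a rough illustration of the change, here is a minimal sketch of dropping the two ids from a suppress list. The helper name `allow_tokens` and the shortened example list are hypothetical and not taken from this PR; the real `suppress_tokens` list is much longer.

```python
def allow_tokens(suppress_tokens, allowed_ids):
    """Return a copy of suppress_tokens without the ids in allowed_ids."""
    allowed = set(allowed_ids)
    return [tok for tok in suppress_tokens if tok not in allowed]


# Hypothetical, shortened suppress list, for illustration only.
example_suppress = [1, 2, 6, 7, 12, 14, 25]

# Ids 6 (') and 12 (-) are the ones this PR stops suppressing.
updated = allow_tokens(example_suppress, allowed_ids=[6, 12])
print(updated)
```

The helper builds a new list rather than mutating its input, so the original list stays available for comparison.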

Check that we're removing the right tokens:

```python
from transformers import WhisperTokenizer

tokenizer = WhisperTokenizer.from_pretrained("openai/whisper-medium.en")

# Decode the two token ids to confirm they map to ' and -
print(tokenizer.decode(6))
print(tokenizer.decode(12))
```

Print output:
```
'
-
```

ArthurZ changed pull request status to merged
