Allow single quotes "'" and hyphens "-"
#2 opened by sanchit-gandhi (HF staff)
Remove single quotes `'` (id 6) and hyphens `-` (id 12) from `suppress_tokens`. These tokens should not be suppressed during generation. They are accepted as valid generated tokens in the official Whisper repo:
https://github.com/openai/whisper/blob/eff383b27b783e280c089475852ba83f20f64998/whisper/tokenizer.py#L258
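For reference, the change amounts to filtering two ids out of the suppression list. Below is a minimal sketch in plain Python. The helper name `allow_tokens` and the example list are hypothetical and for illustration only; the list is not the model's actual `suppress_tokens` value.

```python
def allow_tokens(suppress_tokens, allowed_ids):
    """Return a copy of suppress_tokens without the ids in allowed_ids."""
    allowed = set(allowed_ids)
    # Preserve the original order of the remaining suppressed ids.
    return [token_id for token_id in suppress_tokens if token_id not in allowed]


# Hypothetical suppression list (illustrative, not taken from the real config).
example_suppress_tokens = [1, 2, 6, 7, 8, 9, 10, 12, 14, 25]

# Stop suppressing the single quote (id 6) and the hyphen (id 12).
updated = allow_tokens(example_suppress_tokens, [6, 12])
print(updated)
```

The helper returns a new list rather than mutating its input, so the original value stays available for comparison.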
Check that we're removing the right tokens:
```python
from transformers import WhisperTokenizer

tokenizer = WhisperTokenizer.from_pretrained("openai/whisper-medium.en")

# Decode the two ids being removed from suppress_tokens
print(tokenizer.decode(6))
print(tokenizer.decode(12))
```
Print Output:
```
'
-
```
ArthurZ changed pull request status to merged