sanchit-gandhi HF staff committed on
Commit
383378f
1 Parent(s): e7c2018

Allow single quotes "'" and hyphens "-"

Remove single quotes `'` (id 6) and hyphens `-` (id 12) from `suppress_tokens`. These tokens should **not** be suppressed during generation. They are accepted as valid generated tokens in the official Whisper repo:
https://github.com/openai/whisper/blob/eff383b27b783e280c089475852ba83f20f64998/whisper/tokenizer.py#L258

Check that we're removing the right tokens:
```python
from transformers import WhisperTokenizer

tokenizer = WhisperTokenizer.from_pretrained("openai/whisper-large")

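# Decode the two token ids to confirm which characters they map to
print(tokenizer.decode(6))   # expected: single quote
print(tokenizer.decode(12))  # expected: hyphen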
```

**Print Output:**
```
'
-
```
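The config edit itself can be sketched in plain Python. This is only an illustration, not the script used for this commit: the helper name `allow_tokens` is hypothetical, and the sample list contains just the `suppress_tokens` entries visible in the diff hunk (the real list is longer).

```python
def allow_tokens(config: dict, token_ids: set) -> dict:
    """Return a copy of `config` with `token_ids` dropped from `suppress_tokens`."""
    updated = dict(config)
    updated["suppress_tokens"] = [
        t for t in config.get("suppress_tokens", []) if t not in token_ids
    ]
    return updated


# Assumption: only the entries shown in the diff hunk, not the full list.
sample_config = {"suppress_tokens": [1, 2, 6, 7, 8, 9, 10, 12, 14, 25, 26]}

# Stop suppressing the single quote (id 6) and the hyphen (id 12).
new_config = allow_tokens(sample_config, {6, 12})
print(new_config["suppress_tokens"])
```

The helper returns a new dict rather than mutating its input, so the original list stays available for comparison.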

Files changed (1)
  1. config.json +0 -2
config.json CHANGED
@@ -46,12 +46,10 @@
  "suppress_tokens": [
  1,
  2,
- 6,
  7,
  8,
  9,
  10,
- 12,
  14,
  25,
  26,