Vocab Size Mismatch
Thanks for sharing this model!
I have a question: if the vocab size is 33, why is the last dimension of the logits output shape 64?
I'm probably missing something simple.
Thanks upfront!
Hi @manu261297 ,
The vocab size is actually 64; ESMC (and ESM++) uses a different tokenizer than ESM2. Most of these tokens aren't used in practice, but matrices with dimensions divisible by 8 play nicely with GPUs :)
Let me know if you have any other questions.
Best,
- Logan
Great, makes sense. So if I want to get the probability of different AAs at a given position, I just use the IDs from the first 33 that actually correspond to the vocabulary in use, right?
Thanks for your work and your help!
Yep! Indexing / slicing the logits is the best way. You can even slice before the softmax if you like; some people prefer that for models with unused tokens. Going to close the issue, but if you have any other questions feel free to reopen. Take care!
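For anyone landing here later, here is a minimal PyTorch sketch of the slice-before-softmax approach. It assumes the model's logits have shape `(batch, seq_len, 64)` and that the used token IDs are `0..32` (the first 33 entries), as discussed above; the dummy logits and the `USED_VOCAB_SIZE` constant are just placeholders for illustration.

```python
import torch
import torch.nn.functional as F

USED_VOCAB_SIZE = 33  # assumption: the used token IDs are 0..32

# Dummy logits standing in for the model output; replace with your model's logits.
batch, seq_len, vocab = 1, 10, 64
logits = torch.randn(batch, seq_len, vocab)

position = 4  # sequence position of interest

# Slice to the used vocabulary *before* softmax so probabilities are
# normalized only over tokens that can actually occur.
used_logits = logits[:, position, :USED_VOCAB_SIZE]
probs = F.softmax(used_logits, dim=-1)  # shape: (batch, 33)

print(probs.shape, probs.sum(dim=-1))  # sums to 1 over the 33 used tokens
```

Slicing after the softmax would also work, but then the 33 probabilities wouldn't sum to 1 unless you renormalize them.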