How does this model deal with OOV (out-of-vocabulary) words?

#6
by HeitorC - opened

I've been reading and couldn't find data on this specific topic. Can either this model or the bge-reranker handle subword tokens? How do they deal with completely unseen or made-up words?

Beijing Academy of Artificial Intelligence org

Hi, the bge embedding model and the bge reranker both encode text into a token sequence. Unseen words are split into several subword tokens, so there are no true out-of-vocabulary words at the model level.
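To illustrate how an unseen word gets split into known pieces, here is a minimal sketch of WordPiece-style greedy longest-match-first tokenization, the scheme used by BERT-family tokenizers. The toy vocabulary and the word `frobnicate` are made up for illustration; a real BGE vocabulary has tens of thousands of entries.

```python
def wordpiece_tokenize(word, vocab):
    """Greedy longest-match-first subword split (WordPiece-style).

    Unseen words are broken into known pieces; '##' marks a piece
    that continues a word. If no piece matches, fall back to [UNK].
    """
    tokens = []
    start = 0
    while start < len(word):
        end = len(word)
        piece = None
        # Try the longest remaining substring first, then shrink it.
        while start < end:
            sub = word[start:end]
            if start > 0:
                sub = "##" + sub  # continuation prefix
            if sub in vocab:
                piece = sub
                break
            end -= 1
        if piece is None:
            return ["[UNK]"]
        tokens.append(piece)
        start = end
    return tokens

# Hypothetical toy vocabulary (not the actual BGE vocab).
vocab = {"frob", "##nic", "##ate", "un", "##seen"}

print(wordpiece_tokenize("frobnicate", vocab))  # ['frob', '##nic', '##ate']
print(wordpiece_tokenize("unseen", vocab))      # ['un', '##seen']
```

So even a made-up word like `frobnicate` is mapped to pieces the model has seen during training, and the embedding is built from those pieces rather than from a single unknown token.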
