Model has no tokenizer included in the download file
#2
by
Bourhano
- opened
Where can we find the tokenizer for this version of CamemBERT, and do all CamemBERT models published by the 'camembert' account use the same tokenizer? I already have a copy of tokenizer.json, but I do not recall where I got it from.
Edit:
It seems that the tokenizer differs between 'camembert-base' and 'camembert-large', according to the paper that introduces CamemBERT.
It mentions:
'The second and the third models, camembert-base and camembert-large, respectively, are based on the RoBERTa architecture (Liu et al., 2019), a BERT-based model with some changes (tokenizer, training task, optimization, etc.)'
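In case it helps anyone with the same doubt, here is a minimal sketch for checking whether a local tokenizer.json matches the one shipped with another checkpoint by comparing their vocabularies. It assumes the usual tokenizer.json layout, where the vocabulary sits under "model" -> "vocab" either as a token-to-id mapping or as a list of [token, score] pairs. The function names and the file paths in the usage note are hypothetical, not something taken from the model repository.

```python
import json


def load_vocab(path):
    """Return the set of tokens stored in a tokenizer.json file.

    Assumption: the vocabulary lives under "model" -> "vocab".
    """
    with open(path, encoding="utf-8") as f:
        vocab = json.load(f)["model"]["vocab"]
    # Assumed layouts: a {token: id} mapping, or a list of [token, score] pairs.
    if isinstance(vocab, dict):
        return set(vocab)
    return {entry[0] for entry in vocab}


def compare_vocabs(path_a, path_b):
    """Count the tokens unique to each file and the tokens they share."""
    a, b = load_vocab(path_a), load_vocab(path_b)
    return {
        "only_in_a": len(a - b),
        "only_in_b": len(b - a),
        "shared": len(a & b),
    }
```

For example, `compare_vocabs("my_copy/tokenizer.json", "camembert-base/tokenizer.json")` (both paths are placeholders) would return zero for both "only_in" counts if the two vocabularies contain the same tokens. This only compares token sets, so it would not catch differences in ids, normalization, or pre-tokenization settings.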