[CLS] token representation or Pooled tokens?

#8
by aarabil - opened

How is the base model used during finetuning? Do you use the hidden representation of the [CLS] token, or do you pool the token representations together somehow (e.g. by averaging)?
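
To make the two options concrete, here is a minimal sketch on toy data. It is plain Python with no model involved. The function names `cls_pooling` and `mean_pooling`, the toy values, and the assumption that [CLS] sits at position 0 are mine for illustration and are not taken from this model's training code.

```python
# Toy hidden states for one sequence, shape (seq_len, hidden_dim).
# Assumption: position 0 is the [CLS] token; mask value 0 marks padding.
hidden_states = [
    [1.0, 2.0],  # [CLS]
    [3.0, 4.0],  # token 1
    [5.0, 6.0],  # token 2
    [9.0, 9.0],  # [PAD]
]
attention_mask = [1, 1, 1, 0]


def cls_pooling(hidden_states):
    """Return the hidden state at position 0 (assumed to be [CLS])."""
    return list(hidden_states[0])


def mean_pooling(hidden_states, attention_mask):
    """Average the hidden states of all non-padding tokens."""
    dim = len(hidden_states[0])
    total = [0.0] * dim
    count = 0
    for vec, keep in zip(hidden_states, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vec):
                total[i] += value
    if count == 0:
        raise ValueError("attention_mask has no unmasked tokens")
    return [t / count for t in total]


cls_vec = cls_pooling(hidden_states)
mean_vec = mean_pooling(hidden_states, attention_mask)
print("CLS pooling: ", cls_vec)
print("Mean pooling:", mean_vec)
```

In both cases the resulting vector would then be fed to the classification head. The difference is only whether that vector comes from a single position or from an average over the unmasked tokens.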
