pooling method and results

#2
by basilevc - opened

First of all, thanks for the nice work and sharing your model 🤗

I have a few questions regarding the "industry bert" family of models your team has published here:

  • Which pooling method should be used to obtain embeddings from your models? I saw here that you use the CLS token (i.e., the first embedding vector in the HF output), but I would like to be sure.
  • Do you normalize your embeddings?
  • Which distance metric should be used: L2 or dot product?
  • Have you validated your models on some kind of industry retrieval benchmark? If so, would you be comfortable sharing it?
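
To make the first three questions concrete, here is a minimal sketch of what I mean. It uses toy vectors in plain Python rather than your actual model, and the function names are mine and purely illustrative. It is not meant to describe how your models were trained. It also shows why normalization matters for the metric question: on unit-length vectors, squared L2 distance equals 2 - 2 * dot product, so both metrics give the same ranking.

```python
import math

def cls_pool(token_vectors):
    """CLS pooling: take the first token vector of the sequence."""
    return list(token_vectors[0])

def mean_pool(token_vectors, attention_mask):
    """Mean pooling: average the token vectors whose mask entry is 1."""
    kept = [vec for vec, keep in zip(token_vectors, attention_mask) if keep]
    return [sum(column) / len(kept) for column in zip(*kept)]

def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def l2_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Toy "last hidden states": one 2-d vector per token (made-up numbers).
doc_a = [[3.0, 4.0], [1.0, 0.0], [0.0, 0.0]]
mask_a = [1, 1, 0]  # last position is padding
doc_b = [[4.0, 3.0], [0.0, 2.0], [5.0, 5.0]]

emb_a = l2_normalize(cls_pool(doc_a))
emb_b = l2_normalize(cls_pool(doc_b))

cosine = dot(emb_a, emb_b)          # dot product of unit vectors = cosine similarity
dist = l2_distance(emb_a, emb_b)    # Euclidean distance between the same vectors
mean_a = mean_pool(doc_a, mask_a)   # alternative pooling, padding ignored
```

Without normalization, dot product and L2 can rank documents differently, which is why I am asking about both points together.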

Also, just FYI (you probably already know this): to make your model easy to load via the Sentence BERT framework, you could attach configuration files like the ones in this model. You get these files when you serialize the model via the Sentence BERT model class, and they would make your model seamlessly usable.
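
For reference, a rough sketch of the two small files I have in mind, written out with plain Python. The key names follow what I understand sentence-transformers writes when saving a Transformer + Pooling pipeline. The embedding dimension of 768 and the choice of CLS pooling are assumptions on my side, so please replace them with the real values for your models.

```python
import json
import tempfile
from pathlib import Path

# ASSUMPTION: placeholder hidden size; use the model's actual dimension.
EMBEDDING_DIM = 768

# modules.json: lists the pipeline stages (transformer, then pooling).
modules = [
    {"idx": 0, "name": "0", "path": "",
     "type": "sentence_transformers.models.Transformer"},
    {"idx": 1, "name": "1", "path": "1_Pooling",
     "type": "sentence_transformers.models.Pooling"},
]

# 1_Pooling/config.json: selects the pooling mode.
# ASSUMPTION: CLS pooling, pending confirmation from the authors.
pooling_config = {
    "word_embedding_dimension": EMBEDDING_DIM,
    "pooling_mode_cls_token": True,
    "pooling_mode_mean_tokens": False,
    "pooling_mode_max_tokens": False,
    "pooling_mode_mean_sqrt_len_tokens": False,
}

def write_configs(repo_dir):
    """Write both config files under repo_dir and return it as a Path."""
    repo_dir = Path(repo_dir)
    (repo_dir / "1_Pooling").mkdir(parents=True, exist_ok=True)
    (repo_dir / "modules.json").write_text(json.dumps(modules, indent=2))
    (repo_dir / "1_Pooling" / "config.json").write_text(
        json.dumps(pooling_config, indent=2)
    )
    return repo_dir

# Written to a temporary directory here; in practice this would be the model repo.
out_dir = write_configs(tempfile.mkdtemp())
```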
