charlieCs committed on
Commit d0d13e3 · verified · 1 Parent(s): c9e87c7

Ensuring compatibility with the sentence-transformers library

I encountered the following error when attempting to load multilingual-e5-large-instruct using the sentence-transformers library:
`not found: multilingual-e5-large-instruct/sentence_xlnet_config.json.`

Upon investigating, I noticed that the sentence-transformers library looks for additional model configuration files like `sentence_bert_config.json` during model loading, as shown in this code snippet:
(https://github.com/UKPLab/sentence-transformers/blob/c68bf68299a4435c6a48ea15d789fef596bf1444/sentence_transformers/models/Transformer.py#L527-L540)
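The lookup described above can be sketched roughly as follows. This is an approximation for illustration, not the library's code: the function name `load_sentence_config` is hypothetical, and the list of candidate file names is an assumption that should be checked against the linked source. It shows why the error names `sentence_xlnet_config.json`: if no candidate exists, the path still points at the last name tried.

```python
import json
import os

# Candidate file names, tried in order. Assumed list; verify against the
# linked Transformer.py before relying on it.
CONFIG_NAMES = [
    "sentence_bert_config.json",
    "sentence_roberta_config.json",
    "sentence_distilbert_config.json",
    "sentence_camembert_config.json",
    "sentence_albert_config.json",
    "sentence_xlm-roberta_config.json",
    "sentence_xlnet_config.json",
]


def load_sentence_config(model_dir: str) -> dict:
    """Return the first sentence-level config found in model_dir (hypothetical helper)."""
    config_path = None
    for name in CONFIG_NAMES:
        config_path = os.path.join(model_dir, name)
        if os.path.exists(config_path):
            break
    # If nothing matched, config_path still holds the last candidate, so the
    # FileNotFoundError raised here mentions sentence_xlnet_config.json.
    with open(config_path) as f:
        return json.load(f)
```

Under these assumptions, any one of the candidate files in the model directory is enough for the lookup to succeed.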

Additionally, other embedding models, such as bge-m3, also include this configuration file: https://huggingface.co/BAAI/bge-m3/blob/main/sentence_bert_config.json

To address this issue, I added the necessary configuration file, `sentence_xlm-roberta_config.json`, based on the xlm-roberta configuration.

Files changed (1)
  1. sentence_xlm-roberta_config.json +4 -0
sentence_xlm-roberta_config.json ADDED
@@ -0,0 +1,4 @@
+ {
+ "max_seq_length": 512,
+ "do_lower_case": false
+ }
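
For a local copy of the model that predates this commit, the same file could be generated by hand. The following is a minimal sketch: `write_sentence_config` is a hypothetical helper, and the directory passed to it is assumed to be a local checkout of the model.

```python
import json
from pathlib import Path


def write_sentence_config(
    model_dir: str,
    max_seq_length: int = 512,
    do_lower_case: bool = False,
) -> Path:
    """Write sentence_xlm-roberta_config.json into model_dir (hypothetical helper)."""
    path = Path(model_dir) / "sentence_xlm-roberta_config.json"
    # Same two keys and values as the file added in this commit.
    config = {"max_seq_length": max_seq_length, "do_lower_case": do_lower_case}
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return path
```

The defaults mirror the values in the diff above; other values would be a deliberate deviation from this commit.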