How can I encode a large corpus dataset (over 200 million records) in TSV format with intfloat/e5-large-v2 as an embedding model?
#15 opened 2 months ago by liorf95
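For a corpus that large, the key point is streaming the TSV in fixed-size batches so the whole file never has to sit in memory. A minimal sketch, assuming a sentence-transformers workflow and hypothetical file/column names (`corpus.tsv`, `text`); the `"passage: "` prefix is the one the e5 model card expects for documents:

```python
# Stream a huge TSV in fixed-size batches; nothing here loads the
# whole file at once, so it scales to hundreds of millions of rows.
import csv
from itertools import islice

def batched_passages(tsv_path, text_column, batch_size=256):
    """Yield lists of 'passage: '-prefixed texts, one batch at a time."""
    with open(tsv_path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f, delimiter="\t")
        while True:
            rows = list(islice(reader, batch_size))
            if not rows:
                return
            yield ["passage: " + row[text_column] for row in rows]

# Assumed encoding loop (downloads the model, so it is left commented):
# from sentence_transformers import SentenceTransformer
# model = SentenceTransformer("intfloat/e5-large-v2")
# for batch in batched_passages("corpus.tsv", "text"):
#     vecs = model.encode(batch, normalize_embeddings=True)
#     ...  # append vecs to an on-disk store (e.g. memory-mapped numpy or FAISS)
```

Writing each batch of vectors straight to disk keeps peak memory bounded by `batch_size` regardless of corpus size.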
Comparison with multilingual-e5-large
#14 opened 5 months ago by xuuxu
Single input vs Multiple inputs
1 · #13 opened 8 months ago by innovationTony
Possible Vector Collapse Issue
1 · #10 opened 12 months ago by Banso
Changing the dimensions of the embeddings
1 · #9 opened 12 months ago by Suijhin
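e5-large-v2 emits fixed 1024-dimensional vectors, so shrinking them after the fact (e.g. with PCA) is one common workaround. A plain-numpy sketch of that post-hoc reduction, offered as an assumption rather than anything the model card prescribes:

```python
# Reduce (n, d) embeddings to (n, k) via PCA: center the data, take
# the top-k right singular vectors, and project onto them.
import numpy as np

def pca_reduce(embeddings, k):
    """Project embeddings onto their top-k principal components."""
    mean = embeddings.mean(axis=0)
    centered = embeddings - mean
    # right singular vectors of the centered matrix are the principal axes
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:k].T
```

For retrieval, the same `mean` and `vt[:k]` must be reused on query vectors so both sides live in the same reduced space.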
Adding ONNX file of this model
#5 opened about 1 year ago by asifanchor
Adding `safetensors` variant of this model
#4 opened about 1 year ago by SFconvertbot
e5-large-v2 requirements for training in non-English?
2 · #3 opened about 1 year ago by wilfoderek
Which embedding vector to use?
8 · #2 opened about 1 year ago by moooji
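The e5 model card pools the last hidden states with a masked average over real tokens (not the `[CLS]` vector). A numpy rendering of that pooling, with made-up tensor shapes for illustration:

```python
# Masked mean pooling: average token embeddings, skipping padding
# positions indicated by the attention mask.
import numpy as np

def average_pool(last_hidden, attention_mask):
    """last_hidden: (batch, seq, dim); attention_mask: (batch, seq) of 0/1.
    Returns one (batch, dim) embedding per sequence."""
    mask = attention_mask[..., None].astype(last_hidden.dtype)
    summed = (last_hidden * mask).sum(axis=1)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)  # avoid divide-by-zero
    return summed / counts
```

The resulting vector is typically L2-normalized before cosine-similarity search.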
How can I support max_length=2048?
6 · #1 opened about 1 year ago by nlpdev3