Jinkin committed on
Commit
997d181
1 Parent(s): 7ebdb3f

update reference

Files changed (1)
  1. README.md +16 -0
README.md CHANGED
@@ -1137,6 +1137,22 @@ some useful tricks:
  2. Dataset sampler. We use M3E's dataset sampler so that all samples in each batch come from the same dataset, which makes the negative samples more valuable.
  3. Instruction. Instructions greatly improved retrieval performance in our experiments. We prepend instructions such as 'query: ' and 'result: ' to each training sample.
 
+ ## Reference
+
+ Here we list the embedding projects and papers we referenced:
+ 1. [M3E](https://github.com/wangyuxinwhy/uniem). An excellent open-source Chinese embedding project that collects and organizes many high-quality Chinese datasets. uniem is also a good framework.
+ 2. [Text2vec](https://github.com/shibing624/text2vec). Another excellent open-source Chinese embedding project.
+ 3. [FlagEmbedding](https://github.com/FlagOpen/FlagEmbedding). An open-source embedding model from BAAI (Zhiyuan). They collected and organized the CMTEB benchmark, filling the gap in systematic evaluation of Chinese embeddings.
+ 4. [E5](https://github.com/microsoft/unilm/tree/master/e5). A paper from Microsoft with very detailed ablation experiments and details on data processing and filtering.
+ 5. [GTE](https://arxiv.org/abs/2308.03281). An embedding paper from Alibaba DAMO Academy.
+
 
  ## License
  Piccolo is released under the MIT License and is free for commercial use.
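
Note on the context lines of this hunk: the two tricks they describe (per-dataset batching and instruction prefixes) can be illustrated with a minimal sketch. This is not M3E's sampler or Piccolo's training code. The names `add_instruction` and `dataset_batches` are hypothetical, and the sketch assumes that 'query: ' is prepended to the query side and 'result: ' to the passage side of each training pair, which the README does not state explicitly.

```python
import random
from typing import Dict, Iterator, List, Tuple

Pair = Tuple[str, str]  # (query text, passage text)

# Instruction strings quoted in the README; which side gets which is an assumption.
QUERY_PREFIX = "query: "
RESULT_PREFIX = "result: "


def add_instruction(query: str, passage: str) -> Pair:
    """Prepend the instruction prefixes to one training pair."""
    return QUERY_PREFIX + query, RESULT_PREFIX + passage


def dataset_batches(
    datasets: Dict[str, List[Pair]], batch_size: int, seed: int = 0
) -> Iterator[Tuple[str, List[Pair]]]:
    """Yield (dataset name, batch) so that every batch draws from one dataset only.

    Hypothetical helper: it shuffles within each dataset, cuts the result into
    batches, then shuffles the order of the batches across datasets.
    """
    rng = random.Random(seed)
    batches: List[Tuple[str, List[Pair]]] = []
    for name, pairs in datasets.items():
        shuffled = list(pairs)
        rng.shuffle(shuffled)
        for start in range(0, len(shuffled), batch_size):
            chunk = shuffled[start:start + batch_size]
            batches.append((name, [add_instruction(q, p) for q, p in chunk]))
    rng.shuffle(batches)  # mix datasets across steps, never inside a batch
    yield from batches


if __name__ == "__main__":
    toy = {
        "qa": [("q1", "a1"), ("q2", "a2"), ("q3", "a3")],
        "titles": [("t1", "body1"), ("t2", "body2")],
    }
    for name, batch in dataset_batches(toy, batch_size=2):
        print(name, batch)
```

Because the other pairs in a batch act as negatives under a contrastive loss, keeping each batch within one dataset means those negatives share the same domain and format, which is the plausible reading of "more valuable" in the README.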