sensenova/piccolo-base-zh

Jinkin committed on Sep 7, 2023

Commit 7ebdb3f (1 parent: ff2a5ac)

Update README.md

Files changed (1)

README.md +2 -0

@@ -1127,10 +1127,12 @@ The finetune loss uses triple contrastive loss, adding hard negative. Neg num is
 Note: We set different max lengths for query and passage, and the max length of query is always kept at 64.
 ### Others
+一些有用的trick:
 1. 减小显存的方式: fp16 + gradient checkpointing + ZERO STAGE1 (stage2 不支持双塔结构下的gradient checkpointing) 相关issue见: https://github.com/microsoft/DeepSpeed/issues/988
 2. dataset sampler，我们采用了M3E的dataset sampler，用以保证每个batch里的样本均来自于一个dataset，负样本更有价值。
 3. instruction。instruction在我们的实验中对retrieval任务有非常大的性能提升，我们在每个训练样本前都加入'查询: '和'结果: '这样的instruction。
+Some useful tricks:
 1. To reduce GPU memory usage: fp16 + gradient checkpointing + ZeRO stage 1 (stage 2 does not support gradient checkpointing under the dual-tower structure). For the related issue, see: https://github.com/microsoft/DeepSpeed/issues/988
 2. Dataset sampler: we use M3E's dataset sampler to ensure that all samples in a batch come from a single dataset, which makes the negative samples more valuable.
 3. Instruction: in our experiments, instructions greatly improved performance on retrieval tasks. We prepend instructions such as 'query: ' and 'result: ' to each training sample.
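
The three tricks in the list above can be sketched roughly as follows. This is a minimal illustration, not the authors' training code and not M3E's actual sampler: `add_instruction`, `single_dataset_batches`, and `DS_CONFIG` are hypothetical names, the prefixes are the literal strings quoted in the Chinese text (the English text renders them as 'query: ' and 'result: '), and dropping incomplete trailing batches is an assumption made here for simplicity.

```python
import random
from typing import Dict, Iterator, List

# Assumed prefixes: the literal strings quoted in the Chinese README text.
QUERY_PREFIX = "查询: "
PASSAGE_PREFIX = "结果: "


def add_instruction(query: str, passage: str) -> Dict[str, str]:
    """Prepend the instruction prefixes to one (query, passage) pair."""
    return {"query": QUERY_PREFIX + query, "passage": PASSAGE_PREFIX + passage}


def single_dataset_batches(
    datasets: Dict[str, List[dict]], batch_size: int, seed: int = 0
) -> Iterator[List[dict]]:
    """Yield batches whose samples all come from exactly one dataset.

    Each dataset is shuffled and cut into full batches; the batches are
    then shuffled across datasets. Incomplete trailing batches are dropped.
    """
    rng = random.Random(seed)
    batches: List[List[dict]] = []
    for name, samples in datasets.items():
        shuffled = samples[:]
        rng.shuffle(shuffled)
        for start in range(0, len(shuffled) - batch_size + 1, batch_size):
            # Tag each sample with its source so the grouping can be checked.
            batch = [dict(s, source=name) for s in shuffled[start:start + batch_size]]
            batches.append(batch)
    rng.shuffle(batches)
    yield from batches


# DeepSpeed config fragment for item 1: fp16 with ZeRO stage 1.
# Gradient checkpointing is switched on separately, on the model itself.
DS_CONFIG = {"fp16": {"enabled": True}, "zero_optimization": {"stage": 1}}
```

Because every batch is drawn from one dataset, the in-batch negatives share a domain with the positives, which is presumably why the README calls them more valuable.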