Jinkin commited on
Commit
7ebdb3f
1 Parent(s): ff2a5ac

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +2 -0
README.md CHANGED
@@ -1127,10 +1127,12 @@ The finetune loss uses triple contrastive loss, adding hard negative. Neg num is
1127
  Note: We set different max lengths for query and passage, and the max length of query is always kept at 64.
1128
 
1129
  ### Others
 
1130
  1. 减小显存的方式: fp16 + gradient checkpointing + ZERO STAGE1 (stage2 不支持双塔结构下的gradient checkpointing) 相关issue见: https://github.com/microsoft/DeepSpeed/issues/988
1131
  2. dataset sampler,我们采用了M3E的dataset sampler,用以保证每个batch里的样本均来自于一个dataset,负样本更有价值。
1132
  3. instruction。instruction在我们的实验中对retrieval任务有非常大的性能提升,我们在每个训练样本前都加入'查询: '和'结果: '这样的instruction。
1133
 
 
1134
  1. The way to reduce memory usage: fp16 + gradient checkpointing + ZERO STAGE1 (stage2 does not support gradient checkpointing under the double-tower structure) For related issues, see: https://github.com/microsoft/DeepSpeed/issues/ 988
1135
  2. Dataset sampler, we use M3E's dataset sampler to ensure that the samples in each batch come from a dataset, and negative samples are more valuable.
1136
  3. instruction. Instruction has greatly improved the performance of the retrieval task in our experiments. We added instructions like 'query: ' and 'result: ' before each training sample.
 
1127
  Note: We set different max lengths for query and passage, and the max length of query is always kept at 64.
1128
 
1129
  ### Others
1130
+ 一些有用的trick:
1131
  1. 减小显存的方式: fp16 + gradient checkpointing + ZERO STAGE1 (stage2 不支持双塔结构下的gradient checkpointing) 相关issue见: https://github.com/microsoft/DeepSpeed/issues/988
1132
  2. dataset sampler,我们采用了M3E的dataset sampler,用以保证每个batch里的样本均来自于一个dataset,负样本更有价值。
1133
  3. instruction。instruction在我们的实验中对retrieval任务有非常大的性能提升,我们在每个训练样本前都加入'查询: '和'结果: '这样的instruction。
1134
 
1135
+ some useful tricks:
1136
  1. The way to reduce memory usage: fp16 + gradient checkpointing + ZERO STAGE1 (stage2 does not support gradient checkpointing under the double-tower structure) For related issues, see: https://github.com/microsoft/DeepSpeed/issues/ 988
1137
  2. Dataset sampler, we use M3E's dataset sampler to ensure that the samples in each batch come from a dataset, and negative samples are more valuable.
1138
  3. instruction. Instruction has greatly improved the performance of the retrieval task in our experiments. We added instructions like 'query: ' and 'result: ' before each training sample.