---
datasets:
- sade-adrien/redpajama_v2_sample_10M
---
The exact MappingAdapter structure is available in representation_mapping.py.

This adapter maps "sentence-transformers/stsb-roberta-large"'s hidden representations to "mistralai/Mistral-7B-Instruct-v0.1"'s.

Training:
  * Steps: 114k
  * Gradient accumulation: 2
  * Batch size: 64
  * Warm-up steps: 100
  * Learning Rate: 3e-5 with linear scheduling
  * Eval: every 8,000 steps
  * Training time: ~98h
  * Eval time: ~10h
  
  * Gradient updates: 57k
  * Train examples: 7.3M
  * Eval examples: 106k
  * Adapter: Decoder_dim (4096) → 4096 → LeakyReLU(0.1) → Encoder_dim (1024)
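
For reference, a hypothetical training-loop sketch wired to the hyperparameters above; the toy `train_loader`, the MSE objective, and the per-update scheduler stepping are assumptions not specified in this card.

```python
import torch
from torch.optim import AdamW
from transformers import get_linear_schedule_with_warmup

adapter = MappingAdapter()
optimizer = AdamW(adapter.parameters(), lr=3e-5)
scheduler = get_linear_schedule_with_warmup(
    optimizer,
    num_warmup_steps=100,
    num_training_steps=57_000,  # 57k gradient updates (114k steps with accumulation of 2)
)

accumulation_steps = 2  # batch size 64 per step, effective batch of 128 per update

# Toy stand-in for the real dataloader of (decoder_hidden, encoder_target) pairs.
train_loader = [(torch.randn(64, 4096), torch.randn(64, 1024)) for _ in range(4)]

for step, (decoder_hidden, encoder_target) in enumerate(train_loader):
    pred = adapter(decoder_hidden)                              # 4096-d -> 1024-d
    loss = torch.nn.functional.mse_loss(pred, encoder_target)   # assumed objective
    (loss / accumulation_steps).backward()
    if (step + 1) % accumulation_steps == 0:
        optimizer.step()
        scheduler.step()
        optimizer.zero_grad()
```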