---
datasets:
- sade-adrien/redpajama_v2_sample_10M
---
The exact structure of MappingAdapter is available in representation_mapping.py.
The adapter maps the hidden representation of "sentence-transformers/stsb-roberta-large" to that of "mistralai/Mistral-7B-Instruct-v0.1".
Training:
* Steps: 114k
* Gradient accumulation: 2
* Batch size: 64
* Warm-up steps: 100
* Learning Rate: 3e-5 with linear scheduling
* Eval steps: every 8000 steps
* Training hours: ~98h
* Eval hours: ~10h
* Gradient updates: 57k
* Train examples: 7.3M
* Eval examples: 106k
* Adapter: Decoder_dim (4096) → 4096 → LeakyReLU(0.1) → Encoder_dim (1024)
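
The training figures above are mutually consistent, which the following minimal sketch checks. It assumes that "Steps" counts micro-batches, that "Batch size" is the number of examples per micro-batch, and that one optimizer update happens every "Gradient accumulation" steps; the variable names are illustrative and do not come from the training code. Since "114k" is a rounded figure, the results are approximate.

```python
# Figures copied from the training list above ("114k" is rounded).
steps = 114_000          # assumed: number of micro-batches processed
grad_accumulation = 2    # micro-batches accumulated per optimizer update
batch_size = 64          # assumed: examples per micro-batch

# One optimizer update every `grad_accumulation` steps.
gradient_updates = steps // grad_accumulation

# Every step consumes one micro-batch of `batch_size` examples.
train_examples = steps * batch_size

# Number of examples contributing to each optimizer update.
effective_batch = batch_size * grad_accumulation

print(f"Gradient updates: {gradient_updates // 1000}k")
print(f"Train examples:   {train_examples / 1e6:.1f}M")
print(f"Effective batch:  {effective_batch}")
```

Under these assumptions the script reproduces the reported 57k gradient updates and roughly 7.3M training examples, with 128 examples behind each update.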