---
datasets:
  - sade-adrien/redpajama_v2_sample_10M
---

The exact MappingAdapter structure is available in representation_mapping.py.

This adapter maps the hidden representation of "sentence-transformers/stsb-roberta-large" to that of "mistralai/Mistral-7B-Instruct-v0.1".

Training:

  • Steps: 114k

  • Gradient accumulation: 2

  • Batch size: 64

  • Warm-up steps: 100

  • Learning rate: 3e-5 with linear scheduling

  • Eval steps: every 8000 steps

  • Training hours: ~98h

  • Eval hours: ~10h

  • Gradient updates: 57k

  • Train examples: 7.3M

  • Eval examples: 106k

  • Adapter: Decoder_dim (4096) → 4096 → LeakyReLU(0.1) → Encoder_dim (1024)
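
The adapter line above can be read as a two-layer MLP. The following is a minimal PyTorch sketch under that reading, not the actual implementation, which lives in representation_mapping.py. The class name `MappingAdapterSketch`, its argument names, and the assumption that each arrow other than the activation is an `nn.Linear` layer with bias are all hypothetical.

```python
import torch
from torch import nn


class MappingAdapterSketch(nn.Module):
    """Hypothetical reading of: 4096 -> 4096 -> LeakyReLU(0.1) -> 1024."""

    def __init__(self, decoder_dim=4096, hidden_dim=4096,
                 encoder_dim=1024, negative_slope=0.1):
        super().__init__()
        # Assumption: both projections are plain linear layers with bias.
        self.net = nn.Sequential(
            nn.Linear(decoder_dim, hidden_dim),
            nn.LeakyReLU(negative_slope),
            nn.Linear(hidden_dim, encoder_dim),
        )

    def forward(self, hidden_states):
        # Applied over the last dimension, so any leading shape
        # (batch, sequence, ...) passes through unchanged.
        return self.net(hidden_states)
```

With these defaults, an input of shape `(batch, seq_len, 4096)` yields an output of shape `(batch, seq_len, 1024)`. Check the layer order and dimensions against representation_mapping.py before loading the released weights into a module like this one.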