---
license: apache-2.0
datasets:
  - euclaise/SuperMC
  - euclaise/prm800k_preferences
---

Experiments in large-scale small-scale preference learning.

falcon-rw-1b trained with PRO (Preference Ranking Optimization, see https://arxiv.org/abs/2306.17492) on SuperMC and PRM800K for 3 epochs, using my supertrainer2000 framework.
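For intuition, here is a minimal sketch of the PRO ranking objective: each response in a ranked list is scored by the policy (e.g. its length-normalized sequence log-probability), and each rank is treated as the positive against everything ranked below it, via a softmax over the remaining scores. Treating beta as a temperature-style scale on the scores, and averaging over ranks, are assumptions in this sketch; supertrainer2000 may implement the details differently.

```python
import torch
import torch.nn.functional as F

def pro_loss(scores: torch.Tensor, beta: float = 4.0) -> torch.Tensor:
    """PRO ranking loss over one ranked list of responses.

    scores: 1-D tensor of per-response scores (e.g. length-normalized
    log-probabilities under the policy), ordered best to worst.
    beta: assumed here to scale the scores before the softmax.
    """
    n = scores.numel()
    loss = scores.new_zeros(())
    for k in range(n - 1):
        # Rank k is the positive against all lower-ranked responses:
        # -log softmax over the suffix scores[k:], taken at position 0.
        loss = loss - F.log_softmax(beta * scores[k:], dim=0)[0]
    return loss / (n - 1)  # averaging over ranks is a normalization choice

# Example: three responses ranked best -> worst.
scores = torch.tensor([-1.2, -1.9, -2.7], requires_grad=True)
print(pro_loss(scores))  # differentiable scalar
```

The paper additionally includes an SFT term on the top-ranked response, omitted here for brevity.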

This is an experimental model.

Benchmarks coming soon.

Hyperparameters (a sketch instantiating them follows the list):

- AdamW, weight decay of 0.01, otherwise default hyperparameters
- Maximum LR of 1e-5
- Cosine schedule with a warmup of 5400 steps
- Batch size of 4 (2 real x 2 accumulated)
- Maximum of 5 epochs with early stopping (by visual observation); stopped after 3
- Gradient clipping norm of 1.0
- PRO beta of 4
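Below is a minimal PyTorch sketch instantiating these optimizer settings, using the standard cosine-with-warmup schedule from `transformers`. The total step count is a placeholder, and `optimizer_step` only illustrates where clipping and accumulation fit; the actual supertrainer2000 loop may differ.

```python
import torch
from transformers import AutoModelForCausalLM, get_cosine_schedule_with_warmup

# falcon-rw-1b base model (may need trust_remote_code=True on older transformers)
model = AutoModelForCausalLM.from_pretrained("tiiuae/falcon-rw-1b")

# AdamW, weight decay 0.01, other hyperparameters left at their defaults
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5, weight_decay=0.01)

# Cosine schedule with a 5400-step warmup; total_steps is a placeholder
# that depends on dataset size, effective batch size, and epoch count
total_steps = 100_000
scheduler = get_cosine_schedule_with_warmup(
    optimizer, num_warmup_steps=5400, num_training_steps=total_steps
)

ACCUM = 2  # 2 real x 2 accumulated = effective batch size 4

def optimizer_step():
    """Call once every ACCUM micro-batches, after loss.backward()."""
    torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)  # clip at norm 1.0
    optimizer.step()
    scheduler.step()
    optimizer.zero_grad()
```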

Training prompt format:

```
### Query
[insert instruction here]

### Answer
[insert response here]
```
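At inference time the same template can be used, with generation continuing after the `### Answer` header. A small sketch (whether a trailing newline follows the header is an assumption):

```python
PROMPT_TEMPLATE = "### Query\n{instruction}\n\n### Answer\n"

def build_prompt(instruction: str) -> str:
    # The model is expected to continue with the response after "### Answer".
    return PROMPT_TEMPLATE.format(instruction=instruction)

print(build_prompt("Summarize preference ranking optimization in one sentence."))
```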