---
license: mit
---
Data: a mix of c4 and codeparrot, roughly 1:1 by sample count but 1:4 by token count. The mix is therefore heavily biased toward code (Python, Go, Java, JavaScript, C, C++).
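A 1:1 sample mix can become a 1:4 token mix because code documents tend to be much longer than web-text documents. The sketch below computes both ratios from per-document token counts. The helper name `mix_ratios` and the token counts are hypothetical, chosen only to illustrate the arithmetic; they are not measured from the actual training data.

```python
from typing import Sequence


def mix_ratios(c4_lens: Sequence[int], code_lens: Sequence[int]) -> tuple[float, float]:
    """Return (sample_ratio, token_ratio) of c4 relative to codeparrot.

    Each argument holds the token count of every document drawn from that source.
    """
    sample_ratio = len(c4_lens) / len(code_lens)
    token_ratio = sum(c4_lens) / sum(code_lens)
    return sample_ratio, token_ratio


# Hypothetical per-document token counts (illustration only):
# equal numbers of documents, but code documents are ~4x longer.
c4_lens = [400, 600, 500]
code_lens = [1800, 2200, 2000]
print(mix_ratios(c4_lens, code_lens))  # → (1.0, 0.25)
```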
Training hyperparameters:
- batch size: 64 × 2048 = 131,072 tokens
- learning rate: selected automatically by the EleutherAI (EAI) `sae` codebase
- `auxk_alpha` (weight of the AuxK auxiliary loss): 0.03
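The hyperparameters above can be sketched as follows. This is a minimal illustration, not the codebase's own code. Two assumptions apply. First, `auto_lr` reflects our understanding of the `sae` trainer's default when no learning rate is given: a 1/sqrt(num_latents) scaling anchored at 2e-4 for 2**14 latents. Second, `total_loss` assumes the AuxK term is simply added to the reconstruction loss, scaled by `auxk_alpha`. The number of latents is not stated in this card, so the value passed to `auto_lr` below is hypothetical.

```python
import math

BATCH_SIZE = 64  # sequences per batch
CTX_LEN = 2048   # tokens per sequence
TOKENS_PER_BATCH = BATCH_SIZE * CTX_LEN  # 131072 tokens


def auto_lr(num_latents: int) -> float:
    # Assumption: 1/sqrt(num_latents) scaling, equal to 2e-4 at 2**14 latents.
    return 2e-4 / math.sqrt(num_latents / 2**14)


def total_loss(recon_loss: float, auxk_loss: float, auxk_alpha: float = 0.03) -> float:
    # Assumption: reconstruction loss plus the AuxK auxiliary loss weighted by auxk_alpha.
    return recon_loss + auxk_alpha * auxk_loss


print(TOKENS_PER_BATCH)  # → 131072
print(auto_lr(2**16))    # hypothetical 65536 latents; 2e-4 / sqrt(4) = 1e-4
```

Under this scaling, quadrupling the number of latents halves the learning rate.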