license: mit
Data: C4 and CodeParrot, mixed about 1:1 by sample count but about 1:4 by token count. Heavily biased toward code (Python, Go, Java, JavaScript, C, C++).
Params:
- batch size: 64 * 2048 = 131072 tokens
- lr: set automatically by the EleutherAI (EAI) sae codebase
- auxk_alpha: 0.03
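
A minimal sketch of how the numbers above fit together. It assumes the batch is 64 sequences of 2048 tokens each, and that `auxk_alpha` scales an auxiliary loss that is added to the reconstruction loss. The function and variable names are hypothetical and are not taken from the training code; only the numeric values come from the list above.

```python
# Values from the params list; names are illustrative only.
BATCH_SEQUENCES = 64   # assumed: number of sequences per batch
SEQ_LEN = 2048         # assumed: tokens per sequence
AUXK_ALPHA = 0.03      # weight on the auxiliary (AuxK) loss term


def tokens_per_batch(batch_sequences: int = BATCH_SEQUENCES,
                     seq_len: int = SEQ_LEN) -> int:
    """Number of tokens seen in one optimizer step."""
    return batch_sequences * seq_len


def total_loss(recon_loss: float, auxk_loss: float,
               auxk_alpha: float = AUXK_ALPHA) -> float:
    """Assumed combination: reconstruction loss plus weighted auxiliary loss."""
    return recon_loss + auxk_alpha * auxk_loss


if __name__ == "__main__":
    print(tokens_per_batch())
    print(total_loss(recon_loss=1.0, auxk_loss=2.0))
```

With a 1:1 sample mix and a 1:4 token mix, the average CodeParrot sample would be roughly four times as long in tokens as the average C4 sample.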